Text-Mining Demo

Oct 26, 2025 · 2 min read

Summary

This demo explores the use of _Hugging Face zero-shot text classification models_ to analyze multilingual user comments (e.g., from YouTube). It focuses on two classification dimensions:
  • Sentiment: whether the comment is positive, neutral, or negative
  • Category: what the comment refers to (functionality, user interface, or ads)

Selected Models

Three pre-trained NLI-based transformer models are used for comparison:

  • Model A: facebook/bart-large-mnli
  • Model B: joeddav/xlm-roberta-large-xnli
  • Model C: typeform/distilbert-base-uncased-mnli
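
As a minimal sketch of how any of these models could be applied, the snippet below wraps the `transformers` zero-shot classification pipeline and returns the top-scoring label for each dimension. The helper names (`load_classifier`, `top_label`, `classify_comment`) and the exact label strings are assumptions made for illustration; the demo's actual code is not shown here.

```python
from typing import Callable, Dict, List

# Candidate labels for the two dimensions described in the summary.
# The exact label strings are an assumption.
SENTIMENT_LABELS = ["positive", "neutral", "negative"]
CATEGORY_LABELS = ["functionality", "user interface", "ads"]


def load_classifier(model_name: str = "facebook/bart-large-mnli"):
    """Build a zero-shot classifier; the model is downloaded on first use."""
    from transformers import pipeline  # imported lazily: heavy dependency

    return pipeline("zero-shot-classification", model=model_name)


def top_label(classifier: Callable, text: str, labels: List[str]) -> str:
    """Return the highest-scoring candidate label for one text."""
    result = classifier(text, candidate_labels=labels)
    # The pipeline returns labels sorted by descending score.
    return result["labels"][0]


def classify_comment(classifier: Callable, text: str) -> Dict[str, str]:
    """Classify one comment along both dimensions."""
    return {
        "sentiment": top_label(classifier, text, SENTIMENT_LABELS),
        "category": top_label(classifier, text, CATEGORY_LABELS),
    }
```

Calling `classify_comment(load_classifier("joeddav/xlm-roberta-large-xnli"), comment)` would run the same comment through Model B. Passing the classifier in as an argument keeps the comparison loop identical across all three models.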

Classification

The following 10 manually labeled comments span multiple languages, sentiment types, and categories. Each model classifies each comment for both sentiment and category. Correct results are shown in bold in the table below.

| Comment | Sentiment (A) | Sentiment (B) | Sentiment (C) | Category (A) | Category (B) | Category (C) |
| --- | --- | --- | --- | --- | --- | --- |
| The video loads fast and never lags. Love it! | positive | positive | positive | functionality | functionality | functionality |
| The search function works fine, but nothing special. | neutral | positive | positive | functionality | functionality | user interface |
| I can’t rewind properly anymore, super annoying. | negative | negative | negative | functionality | functionality | functionality |
| I like how the app looks now, clean and smooth. | positive | positive | positive | user interface | user interface | functionality |
| I am ok with the UI, it can be better though. | positive | positive | positive | user interface | user interface | functionality |
| The layout is messy after the latest update. | negative | negative | negative | user interface | user interface | functionality |
| Way too many ads these days, it ruins the experience. | negative | negative | negative | ads | ads | ads |
| Ads are skippable now, so it’s not too bad. | positive | positive | positive | ads | functionality | ads |
| DE: Zu viele nervige Werbungen, es macht keinen Spaß mehr. | negative | negative | positive | ads | ads | functionality |
| CN: 功能正常,但感觉有点卡。 | positive | positive | positive | user interface | functionality | user interface |

Performance

Classification performance was compared across the three models. Model A (facebook/bart-large-mnli) achieved higher scores than Models B and C in both the sentiment and category dimensions.
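
The scoring metric is not spelled out here. Assuming it is plain accuracy (the share of comments whose predicted label matches the manual label), a comparison could be sketched as follows. The function names and the example data are hypothetical placeholders, not the demo's labels or results.

```python
from typing import Dict, List


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of predictions that match the manually assigned labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty label list")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def score_models(predictions: Dict[str, List[str]], gold: List[str]) -> Dict[str, float]:
    """Compute accuracy per model for one dimension."""
    return {name: accuracy(preds, gold) for name, preds in predictions.items()}


# Placeholder data for illustration only, NOT the demo's labels or results.
example_gold = ["positive", "negative", "neutral", "negative"]
example_predictions = {
    "A": ["positive", "negative", "neutral", "negative"],
    "B": ["positive", "negative", "positive", "negative"],
    "C": ["positive", "positive", "positive", "negative"],
}
example_scores = score_models(example_predictions, example_gold)
print(example_scores)  # {'A': 1.0, 'B': 0.75, 'C': 0.5}
```

Running `score_models` once per dimension (sentiment, then category) gives the two numbers per model that such a comparison needs.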

Next Steps

Both the App Store and Google Play provide public access to app reviews. Given an app's ID (for example, YouTube’s App Store ID is 544007664), the reviews can be extracted automatically. Combined with the comment classification method demonstrated above, this enables real-time categorization of user feedback and helps track emerging user needs and pain points.
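
One possible way to pull App Store reviews is sketched below. It assumes Apple's customer-review RSS feed in JSON form: both the URL pattern and the payload fields (`feed`, `entry`, `content`, `im:rating`, `label`) are assumptions that should be checked against a live response before relying on them. The function names are hypothetical, and Google Play is not covered.

```python
import json
from typing import Any, Dict, List
from urllib.request import urlopen


def reviews_feed_url(app_id: int, country: str = "us") -> str:
    """Build the feed URL (assumed pattern; verify before use)."""
    return (
        f"https://itunes.apple.com/{country}/rss/customerreviews/"
        f"id={app_id}/sortBy=mostRecent/json"
    )


def extract_reviews(payload: Dict[str, Any]) -> List[Dict[str, str]]:
    """Pull title, rating, and text out of a parsed feed payload."""
    entries = payload.get("feed", {}).get("entry", [])
    if isinstance(entries, dict):  # a single entry may not be wrapped in a list
        entries = [entries]
    reviews = []
    for entry in entries:
        text = entry.get("content", {}).get("label")
        if not text:
            continue  # skip entries without review text
        reviews.append({
            "title": entry.get("title", {}).get("label", ""),
            "rating": entry.get("im:rating", {}).get("label", ""),
            "text": text,
        })
    return reviews


def fetch_reviews(app_id: int, country: str = "us") -> List[Dict[str, str]]:
    """Download and parse the most recent reviews for one app."""
    with urlopen(reviews_feed_url(app_id, country), timeout=10) as response:
        return extract_reviews(json.load(response))
```

With this in place, `fetch_reviews(544007664)` would return a list of review dicts whose `text` field can be passed to a zero-shot classifier. Keeping the parsing in `extract_reviews` separate from the network call makes it testable against a saved payload.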