
Natural Language Processing - II - Sentiment Analysis

 Fundamental Tasks of NLP: 

Sentiment Analysis:

Sentiment analysis is a natural language processing (NLP) technique used to determine the sentiment expressed in a piece of text. The goal of sentiment analysis is to automatically extract and quantify subjective information from text data such as opinions, attitudes, emotions and feelings.

Let's say we have a dataset containing customer reviews of a product, and we want to analyze the sentiment expressed in each review. The sentiment could be positive, negative or neutral.

For example, given the review "I absolutely love this product! It's amazing!", the sentiment analysis model might classify it as positive.

Similarly, for the review "This product is terrible. I would not recommend it to anyone.", the model might classify it as negative.

And for the review "The product arrived on time, but it was not what I expected.", the model might classify it as neutral, since it pairs a positive remark with a negative one.

Steps of Sentiment Analysis:  

Here's how sentiment analysis typically works:

  1. Text Input: The input to sentiment analysis is a piece of text which could be a sentence, paragraph, document or even a social media post.

  2. Preprocessing: The text is preprocessed to remove any noise or irrelevant information, such as punctuation, special characters and stopwords (common words like "and", "the", "is" that do not carry much meaning).

  3. Feature Extraction: Next, features are extracted from the preprocessed text. Common techniques include bag-of-words, TF-IDF (Term Frequency-Inverse Document Frequency), word embeddings (e.g., Word2Vec, GloVe) or contextual embeddings (e.g., BERT, GPT).

  4. Sentiment Classification: Once the features are extracted, a machine learning model or a pre-trained deep learning model classifies the sentiment of the text into predefined categories, such as positive, negative or neutral. Some models provide more granular output, such as sentiment scores ranging from strongly negative to strongly positive.

  5. Evaluation and Optimization: The performance of the sentiment analysis model is evaluated using metrics such as accuracy, precision, recall and F1-score for classification, or Mean Squared Error (MSE) when the model predicts a continuous sentiment score. The model may then be improved through hyperparameter tuning, with cross-validation used to check that the gains hold on unseen data.
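The steps above can be sketched end to end in plain Python. This is a minimal illustration, not a production pipeline: the stopword list, the training sentences beyond the two product reviews quoted earlier, and the choice of a multinomial Naive Bayes classifier over bag-of-words counts are all assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "and", "the", "is", "it", "this", "to", "i", "was", "of"}


def preprocess(text):
    """Step 2: lowercase, drop punctuation/special characters and stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def train_naive_bayes(texts, labels):
    """Steps 3-4: collect bag-of-words counts per class."""
    word_counts = defaultdict(Counter)  # class -> word -> count
    class_counts = Counter(labels)      # class -> number of documents
    for text, label in zip(texts, labels):
        word_counts[label].update(preprocess(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab


def predict(text, model):
    """Return the class with the highest log-probability (Laplace smoothing)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in preprocess(text):
            if token not in vocab:
                continue  # words never seen in training carry no evidence
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Step 1: text input. The first and third reviews come from the examples
# above; the other two are made up for illustration.
train_texts = [
    "I absolutely love this product! It's amazing!",
    "Great quality and wonderful design",
    "This product is terrible. I would not recommend it to anyone.",
    "Awful experience, a complete waste of money",
]
train_labels = ["positive", "positive", "negative", "negative"]
model = train_naive_bayes(train_texts, train_labels)

# Step 5: evaluate on held-out (hypothetical) examples.
test_texts = ["What a wonderful, amazing purchase", "Terrible waste, would not recommend"]
test_labels = ["positive", "negative"]
predictions = [predict(t, model) for t in test_texts]
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
print(predictions, accuracy)
```

With only four training sentences the result says nothing about real-world quality; the point is the flow from raw text to tokens, counts, a prediction and a score.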

 

Example: 


 

Movie reviews help users decide whether a movie is worth watching. A summary of the reviews lets a user decide quickly instead of reading many individual reviews. Sentiment analysis rates how positive or negative a movie review is, so deciding whether a review is positive or negative can be automated with techniques from Natural Language Processing.

The dataset contains 10,000 movie reviews. The objective is to perform sentiment analysis (positive/negative) on the movie reviews using both supervised and unsupervised learning methods, and to compare which gives the most accurate results.

  1. Supervised models - Some popular techniques used for encoding text:
    • **Bag of Words**
    • **TF-IDF** (**T**erm **F**requency - **I**nverse **D**ocument **F**requency)
  2. Unsupervised models - Some popular techniques used for unsupervised Sentiment Analysis:
    • **TextBlob**
    • **VADER Sentiment**

Data Dictionary:

    • review: the text of the movie review.
    • sentiment: the sentiment of the review, 0 or 1 (0 for a negative review and 1 for a positive review).

Dataset source:

    • IMDB Movie Ratings Sentiment Analysis: https://www.kaggle.com/datasets/yasserh/imdb-movie-ratings-sentiment-analysis
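TextBlob and VADER need no labelled training data because they score text against a sentiment lexicon. The sketch below illustrates that general idea only; it does not call either library or reproduce their actual rules. The mini-lexicon, its polarity values, the negation list and the mapping to the 0/1 labels of the dataset are all assumptions made for the example.

```python
import re

# Hypothetical mini-lexicon: word -> polarity in [-1, 1]. Real lexicons hold
# thousands of entries; these values are made up for illustration.
LEXICON = {
    "love": 0.8, "amazing": 0.9, "great": 0.7, "good": 0.5,
    "terrible": -0.9, "boring": -0.6, "waste": -0.7, "bad": -0.6,
}
NEGATIONS = {"not", "never", "no"}


def lexicon_score(text):
    """Average polarity of the lexicon words found in the text.

    A negation word directly before a lexicon word flips its sign.
    Returns 0.0 when no lexicon word is present.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = []
    for i, token in enumerate(tokens):
        if token in LEXICON:
            polarity = LEXICON[token]
            if i > 0 and tokens[i - 1] in NEGATIONS:
                polarity = -polarity
            scores.append(polarity)
    return sum(scores) / len(scores) if scores else 0.0


def to_binary_label(score):
    """Map a score to the dataset's labels: 1 (positive) or 0 (negative).

    Treating a score of exactly 0 as negative is an arbitrary choice here.
    """
    return 1 if score > 0 else 0


for review in [
    "I absolutely love this product! It's amazing!",
    "This product is terrible. I would not recommend it to anyone.",
    "The movie was not good",
]:
    score = lexicon_score(review)
    print(f"{score:+.2f} -> {to_binary_label(score)}  {review}")
```

Because the score is continuous, it can be compared against the 0/1 labels only after choosing a cut-off, which is one reason lexicon methods and supervised models can disagree on borderline reviews.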

 Sample reviews:


 
 

 Here, a sentiment value of 0 is negative, and 1 represents a positive sentiment.

A sample wordcloud after segregating negative and positive sentiment: 


 
The words even, bad, never, little, least, maybe, instead, waste, terrible, still and boring were among the most frequently recurring words in the negative reviews.

Wordcloud for positive sentiment: 


 
The words well, good, best, great, enjoy, interesting, wonderful, much, fun and beautiful were among the most frequently recurring words in the positive reviews.
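A wordcloud is a visual rendering of word frequencies, so the lists above can be reproduced by counting words separately for each sentiment label. The following is a minimal sketch of that counting step; the four reviews and the stopword list are hypothetical and stand in for the `review` and `sentiment` columns of the dataset.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption, deliberately short).
STOPWORDS = {"a", "an", "and", "the", "of", "with", "really", "is", "was"}


def top_words(reviews, labels, target_label, n=2):
    """Return the n most frequent non-stopword tokens among reviews
    whose label equals target_label, as (word, count) pairs."""
    counts = Counter()
    for review, label in zip(reviews, labels):
        if label == target_label:
            tokens = re.findall(r"[a-z']+", review.lower())
            counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


# Hypothetical reviews; 1 = positive, 0 = negative, as in the dataset.
reviews = [
    "A great movie with a great cast, really fun",
    "Boring plot and a terrible waste of time",
    "Great fun, wonderful music",
    "Terrible acting, boring and bad",
]
labels = [1, 0, 1, 0]

print("positive:", top_words(reviews, labels, target_label=1))
print("negative:", top_words(reviews, labels, target_label=0))
```

The resulting counts are what a wordcloud library would take as input to size each word.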

After building a supervised model on the above dataset with Bag of Words (BoW) features, as produced by CountVectorizer, an accuracy score of 82% can be obtained.
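For reference, accuracy is the share of reviews whose predicted label matches the true label, and precision, recall and F1 follow from the same comparison. The sketch below computes them by hand; the label lists are hypothetical and are not the predictions behind the figure reported above.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical true and predicted sentiment labels (1 = positive, 0 = negative).
y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Here 8 of the 10 predictions are correct, so accuracy is 0.80; on a dataset with balanced classes, as in this toy case, accuracy alone is a reasonable headline number.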

A wordcloud based on top 40 features from the CountVectorizer model: 

Based on this model, the movie received a mixed response, with negative sentiment probably having the upper hand.

As the best-scoring model uses Bag of Words (CountVectorizer), we opted to build the model features on this encoding. The other approaches tested were TF-IDF and the unsupervised techniques TextBlob and VADER.
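To see how TF-IDF differs from raw Bag of Words counts, the sketch below computes the textbook weight tf × log(N / df) for three tiny tokenized documents, which are made up for the example. Library implementations typically add smoothing and normalization, so their numbers will differ from this plain version.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    tf  = count of the term in the document / document length
    idf = log(number of documents / number of documents containing the term)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document

    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already tokenized reviews.
documents = [
    ["great", "movie"],
    ["boring", "movie"],
    ["great", "fun", "movie"],
]

weights = tf_idf(documents)
for doc, w in zip(documents, weights):
    print(doc, {term: round(value, 3) for term, value in w.items()})
```

The word "movie" appears in every document, so its weight drops to zero, while a rarer word such as "boring" is weighted up. Bag of Words would give both the same count of 1.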



 

 
