
Natural Language Processing - II - Sentiment Analysis

 Fundamental Tasks of NLP: 

Sentiment Analysis:

Sentiment analysis is a natural language processing (NLP) technique used to determine the sentiment expressed in a piece of text. The goal of sentiment analysis is to automatically extract and quantify subjective information from text data such as opinions, attitudes, emotions and feelings.

Let's say we have a dataset containing customer reviews of a product, and we want to analyze the sentiment expressed in each review. The sentiment could be positive, negative or neutral.

For example, given the review "I absolutely love this product! It's amazing!", the sentiment analysis model might classify it as positive.

Similarly, for the review "This product is terrible. I would not recommend it to anyone.", the model might classify it as negative.

And for the review "The product arrived on time, but it was not what I expected.", the model might classify it as neutral.

Steps of Sentiment Analysis:  

Here's how sentiment analysis typically works:

  1. Text Input: The input to sentiment analysis is a piece of text which could be a sentence, paragraph, document or even a social media post.

  2. Preprocessing: The text is preprocessed to remove any noise or irrelevant information, such as punctuation, special characters and stopwords (common words like "and", "the", "is" that do not carry much meaning).

  3. Feature Extraction: Next, features are extracted from the preprocessed text. Common techniques include bag-of-words, TF-IDF (Term Frequency-Inverse Document Frequency), word embeddings (e.g., Word2Vec, GloVe) or contextual embeddings (e.g., BERT, GPT).

  4. Sentiment Classification: Once the features are extracted, a machine learning model or a pre-trained deep learning model classifies the sentiment of the text into predefined categories, such as positive, negative or neutral. Some models provide more granular output, such as sentiment scores ranging from strongly negative to strongly positive.

  5. Evaluation and Optimization: The performance of the sentiment analysis model is evaluated using metrics such as accuracy, precision, recall, F1-score or Mean Squared Error (MSE), depending on the task. The model may be fine-tuned and optimized using techniques such as hyperparameter tuning or cross-validation to improve its performance.
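
Steps 2 and 3 above can be sketched in plain Python as follows. This is a minimal illustration rather than a production pipeline: the stopword list, the helper names `preprocess`, `bag_of_words` and `tf_idf`, and the specific TF-IDF variant (tf = count / document length, idf = log(N / df)) are all assumptions made for this example. Note that "not" is deliberately kept out of the stopword list, since dropping negations can flip the meaning of a review.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "and", "the", "is", "it", "it's", "this", "to", "i",
             "was", "but", "on"}

def preprocess(text):
    """Lowercase, strip punctuation/special characters, drop stopwords."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def bag_of_words(tokens):
    """Map each token to its raw count in the document."""
    return Counter(tokens)

def tf_idf(docs_tokens):
    """Return one {term: weight} dict per document.

    Uses tf = count / document length and idf = log(N / df),
    which is one common variant among several.
    """
    n_docs = len(docs_tokens)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    weights = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        total = len(tokens) or 1
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

reviews = [
    "I absolutely love this product! It's amazing!",
    "This product is terrible. I would not recommend it to anyone.",
    "The product arrived on time, but it was not what I expected.",
]
tokenized = [preprocess(r) for r in reviews]
bow = [bag_of_words(t) for t in tokenized]
tfidf = tf_idf(tokenized)
```

With this variant, a word such as "product" that occurs in every review gets a TF-IDF weight of zero, while a word that appears in only one review (such as "love") gets a positive weight. In practice, libraries such as scikit-learn provide `CountVectorizer` and `TfidfVectorizer` for these two encodings, with somewhat different smoothing and normalization defaults.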

 

Example: 


 

Movie reviews help users decide whether a movie is worth watching. A summary of the reviews lets a user decide quickly instead of spending time reading many individual reviews. Sentiment analysis rates how positive or negative a movie review is, so the task of deciding whether a review is positive or negative can be automated using techniques from Natural Language Processing.

The dataset contains 10,000 movie reviews. The objective is to perform sentiment analysis (positive/negative) on the movie reviews using both supervised and unsupervised learning methods, and to compare which approach gives the most accurate results.

  1. Supervised models - Some popular techniques used for encoding text:
    • **Bag of Words**
    • **TF-IDF** (**T**erm **F**requency - **I**nverse **D**ocument **F**requency)

  2. Unsupervised models - Some popular techniques used for unsupervised Sentiment Analysis:
    • **TextBlob**
    • **VADER Sentiment**

Data Dictionary:

    • review: reviews of the movies.
    • sentiment: indicates the sentiment of the review, 0 or 1 (0 is for a negative review and 1 for a positive review)

Dataset source:

IMDB Movie Ratings Sentiment Analysis: https://www.kaggle.com/datasets/yasserh/imdb-movie-ratings-sentiment-analysis
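
The unsupervised techniques listed above (TextBlob and VADER) are lexicon-based: they need no labelled training data and instead look up words in a pre-scored dictionary. The sketch below illustrates that general idea only. It does not use either library's API, and the `LEXICON` values, the `NEGATIONS` set and the `lexicon_score` and `predict` names are hypothetical choices made for this example.

```python
import re

# Hypothetical toy lexicon: word -> polarity in [-1, 1].
# Real tools ship lexicons with thousands of scored entries.
LEXICON = {
    "love": 0.8, "amazing": 0.9, "great": 0.8, "good": 0.6, "wonderful": 0.9,
    "terrible": -0.9, "boring": -0.6, "bad": -0.7, "waste": -0.8,
}
NEGATIONS = {"not", "never", "no"}

def lexicon_score(text):
    """Average polarity of lexicon words; a directly preceding negation flips the sign."""
    tokens = re.findall(r"[a-z]+", text.lower())
    scores = []
    for i, token in enumerate(tokens):
        if token in LEXICON:
            polarity = LEXICON[token]
            if i > 0 and tokens[i - 1] in NEGATIONS:
                polarity = -polarity
            scores.append(polarity)
    # No lexicon word found: treat the text as neutral (score 0.0).
    return sum(scores) / len(scores) if scores else 0.0

def predict(text):
    """Map the score to the dataset's labels: 1 = positive, 0 = negative.

    A score of exactly 0 falls to the negative class, an arbitrary choice here.
    """
    return 1 if lexicon_score(text) > 0 else 0

print(predict("I absolutely love this product! It's amazing!"))
print(predict("This movie is terrible and boring"))
```

The real libraries go well beyond this: VADER, for instance, also accounts for intensifiers, punctuation and capitalization, and TextBlob reports a polarity score between -1 and 1 alongside a subjectivity score.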

 Sample reviews:


 
 

 Here, a sentiment value of 0 is negative, and 1 represents a positive sentiment.

A sample wordcloud after segregating negative and positive sentiment: 


 
The words even, bad, never, little, least, maybe, instead, waste, terrible, still and boring were some of the important recurring words observed in the negative reviews.

Wordcloud for positive sentiment: 


 
The words well, good, best, great, enjoy, interesting, wonderful, much, fun and beautiful were some of the important words observed in the positive reviews.

After building a supervised model on the above dataset using Bag of Words (BoW) features, produced with CountVectorizer, an accuracy score of 82% can be obtained.
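
To show how a supervised model can be trained on Bag of Words counts and scored by accuracy, here is a minimal sketch. The post does not state which classifier was used, so multinomial Naive Bayes is assumed here. The class and function names and the four training sentences are invented for illustration, and this toy example does not reproduce the 82% figure.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts, with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        self.doc_counts = Counter(labels)
        for text, y in zip(texts, labels):
            self.word_counts[y].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        n_docs = sum(self.doc_counts.values())
        best_class, best_logp = None, -math.inf
        for c in self.classes:
            total = sum(self.word_counts[c].values()) + len(self.vocab)
            logp = math.log(self.doc_counts[c] / n_docs)  # class prior
            for t in tokens:
                logp += math.log((self.word_counts[c][t] + 1) / total)
            if logp > best_logp:
                best_class, best_logp = c, logp
        return best_class

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Invented toy data: 1 = positive, 0 = negative (same coding as the dataset).
train_texts = [
    "a great and wonderful film",
    "good fun with a beautiful ending",
    "a boring waste of time",
    "terrible plot and bad acting",
]
train_labels = [1, 1, 0, 0]
test_texts = ["wonderful acting and great fun", "bad and boring"]
test_labels = [1, 0]

model = NaiveBayes().fit(train_texts, train_labels)
predictions = [model.predict(t) for t in test_texts]
score = accuracy(test_labels, predictions)
```

On the real dataset, the same idea applies at scale: split the 10,000 reviews into training and test sets, fit on the training portion only, and report accuracy on the held-out reviews.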

A wordcloud based on top 40 features from the CountVectorizer model: 

Based on this model, the reviews show a mixed response, with negative sentiment probably having the upper hand.

Because the best-scoring model was Bag of Words (CountVectorizer), we built the model features on this supervised approach. The other models tested were TF-IDF and the unsupervised techniques TextBlob and VADER.



 

 
