
Visualizing Sentiment Analysis Reports Using Scattertext NLP Tool

Scattertext is an open-source Python library that works with spaCy to create visualizations of which words and phrases are most characteristic of a given category.



Natural Language Processing (NLP) allows a computer to work with human language with the help of the different modules and packages that Python provides. In practice, NLP is used for speech recognition, voice search engines, and more. It can perform a large variety of operations on text data, such as tokenizing, lemmatizing, stemming, and POS tagging.

spaCy is a Python library for NLP. Some of its main features are named entity recognition (NER), part-of-speech (POS) tagging, dependency parsing, and word vectors. It also provides models for different languages that can be loaded as needed.

Scattertext is an open-source Python library that works with spaCy to create visualizations of which words and phrases are most characteristic of a given category. It is a tool for finding distinguishing terms in corpora and presenting them in an interactive HTML scatter plot. Scattertext visualizations are highly informative because the points corresponding to terms are selectively labeled so that the labels don’t overlap with other labels or points.

In this article, we will build a sentiment analysis visualization using spaCy and scattertext and see how scattertext lets you explore and search the text in the data.

Implementation:

We will start by installing spaCy and scattertext using pip install spacy and pip install scattertext, respectively.

  1. Importing required libraries

We will import spaCy and scattertext for the visualization and pandas for loading our dataset.

import spacy
import pandas as pd
import scattertext as st

  2. Loading the Dataset

To create a sentiment analysis visualization we will use the ‘Twitter Airline Sentiment Dataset’ from Kaggle. The dataset contains attributes such as the user name, the tweet id, the tweet text, and the sentiment label (the airline_sentiment column). We will use the data to visualize the terms that are typical of each sentiment.

twitter_df = pd.read_csv('Tweets.csv')
twitter_df.dtypes

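Data Types of the Dataframe

  3. Loading the English Model

As discussed, spaCy provides models for different languages. Since our tweets are in English, we will load the small English model, en_core_web_sm. If it is not installed yet, download it first with python -m spacy download en_core_web_sm. (The older shortcut name 'en' no longer works in spaCy 3.)

nlp = spacy.load('en_core_web_sm')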

  4. Creating the Scattertext Corpus

Next, we will create a scattertext corpus from the dataset. Since we are doing sentiment analysis, we will set category_col to ‘airline_sentiment’, and the column that contains the tweets, ‘text’, will be used as text_col.

corpus = st.CorpusFromPandas(twitter_df,
                             category_col='airline_sentiment',
                             text_col='text',
                             nlp=nlp).build()

To create this corpus we passed the English model loaded in the previous step as nlp, and built the corpus with the build() function.
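To get a feel for what the corpus holds, it can help to think of it as term counts grouped by category. The following is only a minimal plain-Python sketch of that idea, not scattertext's implementation: the rows are made-up stand-ins for tweets, and the whitespace tokenizer is a simplified assumption in place of spaCy.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in rows: (sentiment label, tweet text).
rows = [
    ("negative", "flight delayed two hours"),
    ("negative", "delayed again and no refund"),
    ("positive", "great crew and great flight"),
]

def tokenize(text):
    # Simplified stand-in for spaCy's tokenizer: lowercase, split on whitespace.
    return text.lower().split()

# Map each category to a Counter of term frequencies.
term_counts = defaultdict(Counter)
for label, text in rows:
    term_counts[label].update(tokenize(text))

print(term_counts["negative"]["delayed"])  # 2
print(term_counts["positive"]["great"])    # 2
print(term_counts["positive"]["delayed"])  # 0
```

A term such as "delayed" that is common in one category and absent in the other is the kind of distinguishing term the plot is designed to surface.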

  5. Creating the visualization

This is the main and the final step. Here we will create a visualization with the following parameters:

  • category: The category to analyze. We will set this to ‘negative’, the label used for negative sentiment in airline_sentiment.
  • category_name: The display name for that category. This will be set to “Negative” and shown as the axis title.
  • not_category_name: The display name for all tweets that are not in the negative category, set to “Positive”. Note that this groups every non-negative label in airline_sentiment together.
  • metadata: The data shown alongside each excerpt; here, the name column of the dataset.

Now let us define all these and create the visualization using produce_scattertext_explorer.

# produce_scattertext_explorer returns the whole page as an HTML string
html = st.produce_scattertext_explorer(corpus,
        category='negative',
        category_name='Negative',
        not_category_name='Positive',
        width_in_pixels=1000,
        # label each excerpt with the user name from the same dataframe
        metadata=twitter_df['name'])

This command returns the visualization as an HTML string, which we will write to an HTML file that can be opened standalone in a browser.

open("Twitter_Sentiment.html", 'wb').write(html.encode('utf-8'))
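The one-line write above leaves closing the file to the interpreter. As an alternative, here is a small sketch using pathlib, which opens, writes, and closes the file in a single call. The html string here is a placeholder standing in for the real output, and the temporary directory is only an assumption to keep the example from touching real files.

```python
import tempfile
from pathlib import Path

# Placeholder standing in for the HTML string returned by scattertext.
html = "<html><body>placeholder plot</body></html>"

# Temporary directory used only for this sketch; pick your own path in practice.
out_dir = Path(tempfile.mkdtemp())
out_path = out_dir / "Twitter_Sentiment.html"

# write_text opens the file, writes the string, and closes the file.
written = out_path.write_text(html, encoding="utf-8")
print(out_path.exists())
```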

Visualization of Sentiments

This is the final visualization we created using scattertext.

In the visualization, the x-axis shows how frequently a term appears in positive tweets and the y-axis shows how frequently it appears in negative tweets. Each axis is divided into three sections:

  • Frequent: Shows the words with the highest frequency.
  • Average: Shows the words with an average frequency.
  • Infrequent: Shows the words with the lowest frequency.
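These sections are relative rather than absolute: a term's position reflects how its frequency ranks against other terms, not its raw count. The sketch below illustrates that idea with a simple dense-rank scaling. It is an assumption made for illustration; scattertext's own transform may handle ties and scaling differently, and the counts are invented.

```python
def dense_rank_scale(counts):
    """Map raw counts to [0, 1] by dense rank (ties share a position)."""
    distinct = sorted(set(counts.values()))
    rank_of = {value: rank for rank, value in enumerate(distinct)}
    top = max(len(distinct) - 1, 1)  # avoid dividing by zero
    return {term: rank_of[c] / top for term, c in counts.items()}

# Hypothetical counts of each term in negative tweets.
negative_counts = {"delayed": 120, "hour": 40, "thanks": 3, "gate": 40}

positions = dense_rank_scale(negative_counts)
print(positions)
```

With this scaling, "delayed" lands at the "Frequent" end (1.0), "thanks" at the "Infrequent" end (0.0), and "hour" and "gate" share the middle (0.5) even though 40 is much closer to 3 than to 120.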

The visualization also lists the ‘Top Negative Words’, the ‘Top Positive Words’, and the ‘Characteristic’ terms. In addition, there is a search bar for looking up a word in the corpus; it displays the word's frequency along with the texts in which it is used.

Let us search for the word ‘hour’ and see the results.

Word Search

The search results display the frequency of the word in the negative and the positive texts, along with some of the tweets in which the word is used.
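Conceptually, the search does something like the following: count the texts in each category that contain the term and collect them as excerpts. This is a rough plain-Python sketch under that assumption, with made-up tweets and a naive word match; it is not how scattertext implements its search.

```python
from collections import Counter

# Hypothetical stand-in rows: (sentiment label, tweet text).
tweets = [
    ("negative", "Waited an hour on the tarmac"),
    ("negative", "Two hour delay, no updates"),
    ("positive", "Landed an hour early, thanks!"),
    ("positive", "Friendly crew"),
]

def search_term(term, rows):
    """Return per-category match counts and the matching texts."""
    term = term.lower()
    hits = Counter()
    excerpts = []
    for label, text in rows:
        # Naive word match: strip common punctuation, then compare tokens.
        words = [w.strip(".,!?") for w in text.lower().split()]
        if term in words:
            hits[label] += 1
            excerpts.append((label, text))
    return hits, excerpts

hits, excerpts = search_term("hour", tweets)
print(hits["negative"], hits["positive"])  # 2 1
```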

The visualization is highly interactive: when you hover over any word, a tooltip shows its frequency along with its score, and no label overlaps another.

Conclusion:

In this article we saw how insightful and informative visualizations can be created using scattertext. We saw how to use the visualization's search bar to find a word's frequency and the texts in which it is used. Scattertext is easy to use and fast, and it can be applied to many kinds of text data.


Himanshu Sharma

An aspiring Data Scientist currently Pursuing MBA in Applied Data Science, with an Interest in the financial markets. I have experience in Data Analytics, Data Visualization, Machine Learning, Creating Dashboards and Writing articles related to Data Science.