Last updated February 2, 2021

Visualizing Sentiment Analysis Reports Using Scattertext NLP Tool

Scattertext is an open-source Python library that works with spaCy to create visualizations of which words and phrases are more characteristic of a given category.


Published on September 6, 2020

by Himanshu Sharma

Natural Language Processing (NLP) allows computers to work with human language, and Python provides a range of modules and packages for it. In practice, NLP is used for speech recognition, voice search engines, and similar applications. It also covers a wide variety of operations on text data, such as tokenization, lemmatization, stemming, and POS tagging.

spaCy is a Python library for NLP. Its main features include named entity recognition (NER), POS tagging, dependency parsing, and word vectors. It also provides models for different languages that can be used as needed.

Scattertext is an open-source Python library that works with spaCy to visualize which words and phrases are more characteristic of a given category. It is a tool for finding distinguishing terms in corpora and presenting them in an interactive HTML scatter plot. The visualizations are informative because the points corresponding to terms are selectively labeled, so that labels don’t overlap with other labels or points.

In this article, we will build a sentiment analysis visualization using spaCy and Scattertext and see how Scattertext lets you explore and search the terms in the data.

Implementation:

We will start by installing spacy and scattertext using pip install spacy and pip install scattertext respectively.

Importing required libraries

We will be importing spacy and scattertext for visualization and pandas for loading our dataset.

import spacy

import pandas as pd

import scattertext as st

Loading the Dataset

To create a sentiment analysis visualization, we will use the ‘Twitter Airline Sentiment Dataset’ from Kaggle. The dataset contains attributes such as the tweet id, the user name, the tweet text, and the sentiment label. We will use it to visualize the terms associated with the different sentiments.

twitter_df = pd.read_csv('Tweets.csv')

twitter_df.dtypes
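Before building the plot, it can help to check which sentiment labels are present and how many tweets carry each one. The following is a minimal sketch on a tiny hand-made stand-in for Tweets.csv; the rows are invented for illustration, and the presence of a neutral label is an assumption about the dataset. The last step is optional and only shows how one might keep two labels so that a "not negative" group contains positive tweets only.

```python
import pandas as pd

# Tiny stand-in for Tweets.csv (invented rows, for illustration only).
sample_df = pd.DataFrame({
    "airline_sentiment": ["negative", "positive", "neutral", "negative"],
    "name": ["user_a", "user_b", "user_c", "user_d"],
    "text": [
        "two hour delay and no updates",
        "great crew, thanks for the smooth flight",
        "what time does check-in open?",
        "lost my bag again",
    ],
})

# How many tweets fall under each sentiment label?
label_counts = sample_df["airline_sentiment"].value_counts()
print(label_counts)

# Optional: keep only two labels so the "not negative" group is purely positive.
two_class_df = sample_df[
    sample_df["airline_sentiment"].isin(["negative", "positive"])
].reset_index(drop=True)
print(len(two_class_df))
```

On the real data, the same two calls would be applied to twitter_df instead of sample_df.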

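Loading the English Model

As we have already discussed, spaCy provides models for different languages. Since our tweets are in English, we will load an English model. Recent spaCy releases no longer accept the 'en' shortcut, so we use the full package name en_core_web_sm, which has to be downloaded once with python -m spacy download en_core_web_sm.

nlp = spacy.load('en_core_web_sm')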

Creating Scattertext Corpus

Next, we will create a Scattertext corpus from the dataset. Since we are working on sentiment analysis, we will set category_col to ‘airline_sentiment’, and the column that contains the tweets, ‘text’, will be used as text_col.

corpus = st.CorpusFromPandas(twitter_df, category_col='airline_sentiment', text_col='text', nlp=nlp).build()

To create this corpus, we passed the English model loaded in the previous step as the nlp argument and called build() to construct it.

Creating the visualization

This is the main and the final step. Here we will create a visualization with the following parameters:

category: We will set this to ‘negative’, the label in airline_sentiment that the plot treats as the category of interest.
category_name: This will be set to “Negative” and displayed as the axis title.
not_category_name: All tweets that are not in the negative category fall under this group, which is displayed with the name “Positive”.
metadata: The data shown alongside each excerpt; here, the name column.

Now let us define all these and create the visualization using produce_scattertext_explorer.

html = st.produce_scattertext_explorer(corpus,
                                       category='negative',
                                       category_name='Negative',
                                       not_category_name='Positive',
                                       width_in_pixels=1000,
                                       metadata=twitter_df['name'])

This command creates the visualization as an HTML string, which we then write to an HTML file that can be opened standalone.

open("Twitter_Sentiment.html", 'wb').write(html.encode('utf-8'))
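The one-line write above leaves closing the file to the interpreter. A slightly more defensive variant, sketched here with a placeholder string in place of the real explorer output, uses a context manager and text mode with an explicit encoding. The helper name save_html and the demo file path are hypothetical, chosen only for this example.

```python
import os
import tempfile

def save_html(html, path):
    """Write an HTML string to `path` as UTF-8 and return the characters written."""
    # The context manager closes the file even if the write raises.
    with open(path, "w", encoding="utf-8") as f:
        return f.write(html)

# Placeholder content; in the article this would be the string returned by
# produce_scattertext_explorer.
demo_html = "<html><body>placeholder</body></html>"

# Written to the temp directory here so the demo does not clutter the project.
out_path = os.path.join(tempfile.gettempdir(), "Twitter_Sentiment_demo.html")
written = save_html(demo_html, out_path)
print(written, out_path)
```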

This is the final visualization we created using scattertext.

In the visualization, the x-axis shows how frequently a term appears in the positive tweets and the y-axis shows how frequently it appears in the negative tweets. Each axis is divided into three sections:

Frequent: Shows the words with the highest frequency.
Average: Shows words with an average frequency.
Infrequent: Shows the words with the lowest frequency.
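To make the two axes concrete, here is a simplified sketch of the underlying idea: count each term separately inside the chosen category and outside it, and treat the pair of counts as the term's coordinates. The sample tweets are invented, and the whitespace tokenization is an assumption made for brevity; Scattertext applies its own tokenization and scaling, which this sketch does not reproduce.

```python
from collections import Counter

# Invented sample documents: (sentiment label, text).
docs = [
    ("negative", "two hour delay and still no bag"),
    ("negative", "delay after delay, worst flight"),
    ("positive", "great flight and friendly crew"),
    ("positive", "thanks for the quick bag delivery"),
]

def term_counts(docs, category):
    """Return (counts inside `category`, counts in all other categories)."""
    inside, outside = Counter(), Counter()
    for label, text in docs:
        # Naive tokenization: lowercase, drop commas, split on whitespace.
        tokens = text.lower().replace(",", " ").split()
        (inside if label == category else outside).update(tokens)
    return inside, outside

neg, not_neg = term_counts(docs, "negative")

for term in ("delay", "flight", "crew"):
    # x = count outside the category, y = count inside the category
    print(term, (not_neg[term], neg[term]))
```

A term such as "delay", which appears only in the negative tweets, lands high on the y-axis and at zero on the x-axis, while a term used equally on both sides, such as "flight", sits on the diagonal.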

The visualization also contains the ‘Top Negative Words’, ‘Top Positive Words’, and ‘Characteristics’ lists. In addition, there is a search bar that lets you look up a word in the corpus and display its frequency along with the texts where it is used.

Let us search for the word ‘hour’ and see the results.

The search results display the frequency of the word in the negative and the positive texts, along with some of the tweets where the word is used.
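The behavior of the search bar can be approximated in a few lines: count whole-word matches per category and collect the texts that contain the term. This is only an illustrative sketch on invented tweets, not how Scattertext implements its search; the function name search_term is hypothetical.

```python
import re

# Invented sample tweets: (sentiment label, text).
tweets = [
    ("negative", "Stuck on the tarmac for an hour, then another hour at the gate"),
    ("negative", "Bag never arrived"),
    ("positive", "Rebooked within an hour, thank you"),
]

def search_term(tweets, term):
    """Return per-label match counts and the tweets that contain `term`."""
    # \b restricts matches to whole words, so "hours" would not match "hour".
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    counts, excerpts = {}, []
    for label, text in tweets:
        hits = len(pattern.findall(text))
        if hits:
            counts[label] = counts.get(label, 0) + hits
            excerpts.append((label, text))
    return counts, excerpts

counts, excerpts = search_term(tweets, "hour")
print(counts)
for label, text in excerpts:
    print(label, "|", text)
```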

The visualization is highly interactive: when you hover over any word, a tooltip displays its frequency along with its score, and no word label overlaps another.

Conclusion:

In this article, we saw how insightful and informative visualizations can be created using Scattertext. We also saw how to use the visualization's search bar to find a word's frequency and the texts where it is used. Scattertext is easy to use and fast, and it can be applied to different types of text data visualization.


Himanshu Sharma

An aspiring Data Scientist currently pursuing an MBA in Applied Data Science, with an interest in the financial markets. I have experience in Data Analytics, Data Visualization, Machine Learning, creating dashboards, and writing articles related to Data Science.