Natural Language Processing (NLP) is a subfield of Artificial Intelligence that focuses on making human language understandable to machines. NLP offers different functionalities that work on textual data to extract useful insights and information. Practically, NLP can be used for speech recognition, building voice search engines, and more. NLP can also be used to perform a large variety of operations on text data, such as tokenizing, lemmatizing, stemming, POS tagging, etc.
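For instance, a rough sketch of a few of these operations using NLTK (a separate library, shown here only for illustration; it assumes the punkt, averaged_perceptron_tagger, and wordnet resources have been downloaded) might look like this:
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

text = "The cats are running faster than the dogs"
tokens = nltk.word_tokenize(text)                            # tokenizing
tags = nltk.pos_tag(tokens)                                  # POS tagging
stems = [PorterStemmer().stem(t) for t in tokens]            # stemming
lemmas = [WordNetLemmatizer().lemmatize(t) for t in tokens]  # lemmatizing
print(tokens, tags, stems, lemmas, sep='\n')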
NLP Profiler is a simple NLP library that profiles textual datasets with one or more text columns. Basically, NLP Profiler provides us with high-level insights about the data along with its statistical properties. It works much the same way as the describe() function works for the statistical properties of a pandas dataframe.
It takes textual data with at least one text column as input and returns a dataframe that contains useful insights about the data, like sentiment analysis, the subjectivity of the data, etc. NLP Profiler is in its early stages and is continuously improving.
In this article, we will explore the different functionalities available in NLP Profiler and implement them in order to gain useful insights from the data.
Implementation:
NLP Profiler can be installed from the Git repository where it is hosted. Before installing it, you need to download and install Git for your operating system. After Git is installed, we can install NLP Profiler by running the command given below in the command prompt.
pip install git+https://github.com/neomatrix369/nlp_profiler.git@master
- Importing required libraries
We will load the data using pandas, so we will import pandas; for creating the data profile, we will import NLP Profiler.
import pandas as pd
from nlp_profiler.core import apply_text_profiling
- Loading the dataset
We need a textual dataset in order to explore NLP Profiler; here I have used a dataset containing tweets, which can be downloaded from Kaggle. This dataset contains different attributes like tweets, usernames, etc., but we are only concerned with the text, i.e. the tweets, so we will load this dataset and slice it to make a new dataframe that contains only the text column.
This dataset is pretty large, so I have taken only the first 100 rows; otherwise, the computation would take a lot of time.
df = pd.read_csv('Tweets.csv')
# keep only the text column and only the first 100 rows to keep profiling fast
text_nlp = pd.DataFrame(df, columns=['text']).head(100)
text_nlp.head()
- Applying Text Profiling
Next, we will pass this data to the text profiling function, where we need to specify the dataframe and the name of the column that contains the text, so that it can create a new dataframe containing the text profile with different attributes.
profile_data = apply_text_profiling(text_nlp, 'text')
profile_data.head()
Here you can see how NLP Profiler has created a new dataframe that contains 22 attributes about the text, like polarity, sentiment, subjectivity, etc. This is a great way of analyzing text data and gaining useful insights.
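To see exactly which attributes were generated by the version of NLP Profiler you have installed (the set may differ slightly across releases), you can simply list the columns of the resulting dataframe:
print(profile_data.columns.tolist())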
We can also use the describe function to analyze the statistical properties of these attributes.
profile_data.describe()
Here we can see the different statistical properties of the textual dataset according to the different attributes/properties of the data.
Next, let us visualize some of these attributes.
- Visualizing Attributes
Let us visualize some of the attributes created by NLP Profiler in order to get some meaningful insights and patterns.
- Sentiment Polarity and Sentiment Polarity Score
Sentiment polarity and the polarity score tell us how the text data is distributed across different sentiments, like positive, negative, etc.
profile_data['sentiment_polarity_score'].hist()
profile_data['sentiment_polarity'].hist()
- Sentiment Subjectivity
Subjectivity is used to analyze whether the text is subjective or objective.
profile_data['sentiment_subjectivity_summarised'].hist()
- Spelling Quality
Spelling quality checks the spelling of the words used in the text and categorizes them accordingly.
profile_data['spelling_quality'].hist()
- Emoticons in Text
It counts the emojis that are used while writing the text.
profile_data['emoji_count'].hist()
Similarly, we can plot and analyze different attributes which are created by NLP Profiler.
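As a rough sketch, we could also put several of these histograms into a single figure with matplotlib (the columns chosen here are just the ones plotted above):
import matplotlib.pyplot as plt

columns_to_plot = ['sentiment_polarity', 'sentiment_subjectivity_summarised',
                   'spelling_quality', 'emoji_count']
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
for ax, col in zip(axes.flatten(), columns_to_plot):
    profile_data[col].hist(ax=ax)   # histogram of one profiled attribute
    ax.set_title(col)
plt.tight_layout()
plt.show()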
Conclusion:
In this article, we started by loading a dataset containing tweets and passing it to NLP Profiler in order to get a new dataframe containing the text profile. We saw how we can analyze the statistical properties of this dataset, and finally, we created some visualizations that give us useful insights from an NLP perspective. NLP Profiler is easy to use, as it creates the text profile in just one line of code with insightful attributes.