
Complete Guide On NLP Profiler: Python Tool For Profiling of Textual Dataset

Himanshu Sharma

Natural Language Processing (NLP) is a subfield of Artificial Intelligence that works on making human language understandable to machines. NLP offers different functionalities that operate on textual data to extract useful insights and information. In practice, NLP is used for speech recognition, building voice search engines, etc., and it supports a large variety of operations on text data like tokenizing, lemmatizing, stemming, POS tagging, etc.

NLP Profiler is a simple NLP library that profiles textual datasets with one or more text columns. Basically, NLP Profiler provides us with high-level insights about the data along with its statistical properties, much the same way describe() works for a pandas dataframe.

It takes a dataset with at least one text column as input and returns a dataframe containing useful insights about the data, like its sentiment, subjectivity, etc. NLP Profiler is in its early stages and is continuously improving.
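To make the idea concrete, here is a minimal sketch of the workflow this article walks through, using a tiny in-memory dataframe instead of a file; the call signature follows the usage shown later in the article.

import pandas as pd
from nlp_profiler.core import apply_text_profiling

# Minimal sketch: profile a tiny in-memory dataset just to show the shape of
# the API -- one text column in, many descriptive columns out.
sample = pd.DataFrame({'text': ["I love this!", "This is terrible..."]})
profile = apply_text_profiling(sample, 'text')
print(profile.columns.tolist())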



In this article, we will explore the different functionalities available in NLP Profiler and implement them in order to gain useful insights from the data.

Implementation:

NLP Profiler can be installed directly from the Git repository where it is hosted. Before installing it, you need to download and install Git for your operating system. After Git is installed, we can install NLP Profiler by running the command given below in the command prompt.

pip install git+https://github.com/neomatrix369/nlp_profiler.git@master

  1. Importing required libraries

We will load the data using pandas, so we will import pandas, and for creating the data profile we will import the text-profiling function from NLP Profiler.

import pandas as pd

from nlp_profiler.core import apply_text_profiling

  2. Loading the dataset

We need a textual dataset in order to explore NLP Profiler; here I have used a dataset containing tweets, which can be downloaded from Kaggle. This dataset contains different attributes like tweets, usernames, etc., but we are only concerned with the text, i.e. the tweets, so we will load this dataset and slice it to make a new dataframe that contains only the text column.

This dataset is pretty large, so I have taken only the first 100 rows; otherwise, profiling would take a lot of time.

df = pd.read_csv('Tweets.csv')

# Keep only the text column and, as mentioned above, only the first 100 rows
# so that profiling does not take too long.
text_nlp = pd.DataFrame(df, columns=['text']).head(100)

text_nlp.head()

Textual Dataset
  3. Applying Text Profiling

Next, we will pass this data to the text-profiling function, specifying the dataframe and the name of the column that contains the text, so that it can create a new dataframe containing the text profile with different attributes.

profile_data = apply_text_profiling(text_nlp, 'text')

profile_data.head()

Data Profiled using NLP Profiler

Here you can see how NLP Profiler has created a new dataframe that contains 22 attributes about the text, like polarity, sentiment, subjectivity, etc. This is a great way of analyzing text data and gaining useful insights.
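If you only need part of the profile, apply_text_profiling also accepts an optional params dictionary for switching groups of checks on or off; the key used below ('high_level') is taken from the project README at the time of writing, so treat it as an assumption and verify it against your installed version.

# Sketch: skip the high-level (sentiment/subjectivity/spelling) checks and
# keep only the granular count-based ones. The 'high_level' key follows the
# project README; verify against your installed version of nlp_profiler.
granular_only = apply_text_profiling(text_nlp, 'text', params={'high_level': False})
granular_only.head()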

We can also use the describe function to analyze the statistical properties of these attributes.

profile_data.describe()

Statistical Properties

Here we can see the different statistical properties of the textual dataset for the different attributes generated by the profiler.
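Since the profile has a couple of dozen columns, it can also help to list them and to transpose the describe() output so each attribute becomes a row; this is plain pandas and does not depend on NLP Profiler itself.

# List the generated attribute columns, then show the summary statistics
# with one profiled attribute per row instead of per column.
print(profile_data.columns.tolist())
profile_data.describe().T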

Next, let us visualize some of the attributes which are created by NLP Profiler.

  4. Visualizing Attributes

Below, we plot a few of these attributes in order to get some meaningful insights and patterns.

  1. Sentiment Polarity and Sentiment Polarity Score

The sentiment polarity and the sentiment polarity score tell us how the text data is distributed across different sentiments like positive, negative, etc.

profile_data['sentiment_polarity_score'].hist()

Sentiment Histogram

profile_data['sentiment_polarity'].hist()

Sentiment Polarity
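If the histogram is hard to read, plain pandas value_counts() on the same column gives the exact number of tweets per sentiment label.

# Exact counts per sentiment label for the column plotted above, which is
# often easier to read than the histogram.
print(profile_data['sentiment_polarity'].value_counts())

# The same counts as a bar chart.
profile_data['sentiment_polarity'].value_counts().plot(kind='bar')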
  2. Sentiment Subjectivity

Subjectivity is used to analyze whether the text is subjective or objective.

profile_data['sentiment_subjectivity_summarised'].hist()

Sentiment Subjectivity
  3. Spelling Quality

Spelling quality checks the spelling of the words used in the text and grades the text accordingly.

profile_data['spelling_quality'].hist()

Spelling Quality
  4. Emoticons in Text

It counts the emojis that are used while writing the text.

profile_data['emoji_count'].hist()

Emoticons Count
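To see which tweets actually contain emojis, we can filter on the generated count column. This assumes the profiled dataframe still carries the original text column alongside the new attributes, which is what the head() output shown earlier suggests.

# Keep only the rows where at least one emoji was detected and show the
# original tweet text next to the count.
tweets_with_emojis = profile_data[profile_data['emoji_count'] > 0]
tweets_with_emojis[['text', 'emoji_count']].head()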

Similarly, we can plot and analyze different attributes which are created by NLP Profiler.
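As a convenience, we can loop over a few of the categorical attributes used above and bar-plot their value counts in a single figure; this is a small pandas/matplotlib sketch rather than anything specific to NLP Profiler.

import matplotlib.pyplot as plt

# Bar-plot the value counts of a few categorical attributes side by side.
# The column names are the ones used earlier in this article.
columns_to_plot = ['sentiment_polarity', 'sentiment_subjectivity_summarised', 'spelling_quality']
fig, axes = plt.subplots(1, len(columns_to_plot), figsize=(15, 4))
for ax, column in zip(axes, columns_to_plot):
    profile_data[column].value_counts().plot(kind='bar', ax=ax, title=column)
plt.tight_layout()
plt.show()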

Conclusion:

In this article, we started by loading a dataset containing tweets and passing it to NLP Profiler in order to get a new dataframe containing the text profile. We saw how to analyze the statistical properties of this dataset, and finally we created some visualizations that give us useful insights from an NLP perspective. NLP Profiler is easy to use, as it creates the text profile in just one line of code with insightful attributes.
