Complete Guide On NLP Profiler: Python Tool For Profiling of Textual Dataset

NLP Profiler is a simple NLP library that profiles textual datasets with one or more text columns.

Natural Language Processing (NLP) is a subfield of Artificial Intelligence that works on making human language understandable to machines. NLP offers different functionalities that work on textual data to extract useful insights and information. In practice, NLP can be used for speech recognition, creating voice search engines, etc. It can also perform a large variety of operations on text data such as tokenizing, lemmatizing, stemming, POS tagging, etc.

NLP Profiler is a simple NLP library that profiles textual datasets with one or more text columns. It provides high-level insights about the data along with its statistical properties, much as the describe() method does for the statistical properties of a pandas dataframe.

It takes the textual data as input with at least one column with text data and returns a dataframe which contains useful insights about the data like sentiment analysis, the subjectivity of data, etc. NLP profiler is in its early stage and is continuously improving. 
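
To make the idea concrete, here is a minimal sketch of what per-row text profiling can look like, written with plain pandas. It is not NLP Profiler's implementation: the helper name `simple_text_profile` and the three features it computes (`n_chars`, `n_words`, `n_exclamations`) are assumptions chosen only for illustration.

```python
import pandas as pd


def simple_text_profile(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a new dataframe with a few basic per-row text features."""
    profile = df[[column]].copy()
    # Treat missing values as empty strings so the counts below are defined
    text = profile[column].fillna("").astype(str)
    profile["n_chars"] = text.str.len()
    profile["n_words"] = text.str.split().str.len()
    profile["n_exclamations"] = text.str.count("!")
    return profile


sample = pd.DataFrame({"text": ["Great flight!", "Lost my bag again", None]})
profiled = simple_text_profile(sample, "text")
print(profiled)

# describe() then summarises the numeric features that were added
print(profiled.describe())
```

The output has the same shape of result the article describes: the original text column plus one new column per computed attribute.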

In this article, we will explore the different functionalities of NLP Profiler and implement them to gain useful insights from the data.

Implementation:

NLP Profiler can be installed from the git repository where it is hosted. Before installing it, you need to download and install the version of git for your operating system. Once git is installed, we can install NLP Profiler by running the command given below in the command prompt.

pip install git+https://github.com/neomatrix369/nlp_profiler.git@master
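
After running the install command, it can be useful to confirm that Python can actually find the package before moving on. The following is a small sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of NLP Profiler.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("nlp_profiler"):
    print("nlp_profiler is available")
else:
    print("nlp_profiler not found; run the pip install command first")
```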

  1. Importing required libraries

We will load the data using pandas, so we import pandas, and we import the profiling function from NLP Profiler to create the data profile.

import pandas as pd

from nlp_profiler.core import apply_text_profiling

  2. Loading the dataset

We need a textual dataset in order to explore NLP Profiler. Here I have used a dataset containing tweets, which can be downloaded from Kaggle. This dataset contains different attributes like tweets, usernames, etc., but we are only concerned with the text, i.e. the tweets. So we will load this dataset and slice it to make a new dataframe that contains only the text column.

This dataset is pretty large, so I have taken only the first 100 rows; otherwise the computation would take a lot of time.

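# Read only the first 100 rows to keep the computation time manageable
df = pd.read_csv('Tweets.csv', nrows=100)

# Keep only the column that holds the tweet text
text_nlp = pd.DataFrame(df, columns=['text'])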

text_nlp.head()

Textual Dataset
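
The loading step can be tried without the Kaggle file. The sketch below uses a tiny in-memory CSV as a stand-in for Tweets.csv; its column names and rows are made up for illustration. It shows one way to limit how many rows are read (the `nrows` argument of `pd.read_csv`) and to keep only the text column.

```python
import io

import pandas as pd

# Hypothetical stand-in for Tweets.csv so the snippet runs on its own
csv_text = io.StringIO(
    "tweet_id,name,text\n"
    "1,alice,Loved the service today\n"
    "2,bob,Flight delayed again\n"
    "3,carol,Thanks for the quick reply\n"
)

# nrows limits how many data rows are parsed
df_small = pd.read_csv(csv_text, nrows=2)

# Double brackets return a DataFrame rather than a Series
text_only = df_small[["text"]]
print(text_only)
```

Selecting with double brackets keeps the result a dataframe, which is what the profiling function in the next step expects as input.
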
  3. Applying Text Profiling

Next, we pass the dataframe and the name of its text column to the text profiling function, which returns a new dataframe containing the text profile with different attributes.

profile_data = apply_text_profiling(text_nlp, 'text')

profile_data.head()

Data Profiled using NLP Profiler

Here you can see how NLP Profiler has created a new dataframe that contains 22 attributes about the text like polarity, sentiment, subjectivity, etc. This is a great way of analyzing different text data and gaining useful insights.

We can also use the describe function to analyze the statistical properties of these attributes.

profile_data.describe()

Statistical Properties
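
One detail worth knowing is that describe() summarises only numeric columns by default, while a text profile mixes numeric scores with text labels. The sketch below uses a small hand-made dataframe whose column names follow those used in this article; the values are illustrative, not real profiler output.

```python
import pandas as pd

# Illustrative values only; column names follow those used in this article
profile_like = pd.DataFrame(
    {
        "sentiment_polarity_score": [0.5, -0.25, 0.0, 0.75],
        "sentiment_polarity": ["Positive", "Negative", "Neutral", "Positive"],
        "emoji_count": [0, 1, 0, 2],
    }
)

# Default behaviour: count, mean, std, min, quartiles, max for numeric columns
numeric_summary = profile_like.describe()

# Excluding numeric columns gives count, unique, top, freq for the labels
label_summary = profile_like.describe(exclude="number")

print(numeric_summary)
print(label_summary)
```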

Here we can see the statistical properties of the textual dataset for the different attributes of the data.

Next, let us visualize some of the attributes which are created by NLP Profiler.

  4. Visualizing Attributes

Let us visualize some of the attributes created by NLP Profiler in order to get some meaningful insights and patterns.

    a. Sentiment Polarity and Sentiment Polarity Score

Sentiment polarity and polarity score will tell us how text data is divided for different sentiments like positive, negative, etc.

profile_data['sentiment_polarity_score'].hist()

Sentiment Histogram

profile_data['sentiment_polarity'].hist()

Sentiment Polarity
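
Because the polarity column holds text labels rather than numbers, counting the labels directly is a simple alternative to a histogram. This is a minimal sketch on made-up labels; with real data the same call would be applied to the column of the profiled dataframe.

```python
import pandas as pd

# Illustrative labels; the real values come from the profiled dataframe
labels = pd.Series(
    ["Positive", "Negative", "Positive", "Neutral", "Positive"],
    name="sentiment_polarity",
)

# value_counts() returns one row per label, sorted by frequency
label_counts = labels.value_counts()
print(label_counts)

# With matplotlib installed, label_counts.plot(kind="bar") draws these counts
```
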
    b. Sentiment Subjectivity

Subjectivity is used to analyze whether the text is subjective or objective.

profile_data['sentiment_subjectivity_summarised'].hist()

Sentiment Subjectivity
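
The column name suggests that a numeric subjectivity score is summarised into a label. As a rough sketch of that idea, assuming a score between 0 (objective) and 1 (subjective), a score can be bucketed as shown below. The function name, the label names, and the 0.4 and 0.6 cut-offs are hypothetical; NLP Profiler's own labels and thresholds may differ.

```python
def summarise_subjectivity(score: float) -> str:
    """Map a subjectivity score in [0, 1] to a coarse label.

    The 0.4 and 0.6 cut-offs are hypothetical, chosen only for illustration.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < 0.4:
        return "objective"
    if score > 0.6:
        return "subjective"
    return "mixed"


for s in (0.1, 0.5, 0.9):
    print(s, summarise_subjectivity(s))
```
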
    c. Spelling Quality

Spelling quality checks the spelling of the words used in the text and categorizes the text accordingly.

profile_data['spelling_quality'].hist()

Spelling Quality
    d. Emoticons in Text

This attribute counts the emojis used in the text.

profile_data['emoji_count'].hist()

Emoticons Count
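
To show what an emoji count means at the level of a single string, here is a simplified sketch that counts characters in a few common emoji code point ranges. It is not how NLP Profiler computes `emoji_count`; the function name and the chosen ranges are assumptions, and emojis built from several code points (flags, skin tones, joined sequences) are not handled correctly.

```python
def count_emojis(text: str) -> int:
    """Count characters that fall in a few common emoji code point ranges."""
    emoji_ranges = (
        (0x1F300, 0x1FAFF),  # pictographs, emoticons, transport, extended symbols
        (0x2600, 0x27BF),    # miscellaneous symbols and dingbats
    )
    return sum(
        1
        for ch in text
        if any(start <= ord(ch) <= end for start, end in emoji_ranges)
    )


print(count_emojis("Great flight \U0001F600\U0001F600"))
print(count_emojis("No emojis here"))
```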

Similarly, we can plot and analyze different attributes which are created by NLP Profiler.

Conclusion:

In this article, we started by loading a dataset containing tweets and passing it to NLP Profiler to get a new dataframe with the text profile. We saw how to analyze the statistical properties of this dataset, and finally we created some visualizations that give us useful insights from an NLP perspective. NLP Profiler is easy to use, as it creates the text profile with insightful attributes in just one line of code.

Himanshu Sharma
An aspiring Data Scientist currently Pursuing MBA in Applied Data Science, with an Interest in the financial markets. I have experience in Data Analytics, Data Visualization, Machine Learning, Creating Dashboards and Writing articles related to Data Science.
