Hands-On Guide To Using AutoNLP For Automating Sentiment Analysis

In this article, we will learn what AutoNLP is and implement a sentiment analysis model on a Twitter dataset.

Automated machine learning, or AutoML, automates the end-to-end machine learning process for real-world problems, making it easier and more efficient. Over the years, researchers have built tools such as AutoKeras and Auto-sklearn, as well as no-code platforms such as WEKA and H2O.

One such area of automation is natural language processing. With AutoNLP, it is now easy to build a model such as a sentiment classifier with only a few lines of code and still get good results. Automation like this opens machine learning to a wider audience instead of restricting it to developers and engineers.



What is AutoNLP?

Building on the concepts of AutoML, AutoNLP automates text preprocessing steps such as stemming, tokenization and lemmatization, and picks the best model for the given dataset. AutoNLP was developed under AutoVIML, which stands for Automatic Variant Interpretable ML. Some of the features of AutoNLP are:

  1. Data cleansing: The entire dataset can be passed to the model without manual steps such as vectorization. It also fills in missing values and cleans the data automatically.
  2. Uses the Featuretools library for feature extraction: Featuretools is a library that simplifies feature engineering and extraction.
  3. Model performance and graphs are produced automatically: Setting the verbose parameter is enough to display the model graphs and performance.
  4. Feature reduction is automatic: With huge datasets, it becomes hard to select the best features and perform EDA. AutoNLP takes care of this.

Implementation of AutoNLP

Let us now implement a sentiment analysis model for a Twitter dataset using AutoNLP. Without AutoNLP, the data first had to be stemmed, lemmatized and vectorized, and was often explored with a word cloud, before training. With AutoNLP, all we need is five simple steps.
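To get a feel for the manual work being replaced, here is a minimal sketch of cleaning, stemming and vectorizing two made-up tweets with only the standard library. The helper names (tokenize, crude_stem, vectorize) and the suffix list are assumptions made for illustration; this is not what AutoNLP runs internally, and a real project would use a proper stemmer or lemmatizer.

```python
import re
from collections import Counter

# Hypothetical suffix list for a deliberately crude stemmer (illustration only).
_SUFFIXES = ("ing", "ed", "ly", "s")


def tokenize(text):
    """Lowercase the text, drop URLs and @mentions, and keep alphabetic tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"[a-z]+", text)


def crude_stem(token):
    """Strip one common suffix, keeping at least three characters of the word."""
    for suffix in _SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def vectorize(texts):
    """Return (vocabulary, count vectors) for a simple bag-of-words model."""
    docs = [[crude_stem(tok) for tok in tokenize(text)] for text in texts]
    vocab = sorted({tok for doc in docs for tok in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[tok] for tok in vocab])
    return vocab, vectors


# Made-up tweets, not rows from the dataset used in this article.
tweets = ["@anna Loving the new phone!", "Hated waiting http://t.co/x so long"]
vocab, vectors = vectorize(tweets)
print(vocab)
print(vectors)
```

Each tweet becomes a row of word counts over a shared vocabulary, which is the kind of numeric input a classifier needs.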

Installing AutoNLP

To install it, we can use a simple pip command. Since AutoNLP is part of autoviml, that is the package we need to install.

!pip install autoviml

After installing it, we can download the dataset for the project. I will be using a Twitter dataset since we are doing sentiment analysis. You can download the dataset here. Once done, let us mount the drive and load our dataset.

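from google.colab import drive
import pandas as pd

# Mount Google Drive so the CSV stored there is readable from Colab
drive.mount('/content/gdrive')

# Load the Twitter training data into a DataFrame
data = pd.read_csv("/content/gdrive/MyDrive/twitter_train.csv")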

[Image: preview of the Twitter dataset]

Model

Now we can use AutoNLP to build the model.

import numpy as np
from sklearn.model_selection import train_test_split
from autoviml.Auto_NLP import Auto_NLP

# Hold out 20% of the rows as a test set
train, test = train_test_split(data, test_size=0.2)
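The split above is random, so the rows that end up in train and test change from run to run unless a seed is fixed. As a rough illustration of what a 20% hold-out with a fixed seed looks like, here is a standard-library sketch. The function split_rows and its seed argument are hypothetical names for this example, not part of scikit-learn or autoviml.

```python
import math
import random


def split_rows(rows, test_size=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed and hold out a test_size fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(len(shuffled) * test_size)
    # The first n_test shuffled rows form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-in for ten dataset rows
train_rows, test_rows = split_rows(rows)
print(len(train_rows), len(test_rows))
```

Calling split_rows again with the same seed returns the same partition, which makes later comparisons between models repeatable.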

Since this is a classification problem, we state that through the modeltype argument of Auto_NLP. If top_num_features is not set, it is assumed to be a value above 300, and training becomes slower than with 100.

input_feature, target = "SentimentText", "Sentiment"

train_x, test_x, final, predicted = Auto_NLP(
    input_feature, train, test, target,
    score_type="balanced_accuracy",
    top_num_features=100,
    modeltype="Classification",
    verbose=2,
    build_model=True,
)
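The call above scores models with balanced accuracy, which is the average of the recall obtained on each class. The sketch below computes it by hand on made-up labels to show why it can differ from plain accuracy when one sentiment dominates. The function name and the toy labels are assumptions for illustration only.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Average the recall obtained on each class that appears in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy labels (made up): 8 negative tweets (0) and 2 positive tweets (1).
y_true = [0] * 8 + [1] * 2
# Every negative is predicted correctly, but only one of the two positives.
y_pred = [0] * 8 + [0, 1]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                      # 0.9
print(balanced_accuracy(y_true, y_pred))   # 0.75
```

Plain accuracy looks high because the majority class is easy, while balanced accuracy exposes the weaker recall on the minority class.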

Now you will see a series of graphs, and within a few minutes you will see the trained output.

These graphs visualize the data in detail during the training process. They show the word count, the character count and the density as well. As the training progresses, these graphs change, and here is the final output. All punctuation and tags are removed automatically, and their density is also shown in the graphs.

[Images: training visualizations produced by AutoNLP, including word count, character count and density plots]

The model has selected Multinomial Naive Bayes (multinomial NB) as the classifier and has performed the training. If top_num_features had not been given, a random forest algorithm would have been used.
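For readers new to multinomial NB, the sketch below shows the core idea on a few made-up tokenized tweets: count words per class, smooth the counts, and pick the class with the highest log probability. It is a simplified standard-library illustration with hypothetical function names, not the implementation that AutoNLP selects.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit class priors and Laplace-smoothed word likelihoods from tokenized docs."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {tok for counts in word_counts.values() for tok in counts}
    model = {}
    for label, n_docs in class_counts.items():
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(labels)),
            "log_likelihood": {
                tok: math.log((word_counts[label][tok] + alpha) / total)
                for tok in vocab
            },
        }
    return model


def predict_label(model, tokens):
    """Return the label with the highest log posterior; unseen words are skipped."""
    def score(label):
        params = model[label]
        return params["log_prior"] + sum(
            params["log_likelihood"][tok]
            for tok in tokens
            if tok in params["log_likelihood"]
        )
    return max(model, key=score)


# Made-up training tweets: label 1 is positive, label 0 is negative.
docs = [["love", "this", "phone"], ["great", "love", "it"],
        ["hate", "this", "phone"], ["awful", "hate", "it"]]
labels = [1, 1, 0, 0]
model = train_multinomial_nb(docs, labels)
print(predict_label(model, ["love", "it"]))    # 1
print(predict_label(model, ["hate", "this"]))  # 0
```

The smoothing term alpha keeps a word that never appeared in one class from driving that class's probability to zero.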

The final output is as shown below.

To understand the pipeline, just print the final value and you will see the following.

[Image: the fitted pipeline printed from final]

Finally, you can make predictions as follows.

final.predict(test_x[input_feature])

Conclusion

We saw how AutoNLP made it very easy to build a model for sentiment analysis. It also pre-processed the data automatically and produced visualizations for different aspects of the dataset. Thus, automation makes it easy to build even complex models.

Bhoomika Madhukar
I am an aspiring data scientist with a passion for teaching. I am a computer science graduate from Dayananda Sagar Institute. I have experience in building models in deep learning and reinforcement learning. My goal is to use AI in the field of education to make learning meaningful for everyone.
