Hands-On Guide To Using AutoNLP For Automating Sentiment Analysis

In this article, we will learn what AutoNLP is and implement a sentiment analysis model on a Twitter dataset.

Automated machine learning, or AutoML, automates the end-to-end process of applying machine learning to real-world problems, making that process easier and more efficient. Over the years, researchers have developed tools such as AutoKeras and Auto-Sklearn, as well as no-code platforms like WEKA and H2O.

One such area of automation is natural language processing. With AutoNLP, it is now easy to build a model such as a sentiment classifier in a few lines of code and get good results. Automation like this opens machine learning to everyone rather than restricting it to developers and engineers.

What is AutoNLP?

Building on the concepts of AutoML, AutoNLP automates text preprocessing steps such as tokenization, stemming and lemmatization, and then picks the best model for the given dataset. AutoNLP was developed as part of AutoViML, which stands for Automatic Variant Interpretable ML. Some of the features of AutoNLP are:

  1. Data cleansing: The entire dataset can be passed to the model without a separate step such as vectorization. AutoNLP also fills in missing data and cleans the data automatically.
  2. Feature extraction with the Featuretools library: Featuretools is a library that simplifies feature engineering and extraction.
  3. Automatic model performance reports and graphs: Setting the verbose parameter displays the model graphs and performance.
  4. Automatic feature reduction: With huge datasets, it becomes tough to select the best features and perform EDA. AutoNLP takes care of this.

Implementation of AutoNLP

Let us now implement a sentiment analysis model for a Twitter dataset using AutoNLP. Without AutoNLP, the data first has to be stemmed, lemmatized and vectorized, and is often visualized as a word cloud, before training. With AutoNLP, all we need is five simple steps.
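
For comparison, the manual route can be sketched with scikit-learn. This is a minimal illustration on a handful of made-up tweets, assuming a TF-IDF vectorizer followed by a Multinomial Naive Bayes classifier. It is not the pipeline that AutoNLP builds internally, and it skips stemming and lemmatization.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up example tweets (1 = positive, 0 = negative); not from the real dataset
train_texts = [
    "i love this phone",
    "what a great day",
    "i hate this weather",
    "this is terrible service",
]
train_labels = [1, 1, 0, 0]

# Manual pipeline: vectorize the text, then fit a classifier on the vectors
manual_model = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True)),
    ("clf", MultinomialNB()),
])
manual_model.fit(train_texts, train_labels)

# Predict the sentiment of two unseen toy tweets
predictions = manual_model.predict(["great phone", "terrible weather"])
print(predictions)
```

Every choice here (the vectorizer, its settings, the classifier) has to be made and tuned by hand, which is the work AutoNLP aims to take over.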


Installing AutoNLP

AutoNLP can be installed with a single pip command. Since AutoNLP is part of the autoviml package, that is the package we install.

!pip install autoviml

After installing the package, we can download the dataset for the project. I will be using a Twitter dataset since we are doing sentiment analysis. You can download the dataset here. Once done, let us mount Google Drive and load the dataset.

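from google.colab import drive
import pandas as pd

# Mount Google Drive so the notebook can read files stored there
drive.mount('/content/gdrive')

# Load the training data into a DataFrame
data = pd.read_csv("/content/gdrive/MyDrive/twitter_train.csv")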
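
Before training, it helps to take a quick look at the loaded data. The sketch below uses a tiny made-up DataFrame as a stand-in for the real file, assuming the same two columns used later in this article (SentimentText and Sentiment); the rows themselves are invented for illustration.

```python
import pandas as pd

# Tiny stand-in for twitter_train.csv; rows are made up for illustration
data = pd.DataFrame({
    "SentimentText": ["love it", "so sad today", None, "great game", "awful traffic"],
    "Sentiment": [1, 0, 1, 1, 0],
})

# Basic checks: size, first rows, class balance and missing text
print(data.shape)
print(data.head())

class_counts = data["Sentiment"].value_counts()
missing_text = int(data["SentimentText"].isna().sum())
print(class_counts)
print("Rows with missing text:", missing_text)
```

On the real dataset, the class counts show how balanced the sentiment labels are, which is relevant when choosing a scoring metric.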

Model

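Now, we can use AutoNLP to build the model. First, import the required modules and split the data into training and test sets.

import numpy as np
from sklearn.model_selection import train_test_split
from autoviml.Auto_NLP import Auto_NLP

# Hold out 20% of the rows as a test set
train, test = train_test_split(data, test_size=0.2)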
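
The split above is random on every run. As an optional variation, train_test_split also accepts random_state for reproducibility and stratify to keep the class proportions the same in both parts. The sketch below demonstrates this on a made-up DataFrame; the seed value 42 is an arbitrary choice.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data with a perfectly balanced label column
data = pd.DataFrame({
    "SentimentText": [f"tweet {i}" for i in range(10)],
    "Sentiment": [0, 1] * 5,
})

# Reproducible 80/20 split that preserves the label proportions
train, test = train_test_split(
    data,
    test_size=0.2,
    random_state=42,
    stratify=data["Sentiment"],
)

print(len(train), len(test))
print(test["Sentiment"].value_counts())
```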

Since this is a classification problem, we state that through the modeltype argument of Auto_NLP. If top_num_features is not set, it is assumed to be a value above 300, and training becomes slower than with 100.
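
To see why a smaller feature count speeds things up, consider what limiting the vocabulary does to the size of each text vector. The sketch below is a simplified, pure-Python illustration that keeps only the most frequent terms; the helper names top_terms and vectorize are hypothetical, and this is not how Auto_NLP selects its features internally.

```python
from collections import Counter


def top_terms(texts, top_n):
    """Return the top_n most frequent whitespace-separated terms."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return [term for term, _ in counts.most_common(top_n)]


def vectorize(text, vocabulary):
    """Count how often each vocabulary term occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


# Made-up tweets for illustration
tweets = [
    "good morning good people",
    "bad traffic this morning",
    "good coffee",
]

vocab = top_terms(tweets, top_n=2)
vectors = [vectorize(tweet, vocab) for tweet in tweets]
print(vocab)
print(vectors)
```

Each tweet becomes a vector with exactly top_n entries, so a smaller limit means less data for the classifier to process, at the cost of discarding rarer terms.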

input_feature, target = "SentimentText", "Sentiment"

train_x, test_x, final, predicted = Auto_NLP(
    input_feature, train, test, target,
    score_type="balanced_accuracy", top_num_features=100,
    modeltype="Classification", verbose=2, build_model=True)

You will now see a series of graphs, and within a few minutes the training output.

These graphs visualize the data in detail during the training process, showing the word count, word density and character count. As training progresses, the graphs are updated until the final output is reached. Punctuation and tags are removed automatically, and their density is also shown in the graphs.

The model selected Multinomial Naive Bayes as the classifier and trained it. If top_num_features had not been given, a random forest algorithm would have been used.

The final output is as shown below.

To understand the pipeline, print the final object, which displays the steps of the trained pipeline.

Finally, you can make predictions as follows.

final.predict(test_x[input_feature])
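
The model was scored with balanced accuracy, so it is useful to know what that metric measures. The sketch below computes it by hand as the mean of the per-class recalls, using made-up labels and predictions; the function name balanced_accuracy is a hypothetical helper, and sklearn.metrics.balanced_accuracy_score computes the same quantity.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls."""
    recalls = []
    for label in sorted(set(y_true)):
        # Positions where the true label is this class
        indices = [i for i, y in enumerate(y_true) if y == label]
        correct = sum(1 for i in indices if y_pred[i] == label)
        recalls.append(correct / len(indices))
    return sum(recalls) / len(recalls)


# Made-up labels: four positive tweets, two negative tweets
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)
print(accuracy, score)
```

Here the recall is 3/4 for the positive class and 1/2 for the negative class, so the balanced accuracy is 0.625, lower than the plain accuracy of 4/6. The metric weights each class equally, which matters when one sentiment is much more common than the other.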

Conclusion

We saw how AutoNLP makes it easy to build a sentiment analysis model. It also pre-processed the data automatically and produced visualizations for different aspects of the dataset. Automation thus makes it easier to build even complex models.

Bhoomika Madhukar
I am an aspiring data scientist with a passion for teaching. I am a computer science graduate from Dayananda Sagar Institute. I have experience in building models in deep learning and reinforcement learning. My goal is to use AI in the field of education to make learning meaningful for everyone.
