Hands-On Guide To Using AutoNLP For Automating Sentiment Analysis

In this article, we will learn about what AutoNLP is and implement a sentiment analysis model with twitter dataset.

Automated Machine learning or autoML is used for automating the complete process of machine learning for real-world problems to make the process easier and more efficient. Over the years researchers have developed ways of automating processes by developing tools like AutoKeras, AutoSklearn and even no-coding platforms like WEKA and H2o

One such area of automation is in the field of natural language processing. With the development of AutoNLP, it is now super easy to build a model like sentiment analysis with very few basic lines of code and get a good output. With automation like these, it allows everyone to be a part of the machine learning community and does not restrict machine learning to only developers and engineers. 

THE BELAMY

Sign up for your weekly dose of what's up in emerging technology.

In this article, we will learn about what AutoNLP is and implement a sentiment analysis model with twitter dataset. 

What is AutoNLP?

Using the concepts of AutoML, AutoNLP helps in automating the process of exploratory data analysis like stemming, tokenization, lemmatization etc. It also helps in text processing and picking the best model for the given dataset. AutoNLP was developed under AutoVIML which stands for Automatic Variant Interpretable ML. Some of the features of AutoNLP are:

  1. Data cleansing: The entire dataset can be sent to the model without performing any process like vectorization. It even fills the missing data and cleans the data automatically. 
  2. Uses feature tools library for feature extraction: Feature Tools is another great library that helps in feature engineering and extraction in any easy way. 
  3. Model performance and graphs are produced automatically: Just by setting the verbose, the model graph and performance can be shown.
  4. Feature reduction is automatic: With huge datasets, it becomes tough to select the best features and perform EDA. But this is taken care of by AutoNLP.

Implementation of AutoNLP

Let us now implement a sentiment analysis model for a twitter dataset using autoNLP. Without autoNLP, the data had to be first vectorized, stemmed and lemmatized and finally converted to a word cloud before training. But with autoNLP, all we have to do is five simple steps. 

Installing the AutoNLP

To install this we can use a simple pip command. Since AutoNLP belongs to autoviml we need to install that. 

!pip install autoviml

After installing this, we can go ahead and download the dataset for the project. I will be using the twitter dataset since we are doing sentiment analysis. You can download the dataset here. Once done, let us mount the drive and see our dataset. 

from google.colab import drive

import pandas as pd

drive.mount('/content/gdrive')

data=pd.read_csv("/content/gdrive/MyDrive/twitter_train.csv") 

twitter

Model

Now, we can use the AutoNLP and build the model. import numpy as np

from sklearn.model_selection import train_test_split

from autoviml.Auto_NLP import Auto_NLP

train, test = train_test_split(data, test_size=0.2)

Since the model is a classification type we will mention it is mentioned in the AutoNLP method. The top_num_feature, if not set will be assumed to be a value above 300 and the training becomes slower when compared to 100. 

input_feature, target = "SentimentText", "Sentiment"

train_x, test_x, final, predicted= Auto_NLP(input_feature, train, test,target,score_type="balanced_accuracy",top_num_features=100,modeltype="Classification", verbose=2, build_model=True)

Now you will see a series of graphs and within few minutes you will see the trained output. 

These graphs show in detail about the visualizations during the training process. It shows the word count, the density and character count as well. As the training progresses these graphs change and here is the final output. All the punctuations and tags are automatically removed and the density of these are also shown in the graph. 

autonlp
autonlp
autonlp

The model has selected multinomial NB as a classifier and has performed the training. If the top_num_features were not given, a random forest algorithm would be used. 

The final output is as shown below.

To understand the pipelining process just print the final value and you will see the following.

sentiment analysis

Finally, you can make predictions as follows.

final.predict(test_x[input_feature])

Conclusion

We saw how using AutoNLP made the model building very easy for performing sentiment analysis. Not only this but it also automatically pre-processed the data and gave visualizations for different aspects of the dataset. Thus, automation makes it easy to build even complex models.

More Great AIM Stories

Bhoomika Madhukar
I am an aspiring data scientist with a passion for teaching. I am a computer science graduate from Dayananda Sagar Institute. I have experience in building models in deep learning and reinforcement learning. My goal is to use AI in the field of education to make learning meaningful for everyone.

Our Upcoming Events

Masterclass, Virtual
How to achieve real-time AI inference on your CPU
7th Jul

Masterclass, Virtual
How to power applications for the data-driven economy
20th Jul

Conference, in-person (Bangalore)
Cypher 2022
21-23rd Sep

Conference, Virtual
Deep Learning DevCon 2022
29th Oct

3 Ways to Join our Community

Discord Server

Stay Connected with a larger ecosystem of data science and ML Professionals

Telegram Channel

Discover special offers, top stories, upcoming events, and more.

Subscribe to our newsletter

Get the latest updates from AIM
MOST POPULAR

What can SEBI learn from casinos?

It is said that casino AI technology comes with superior risk management systems compared to traditional data analytics that regulators are currently using.

Will Tesla Make (it) in India?

Tesla has struggled with optimising their production because Musk has been intent on manufacturing all the car’s parts independent of other suppliers since 2017.