Hands-On Guide To Using AutoNLP For Automating Sentiment Analysis

Bhoomika Madhukar

Automated machine learning, or AutoML, automates the complete machine learning process for real-world problems, making it easier and more efficient. Over the years, researchers have developed tools such as AutoKeras and AutoSklearn, and even no-code platforms such as WEKA and H2O.

One such area of automation is natural language processing. With AutoNLP, a model such as a sentiment classifier can be built in a few lines of code and still give good results. Automation like this opens machine learning to a wider audience instead of restricting it to developers and engineers.

In this article, we will learn what AutoNLP is and implement a sentiment analysis model on a Twitter dataset.



What is AutoNLP?

Using the concepts of AutoML, AutoNLP automates the text preprocessing that normally precedes modeling, such as stemming, tokenization and lemmatization. It also handles the remaining text processing and picks the best model for the given dataset. AutoNLP was developed as part of AutoViML, which stands for Automatic Variant Interpretable ML. Some of the features of AutoNLP are:

  1. Data cleansing: The entire dataset can be passed in without manual steps such as vectorization. Missing values are filled and the data is cleaned automatically.
  2. Uses the Featuretools library for feature extraction: Featuretools is a library that helps with feature engineering and extraction in an easy way.
  3. Model performance and graphs are produced automatically: Setting the verbose parameter is enough to display the model graphs and performance.
  4. Feature reduction is automatic: With huge datasets, it is hard to select the best features and perform EDA by hand. AutoNLP takes care of this.

Implementation of AutoNLP

Let us now implement a sentiment analysis model for a Twitter dataset using AutoNLP. Without AutoNLP, the text would first have to be cleaned, stemmed or lemmatized, and vectorized before training, often with extra exploration such as a word cloud along the way. With AutoNLP, all we need are five simple steps.
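To make the manual route concrete, here is a minimal sketch of the kind of cleaning and counting that would otherwise be written by hand. It uses only the Python standard library; the helper names (clean_tweet, tokenize, bag_of_words) and the sample tweet are made up for illustration and are not part of AutoNLP. Stemming and lemmatization are left out, since they normally need an extra library.

```python
import re
from collections import Counter


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions, HTML tags and punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # URLs
    text = re.sub(r"@\w+", " ", text)          # user mentions
    text = re.sub(r"<[^>]+>", " ", text)       # HTML tags
    text = re.sub(r"[^a-z\s]", " ", text)      # punctuation, digits, symbols
    return text


def tokenize(text):
    """Split a cleaned tweet into whitespace-separated tokens."""
    return clean_tweet(text).split()


def bag_of_words(tokens):
    """Count how often each token occurs (a very simple vectorization)."""
    return Counter(tokens)


# Hypothetical sample tweet, used only to show the steps.
sample = "@user I LOVED the new phone!!! <br> Details: http://example.com"
tokens = tokenize(sample)
counts = bag_of_words(tokens)
print(tokens)
print(counts)
```

Every tweet in the dataset would have to go through steps like these before a classifier could be trained, which is the work AutoNLP takes over.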

Installing AutoNLP

To install it, we can use a simple pip command. Since AutoNLP is part of autoviml, that is the package we need to install.

!pip install autoviml

After installing it, we can download the dataset for the project. I will be using a Twitter dataset since we are doing sentiment analysis. You can download the dataset here. Once done, let us mount Google Drive in Colab and load the dataset.

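from google.colab import drive
import pandas as pd

# Mount Google Drive so the CSV file is readable from Colab
drive.mount('/content/gdrive')

# Load the Twitter training data into a DataFrame
data = pd.read_csv("/content/gdrive/MyDrive/twitter_train.csv")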

Model

Now we can use AutoNLP to build the model.

import numpy as np
from sklearn.model_selection import train_test_split
from autoviml.Auto_NLP import Auto_NLP

train, test = train_test_split(data, test_size=0.2)
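The split above is random and changes on every run. As an optional variation, the sketch below passes stratify and random_state to train_test_split so that both parts keep the same class ratio and the split is repeatable. The toy DataFrame is a stand-in invented for this example (the column names match the ones used in this article), and the seed value 42 is arbitrary.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the Twitter data; the real frame comes from pd.read_csv.
toy = pd.DataFrame({
    "SentimentText": [f"tweet number {i}" for i in range(20)],
    "Sentiment": [0] * 10 + [1] * 10,
})

train_s, test_s = train_test_split(
    toy,
    test_size=0.2,
    stratify=toy["Sentiment"],  # keep the class ratio the same in both parts
    random_state=42,            # fixed seed so the split is repeatable
)

print(train_s["Sentiment"].value_counts().to_dict())
print(test_s["Sentiment"].value_counts().to_dict())
```

Stratifying matters most when one sentiment class is much rarer than the other, because a purely random split can leave very few examples of it in the test set.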

Since this is a classification problem, we set modeltype to "Classification" in the Auto_NLP call. If top_num_features is not set, a value above 300 is assumed, and training is slower than with 100.

input_feature, target = "SentimentText", "Sentiment"

train_x, test_x, final, predicted = Auto_NLP(
    input_feature, train, test, target,
    score_type="balanced_accuracy",
    top_num_features=100,
    modeltype="Classification",
    verbose=2,
    build_model=True,
)

You will now see a series of graphs, and within a few minutes the training output will appear.

These graphs visualize the data during the training process, including the word count, word density and character count. As the training progresses the graphs change, and here is the final output. All punctuation and tags are removed automatically, and their density is also shown in the graphs.

AutoNLP selected multinomial naive Bayes (multinomial NB) as the classifier and trained it. If top_num_features had not been given, a random forest algorithm would have been used.
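For readers unfamiliar with multinomial NB, the following is a small from-scratch sketch of the idea: count words per class, apply Laplace smoothing, and pick the class with the highest log-probability. It is not the code AutoNLP runs internally; the function names, the four toy documents and the smoothing constant alpha=1.0 are all assumptions made for this illustration.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit per-class word counts; alpha is the Laplace smoothing constant."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {"log_prior": {}, "log_likelihood": {}, "vocab": vocab}
    for label, n_docs in class_docs.items():
        # Prior: share of training documents that belong to this class.
        model["log_prior"][label] = math.log(n_docs / len(labels))
        # Smoothed likelihood of each vocabulary word given the class.
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        model["log_likelihood"][label] = {
            w: math.log((word_counts[label][w] + alpha) / total) for w in vocab
        }
    return model


def predict_nb(model, tokens):
    """Return the class with the highest log-posterior for the tokens."""
    scores = {}
    for label, prior in model["log_prior"].items():
        ll = model["log_likelihood"][label]
        # Words never seen during training are skipped.
        scores[label] = prior + sum(ll[w] for w in tokens if w in model["vocab"])
    return max(scores, key=scores.get)


# Toy training data: 1 = positive, 0 = negative.
docs = [
    "i love this phone".split(),
    "what a great day".split(),
    "i hate this traffic".split(),
    "what a terrible day".split(),
]
labels = [1, 1, 0, 0]

model = train_multinomial_nb(docs, labels)
print(predict_nb(model, "love this great phone".split()))
print(predict_nb(model, "hate this terrible traffic".split()))
```

Because the model only needs word counts, it trains quickly on a limited number of features.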

The final output is as shown below.

To understand the pipeline that was built, print the value of final and you will see the following.

Finally, you can make predictions as follows.

final.predict(test_x[input_feature])
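The predictions can then be compared with the held-out labels. Since the model was built with score_type="balanced_accuracy", here is a minimal sketch of what that metric measures: the average of the recall obtained on each class. The function below is a hand-written illustration with made-up labels; in practice scikit-learn's balanced_accuracy_score could be used instead.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Made-up labels: eight positive tweets, two negative ones.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)
print(balanced_accuracy(y_true, y_pred))
```

On imbalanced data the plain accuracy looks better than the balanced one, because mistakes on the rare class barely affect it. That is the reason for preferring balanced accuracy when the sentiment classes are unevenly represented.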

Conclusion

We saw how AutoNLP makes it easy to build a sentiment analysis model. It also pre-processed the data automatically and produced visualizations for different aspects of the dataset. Automation of this kind makes it easier to build even complex models.

Copyright Analytics India Magazine Pvt Ltd
