Sentiment analysis is a predictive modelling task in which a model is trained to predict the polarity of text, such as positive, neutral, or negative. Businesses use sentiment analysis to understand how customers feel about their products. It gives them automated feedback from customers, which helps them act accordingly. Because so much of the data we collect is unstructured, analyzing large volumes of text by hand is very difficult, and sentiment analysis lets businesses label that text automatically. It can be applied to customer feedback, movie reviews, and similar text. Emotion detection is a related task in which we identify whether a person is happy, angry, sad, shocked, and so on.
Long Short-Term Memory (LSTM) was introduced by Hochreiter & Schmidhuber in 1997. LSTM is a type of recurrent neural network (RNN) that can learn long-term dependencies. LSTMs are widely used today for tasks such as speech recognition, text classification, and sentiment analysis. Through this article, we will build a deep learning model using an LSTM recurrent neural network that can classify the sentiment of tweets.
What are Recurrent Neural Networks and Long Short Term Memory?
We have already seen feed-forward networks, where inputs are multiplied by weights, a bias is added, and the result is passed from layer to layer until the last layer produces the output. The problem with these networks is that they keep no memory of earlier inputs, so they are poorly suited to sequential data. Their input and output sizes are also fixed. This makes them a poor fit for problems such as stock price prediction.
This is why Recurrent Neural Networks (RNNs) were introduced. RNNs are designed to capture sequential or time-series data. At each time step, an RNN multiplies the current input by an input weight matrix and the previous state by a recurrent weight matrix, adds the two, and passes the sum through the tanh function to get the new state. To get the output vector, we multiply the new state by an output weight matrix. Very deep stacks of recurrent layers are generally not preferred.
But RNNs suffer from the vanishing gradient problem: as gradients are propagated back through many time steps they become very small, so the resulting weight updates are too small to help the model learn long-range dependencies. LSTM was introduced to overcome this. For more on RNNs and LSTMs, see the article “Comparison of RNN LSTM model with Arima Models for Forecasting Problem”.
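As a rough illustration of the state update described above, here is a minimal NumPy sketch of a single-layer vanilla RNN. The names `rnn_step`, `W_x`, `W_h`, and `b`, the layer sizes, and the random weights are assumptions made for this example only; they are not taken from the model built later in the article.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One vanilla RNN update: h_t = tanh(W_x @ x_t + W_h @ h_prev + b)."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4                  # illustrative sizes only
W_x = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

sequence = rng.normal(size=(5, input_dim))    # 5 time steps of made-up input
h = np.zeros(hidden_dim)                      # initial state
for x_t in sequence:
    h = rnn_step(x_t, h, W_x, W_h, b)         # the same weights are reused at every step

print(h.shape)  # (4,)
```

The point to notice is that the state `h` is carried from one step to the next, which is what lets the network take earlier inputs into account.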
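To make the vanishing gradient idea concrete, the sketch below tracks how the gradient of the final state with respect to the initial state shrinks as the number of time steps grows. It uses a deliberately simplified scalar recurrence, h_t = tanh(w * h_{t-1}), with no input term; the function name, the weight 0.9, and the starting state are assumptions chosen for illustration.

```python
import math

def gradient_through_time(w, steps, h0=0.5):
    """Return d h_steps / d h_0 for the scalar recurrence h_t = tanh(w * h_{t-1})."""
    h, grad = h0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        grad *= w * (1.0 - h * h)   # chain rule: derivative of tanh(w * h_prev) w.r.t. h_prev
    return grad

for steps in (1, 10, 50):
    print(steps, gradient_through_time(w=0.9, steps=steps))
```

Each step multiplies the gradient by a factor no larger than `w`, so with `w = 0.9` the gradient after 50 steps is bounded by 0.9 to the power 50, which is already very small.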
Sentiment Analysis using LSTM
Let us first import the required libraries and data. You can download the data directly from Kaggle and use it. There are also many publicly available datasets for sentiment analysis of tweets and reviews. We will use the Twitter Sentiment Data for this experiment. Use the below code to do the same.
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM, SpatialDropout1D
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical
df = pd.read_csv("Sentiment.csv")
We will now explore the data we just imported. We will first look at what the data contains by checking its columns.
We will only use the tweets and their corresponding sentiments in this experiment. So we will create a new data frame that holds only these two columns. We will also check the different sentiments present. Use the below code to do the same.
new_df = df[['text','sentiment']]
Preprocessing Of Tweets
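We will now preprocess the tweets by converting them to lowercase and removing unnecessary characters from the text. We also drop the tweets labelled Neutral so that only two classes remain. Use the below code to perform this.
# keep only the two classes; copy so the assignments below modify an independent frame
new_df = new_df[new_df.sentiment != "Neutral"].copy()
new_df['text'] = new_df['text'].str.lower()
# remove every character that is not a letter, digit, or whitespace
new_df['text'] = new_df['text'].str.replace(r'[^a-zA-Z0-9\s]', '', regex=True)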
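To see what this cleaning step does to a single tweet, here is a small standalone sketch using Python's `re` module. The helper name `clean_tweet` and the sample tweet are made up for illustration.

```python
import re

def clean_tweet(text):
    """Lowercase a tweet and drop everything except letters, digits and whitespace."""
    return re.sub(r"[^a-z0-9\s]", "", text.lower())

print(clean_tweet("RT @user: GREAT debate tonight!!! #GOP2016"))
# rt user great debate tonight gop2016
```

Mentions, hashtags, and punctuation lose their symbols but the words themselves are kept, so the tokenizer later sees plain lowercase tokens.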
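After this, we define the vocabulary size and use a tokenizer to convert the tweets into integer sequences, which are then padded to the same length. We store the result in the X variable. Use the below code to do so.
tokenizer = Tokenizer(num_words=1500, split=' ')
tokenizer.fit_on_texts(new_df['text'].values)  # build the word index before converting
X = tokenizer.texts_to_sequences(new_df['text'].values)
X = pad_sequences(X)  # pad every sequence to the length of the longest one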
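The pure-Python sketch below shows, in simplified form, the idea behind this step: words are ranked by frequency, each word is replaced by its rank, and shorter sequences are padded with zeros at the front. It is an approximation written for illustration, not the Keras implementation; the helper names and the two sample texts are assumptions, and details such as punctuation filtering are left out.

```python
from collections import Counter

def build_word_index(texts, num_words):
    """Rank words by frequency (1 = most frequent); keep ranks below num_words."""
    counts = Counter(word for text in texts for word in text.split())
    ranked = [word for word, _ in counts.most_common()]
    return {word: rank for rank, word in enumerate(ranked, start=1) if rank < num_words}

def texts_to_padded(texts, word_index):
    """Map words to ids, drop unknown words, and left-pad with zeros."""
    sequences = [[word_index[w] for w in text.split() if w in word_index] for text in texts]
    max_len = max(len(seq) for seq in sequences)
    return [[0] * (max_len - len(seq)) + seq for seq in sequences]

texts = ["good debate", "bad bad debate tonight"]   # made-up, already cleaned tweets
word_index = build_word_index(texts, num_words=1500)
padded = texts_to_padded(texts, word_index)
print(padded)
```

Index 0 is reserved for padding, which is why word ranks start at 1.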
We then define the LSTM model architecture. Use the below code to define it. The model is built with the same Sequential API that is used for convolutional networks. The only difference is that we define two hyperparameters, embed_dim and lstm_out. We then compile the model using the Adam optimizer and binary cross-entropy loss.
embed_dim = 128
lstm_out = 196
vocabSize = 1500  # same value as num_words passed to the Tokenizer
model = Sequential()
model.add(Embedding(vocabSize, embed_dim, input_length=X.shape[1]))
model.add(LSTM(lstm_out, dropout=0.2, recurrent_dropout=0.2))
# single sigmoid unit so the output matches the 0/1 labels and binary cross-entropy
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
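As a quick sanity check on the size of this model, the trainable parameters of the embedding and LSTM layers can be worked out by hand. The sketch below assumes the vocabulary size equals the num_words=1500 given to the tokenizer and uses the standard LSTM layout of four gates, each with input weights, recurrent weights, and one bias vector; the helper names are made up for this example.

```python
def embedding_params(vocab_size, embed_dim):
    """One embed_dim-sized vector per vocabulary entry."""
    return vocab_size * embed_dim

def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * ((input_dim + units + 1) * units)

print(embedding_params(1500, 128))  # 192000
print(lstm_params(128, 196))        # 254800
```

Most of the capacity sits in the LSTM layer, which is worth keeping in mind when tuning lstm_out on a small dataset.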
After this, we encode the sentiments using a label encoder. Use the below code to do that. The tweets are stored in X and the corresponding sentiments in y.
from sklearn.preprocessing import LabelEncoder
Le = LabelEncoder()
y = Le.fit_transform(new_df['sentiment'])
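It helps to know which integer each sentiment receives, because a sigmoid output close to 1 then refers to the class encoded as 1. A label encoder sorts the distinct class names and numbers them in that order. The pure-Python sketch below mimics that behaviour; the label strings "Positive" and "Negative" are assumed here, so check the actual values in your copy of the data.

```python
labels = ["Positive", "Negative", "Negative", "Positive"]   # assumed label strings
classes = sorted(set(labels))                               # alphabetical order
class_to_id = {name: i for i, name in enumerate(classes)}
encoded = [class_to_id[name] for name in labels]

print(class_to_id)  # {'Negative': 0, 'Positive': 1}
print(encoded)      # [1, 0, 0, 1]
```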
Then we divide the data set into training and testing sets. Use the below code to do so. We then pass the training data and the validation data to the model.
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.15, random_state = 42)
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=32)
Now we will evaluate the model performance. Use the below code to evaluate the model.
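In Keras, `model.evaluate(X_test, y_test)` returns the loss followed by the compiled metrics. To show what those two numbers mean, the sketch below computes binary cross-entropy and accuracy by hand from predicted probabilities. The labels and probabilities are made up for illustration and are not results from this model; the function names are likewise assumptions.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels under the predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)       # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

y_true = [1, 0, 1, 0, 1]             # made-up labels
y_prob = [0.9, 0.2, 0.4, 0.1, 0.8]   # made-up model outputs
print(accuracy(y_true, y_prob))      # 0.8
print(binary_cross_entropy(y_true, y_prob))
```

Accuracy only counts which side of the threshold a prediction falls on, while the loss also penalises predictions that are right but unconfident, so the two can move differently during training.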
We got 82% accuracy and a loss of 0.655. Now we will make predictions for some of the data and check whether the model classifies them correctly. Use the below code to make the predictions for 5 rows.
As we can see from the above image, the model classified 4 of the 5 rows correctly and misclassified 1.
Through this article, I have explored sentiment analysis using LSTM. You can now explore applying this type of sequential network to different problems and build new use cases. You can also explore one more experiment in the article titled “Foreign Rate Exchange Prediction using LSTM RNN Networks”.
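When the model ends in a single sigmoid unit, each prediction is a probability that has to be turned back into a sentiment name. The sketch below shows one way to do that, assuming a 0.5 threshold and the alphabetical label order Negative = 0, Positive = 1; the probabilities are invented for the example and the helper name is hypothetical.

```python
def decode_predictions(probabilities, classes=("Negative", "Positive"), threshold=0.5):
    """Turn sigmoid outputs into class names; index 1 is the positive class."""
    return [classes[int(p > threshold)] for p in probabilities]

probs = [0.91, 0.08, 0.55, 0.30, 0.72]   # made-up outputs for 5 tweets
print(decode_predictions(probs))
# ['Positive', 'Negative', 'Positive', 'Negative', 'Positive']
```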