Getting Started With Sentiment Analysis Using TensorFlow Keras


Natural Language Processing (NLP) is the branch of artificial intelligence that deals with natural languages. The word ‘natural’ refers to languages that evolved naturally among humans for communication. A long-standing goal in artificial intelligence is to make machines communicate effectively with humans. Language modeling and language generation (such as neural machine translation) have been popular among researchers for over a decade. For an AI beginner, text classification is a good place to start learning and practicing Natural Language Processing. Sentiment analysis is a text classification application in which a given text is classified into a positive class or a negative class (and sometimes a neutral class, too) based on the context. This article discusses sentiment analysis using TensorFlow Keras with the IMDB movie reviews dataset, one of the best-known sentiment analysis datasets.

TensorFlow’s Keras API offers the complete functionality required to build and execute a deep learning model. This article assumes that the reader is familiar with the basics of deep learning and Recurrent Neural Networks (RNNs).


Create the Environment

Create the necessary Python environment by importing the frameworks and libraries.

 # for array operations
 import numpy as np
 # deep learning framework
 import tensorflow as tf
 # to obtain IMDB datasets
 import tensorflow_datasets as tfds
 # Keras API
 from tensorflow import keras
 # import required layers
 from tensorflow.keras.layers import Dense, Dropout, Bidirectional, LSTM
 # to visualize the performance
 import matplotlib.pyplot as plt 

Download the IMDB dataset 

The IMDB reviews dataset is available in TensorFlow Datasets in different variants:

  1. Plain text reviews
  2. Byte-encoded texts
  3. Integer-encoded texts with a vocabulary of around 8k subwords
  4. Integer-encoded texts with a vocabulary of around 32k subwords

Here, we use the dataset that has integer-encoded texts with around 8k vocabulary words.

 data, meta = tfds.load('imdb_reviews/subwords8k',
                       with_info = True,
                       as_supervised = True) 

Output:

imdb data download

What data are downloaded?

data.keys()

Output:

We do not need the unsupervised data, so we take only the train and test sets.

 train = data['train']
 test = data['test']
 train, test 

Output:

It can be observed that both texts and labels are integers. Moreover, the texts are not of fixed length (no size is specified in their shape). The data is already preprocessed and encoded, and is ready to use.

Prepare an Encoder

As discussed, the dataset comes with its texts already encoded into integers. Encoding into integers is mandatory since a model operates only on numbers. However, humans cannot read those integer sequences. Hence, we need a decoder that reverses the encoding, so that we can convert the numbers back into text and read it in English. We also need an encoder that can convert an example text (from outside the dataset) into integers.

Metadata that comes with the dataset contains the encoder originally used while preparing the dataset. It can perform encoding and decoding operations.

meta.features

Output:

It can be observed that metadata contains the encoder under the key ‘text’.

 # extract the encoder
 encoder = meta.features['text'].encoder 

The encoded integers are numbered from 1 up to one less than the vocabulary size; the value 0 is reserved for padding. How large is the encoder's vocabulary?

encoder.vocab_size

 Output:

What are the subwords in the vocabulary?

print(encoder.subwords)

A portion of the output: 

decoded text

Test the encoder by sampling a sentence, encoding it into integers, and decoding back into text.

 example = 'Analytics India Magazine !'
 enc = encoder.encode(example)
 enc 

Output:

encoded integers

We have provided a sentence with three words and one exclamation mark, but it is encoded into an eleven-element integer list. This happens because the encoder splits words that are not in its vocabulary into smaller pieces (subwords). These pieces are technically called tokens. Let’s explore the numbers and corresponding tokens by using the decode method.

 for integer in enc:
     text = encoder.decode([integer])
     print('%4d : %s'%(integer, text)) 

Output:
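The subword splitting can be imitated with a toy encoder. The following is a minimal sketch, not the TFDS encoder: the vocabulary TOY_SUBWORDS and the functions toy_encode and toy_decode are hypothetical names introduced only for illustration, and the greedy longest-match rule is an assumption about how such a split could work. It shows how a three-word sentence can turn into many more tokens than words.

```python
# Toy illustration only: this is NOT the TFDS encoder, and the vocabulary
# below is hypothetical. Id 0 is left unused to mirror a reserved padding id.
TOY_SUBWORDS = ["Analy", "tics", "India", "Maga", "zine", " ", "!"]
TOKEN_TO_ID = {tok: i + 1 for i, tok in enumerate(TOY_SUBWORDS)}
ID_TO_TOKEN = {i: tok for tok, i in TOKEN_TO_ID.items()}


def toy_encode(text):
    """Greedy longest-match split of text into known subword ids."""
    ids, pos = [], 0
    while pos < len(text):
        # Try the longest remaining candidate first, then shorter ones.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in TOKEN_TO_ID:
                ids.append(TOKEN_TO_ID[piece])
                pos = end
                break
        else:
            # No known subword starts here; the toy version simply gives up.
            raise ValueError(f"no subword covers {text[pos]!r}")
    return ids


def toy_decode(ids):
    """Concatenate the subwords behind each id."""
    return "".join(ID_TO_TOKEN[i] for i in ids)


example = "Analytics India Magazine !"
encoded = toy_encode(example)
print(encoded)
print(toy_decode(encoded) == example)
```

With this toy vocabulary the sentence becomes nine tokens, because each space and each word fragment is a token of its own.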

Preprocess the Dataset

The input texts are of variable lengths, but a deep learning model cannot accept a batch whose inputs differ in size. We have to make every sequence in a batch the same length. If a sequence has fewer tokens than that length, it is padded with zeros. This is accomplished with the padded_batch method, which pads the sequences in each batch to the length of the longest sequence in that batch. Since a large vocabulary makes direct manipulation of token ids impractical, each token is embedded into a small dense vector representation. We perform this step with an Embedding layer.

 BUFFER_SIZE = 10000
 BATCH_SIZE = 64
 AUTOTUNE = tf.data.AUTOTUNE
 train_data = train.shuffle(BUFFER_SIZE)
 train_data = train_data.padded_batch(BATCH_SIZE, padded_shapes=([None],[]))
 train_data = train_data.prefetch(AUTOTUNE)
 test_data = test.padded_batch(BATCH_SIZE, padded_shapes=([None],[]))
 embed_layer = keras.layers.Embedding(encoder.vocab_size, 64) 
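The effect of padding a batch can be seen on a tiny example. This is a plain-Python sketch of the idea rather than the tf.data implementation; pad_batch is a hypothetical helper, and the token ids are made up.

```python
def pad_batch(sequences, pad_value=0):
    """Right-pad every sequence to the length of the longest one in the batch."""
    longest = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_value] * (longest - len(seq)) for seq in sequences]


# Three made-up encoded reviews of different lengths.
batch = [[12, 7, 431], [5], [88, 19]]
padded = pad_batch(batch)
print(padded)  # [[12, 7, 431], [5, 0, 0], [88, 19, 0]]
```

Because the target length is taken per batch, two different batches may end up with different sequence lengths, which is why the text dimension is left as None in padded_shapes.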

Build the Model

Unlike images and structured data, texts have a sequential order of tokens that contributes to the context. Hence, the deep learning model should be able to remember past tokens, in order, when processing a specific token. This is achieved by implementing either Recurrent Neural Networks or Transformers. Here, we prefer Recurrent Neural Networks with LSTM units to model our problem. LSTM (Long Short-Term Memory) units keep the past portion of the embedded sequence in memory and so model the sequential relationships among tokens. LSTM layers can be wrapped in bidirectional layers so that the model reads a sentence in both directions, namely left-to-right and right-to-left.

 model = keras.Sequential([
     # embedding layer
     embed_layer,
     # bidirectional LSTM layers
     Bidirectional(LSTM(64, 
                        dropout=0.5, 
                        recurrent_dropout=0.5, 
                        return_sequences=True)),
     Bidirectional(LSTM(32, 
                        dropout=0.5, 
                        recurrent_dropout=0.5, 
                        return_sequences=True)),
     Bidirectional(LSTM(16, 
                        dropout=0.5, 
                        recurrent_dropout=0.5)),
     # Classification head
     Dense(64, activation='relu', kernel_regularizer='l2'),
     Dropout(0.5),
     Dense(1, activation='sigmoid')    
 ]) 
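The idea behind a bidirectional layer can be sketched without any framework. The example below is a simplification under stated assumptions: it uses a scalar simple-RNN cell instead of an LSTM, the weights w_in and w_rec are arbitrary, and both directions share them for brevity (a Keras Bidirectional wrapper keeps separate weights per direction). run_rnn and bidirectional_summary are hypothetical names.

```python
import math


def run_rnn(inputs, w_in=0.5, w_rec=0.8):
    """Scalar recurrent pass: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    h = 0.0
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
    return h


def bidirectional_summary(inputs):
    """Final states of a left-to-right and a right-to-left pass, concatenated."""
    forward = run_rnn(inputs)
    backward = run_rnn(list(reversed(inputs)))
    return [forward, backward]


sequence = [1.0, -2.0, 0.5]
summary = bidirectional_summary(sequence)
print(summary)
```

The two numbers differ because each pass ends on a different token, and concatenating them doubles the feature size. The same concatenation is why the last bidirectional layer in the model, built on LSTM(16), hands 32 features to the first Dense layer.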

We have used dropout layers and an L2 kernel regularizer to limit overfitting. In each LSTM layer, dropout is applied in two places: the dropout argument acts on the layer inputs, and the recurrent_dropout argument acts on the recurrent state passed between time steps.
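What dropout does to a vector of activations during training can be sketched in a few lines. This is an illustrative plain-Python version of inverted dropout, not the Keras implementation; the function name dropout and the sample activations are assumptions made for the example.

```python
import random


def dropout(values, rate, rng):
    """Zero each value with probability `rate`; rescale survivors by 1/(1-rate)."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0, 2.0, 3.0, 4.0]
dropped = dropout(activations, rate=0.5, rng=rng)
print(dropped)
```

Rescaling the surviving values keeps the expected sum of the activations unchanged, so nothing needs to be adjusted at inference time, when dropout is switched off.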

How many parameters does the model have?

model.summary()

Output:

model parameters
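The counts reported by model.summary() can be cross-checked by hand. The sketch below assumes the standard LSTM parameter formula with biases enabled (four gates, each with input weights, recurrent weights, and a bias) and the layer sizes from the model definition. The function names are hypothetical, and vocab_size=8000 is only a placeholder for encoder.vocab_size.

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """One weight per input-output pair plus one bias per output."""
    return input_dim * units + units


def model_params(vocab_size, embed_dim=64):
    """Layer-by-layer counts for the architecture defined above."""
    counts = {
        "embedding": vocab_size * embed_dim,
        # Each bidirectional wrapper holds two LSTMs and concatenates their
        # outputs, so the next layer sees 2 * units input features.
        "bi_lstm_64": 2 * lstm_params(embed_dim, 64),
        "bi_lstm_32": 2 * lstm_params(2 * 64, 32),
        "bi_lstm_16": 2 * lstm_params(2 * 32, 16),
        "dense_64": dense_params(2 * 16, 64),
        "dense_1": dense_params(64, 1),
    }
    counts["total"] = sum(counts.values())
    return counts


# vocab_size is a placeholder here; substitute encoder.vocab_size.
for name, count in model_params(vocab_size=8000).items():
    print(f"{name:>11}: {count:,}")
```

With any realistic vocabulary size, the embedding table holds most of the parameters, while the three bidirectional LSTM layers together contribute 117,632.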

Plotting the model gives a better understanding of data flow through layers.


keras.utils.plot_model(model, show_shapes=True, dpi=48)

Output:

model plot

Train the Model

Compile the model with the Adam optimizer, the accuracy metric, and the binary cross-entropy loss function.

 model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy']) 
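Binary cross-entropy compares the sigmoid output with the 0/1 label and penalizes confident mistakes heavily. The following plain-Python sketch computes the mean of -[y*log(p) + (1-y)*log(1-p)] over a batch; the function is illustrative rather than the Keras implementation, and the clipping constant eps is an assumption chosen to avoid log(0).

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over the batch."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1]
confident_right = [0.9, 0.1, 0.8]
confident_wrong = [0.1, 0.9, 0.2]
print(binary_crossentropy(labels, confident_right))
print(binary_crossentropy(labels, confident_wrong))
```

The first value is small and the second is much larger, which is the behavior that pushes the model toward outputs close to the true label.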

Train the model for 2 epochs. Note that training may take longer than for multi-layer perceptrons (MLPs) and CNNs, because LSTM layers process the tokens of a sequence one step at a time.

 history = model.fit(train_data, 
                     validation_data=test_data, 
                     epochs=2) 

Output:

model training

Training for two epochs has taken more than 7 hours on a CPU runtime in a virtual machine with 12GB RAM. A GPU runtime can speed up LSTM training, but Keras uses its fast cuDNN LSTM kernel only when certain conditions hold, one of which is recurrent_dropout=0. Because this model sets recurrent_dropout=0.5, it falls back to the slower generic implementation and benefits less from a GPU. Long training times are one of the reasons people opt for pre-trained models, such as BERT, for deployment.

Model Performance Evaluation

The model has been trained and is ready to make inferences. Plot the training and validation losses to have a better understanding of its performance.

 hist = history.history
 plt.plot(hist['loss'])
 plt.plot(hist['val_loss'])
 plt.legend(labels=['Training', 'Validation'])
 plt.xlabel('Epochs')
 plt.ylabel('Loss')
 plt.show() 

Output:

model performance

The training loss goes down over the two epochs. Training for more epochs may help the model reduce the loss further and learn the patterns better.

Model Inference – Sentiment Analysis

Make sample predictions on three synthetic reviews:

 # Sample prediction
 samples = ['The plot is fantastic', 
            'The movie was cool and thrilling', 
            'one of the worst films I have ever seen']

 # encode into integers
 sample_encoded = [encoder.encode(sample) for sample in samples]

 # pad with zeros to have same length 
 sample_padded = []
 for s in sample_encoded:
     pad_length = 128 - len(s)
     zeros = [0]*pad_length
     s.extend(zeros)
     s = tf.convert_to_tensor(s)
     sample_padded.append(s)
 # convert into tensor before feeding the model
 sample_padded = tf.convert_to_tensor(sample_padded)

 #make predictions
 predictions = model.predict(sample_padded)
 predictions 

Output:

sentiment analysis results

A prediction above 0.5 indicates a positive review, and a prediction below 0.5 indicates a negative review.

 print('Predictions on sample test reviews... \n')
 for i in range(len(samples)):
     pred = predictions[i][0]
     sentiment = 'positive' if pred>0.5 else 'negative'
     print('%40s : %s'%(samples[i], sentiment)) 

Output:

sentiment inference
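The padding loop used for the sample predictions assumes that every encoded review has at most 128 tokens; a longer review would be left unpadded and the batch could no longer be stacked into one tensor. A small helper that also truncates is one way to guard against that. This is a sketch under that assumption, and pad_or_truncate is a hypothetical name, not part of TensorFlow.

```python
def pad_or_truncate(ids, length=128, pad_value=0):
    """Return exactly `length` ids: cut long inputs, right-pad short ones."""
    clipped = list(ids[:length])
    return clipped + [pad_value] * (length - len(clipped))


short = pad_or_truncate([4, 9, 2], length=5)
long = pad_or_truncate(list(range(1, 11)), length=5)
print(short)  # [4, 9, 2, 0, 0]
print(long)   # [1, 2, 3, 4, 5]
```

Unlike the extend call in the loop, this helper returns a new list and leaves the encoded sample untouched, so the same sample can be padded again to a different length.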

This Notebook carries the above code implementation.

Wrapping Up

In this article, we have discussed sentiment analysis with text data. We have worked through a hands-on TensorFlow implementation of sentiment analysis with the large IMDB movie review dataset. We have prepared the data by padding and embedding it, built an RNN model with bidirectional LSTM layers, and trained the model. Finally, we have evaluated the model by predicting the sentiment of some sample movie reviews.


Copyright Analytics India Magazine Pvt Ltd