Guide To Build A Simple Sentiment Analyzer Using TensorFlow-Hub

Sentiment analysis is a part of natural language processing used to determine whether the sentiment of the data under observation is positive, negative or neutral. It is usually carried out on text data to help professionals monitor and understand how customers and the wider industry feel about their brand and products, based on feedback. In addition, businesses often apply it to social media data to understand the reputation of the brand and the type of customers it attracts.

As social media use is higher than ever before and customers tend to express their thoughts there, sentiment analysis plays an important role in monitoring and growing a business. Automatically analysing customer feedback, such as opinions in survey responses and social media conversations, allows brands to learn what makes customers happy or frustrated so that they can tailor their products and services to customer needs.

For example, using sentiment analysis to automatically analyze thousands of reviews about your product will help you discover whether customers are happy with its price and service. Sometimes you want to gauge brand sentiment on social media in real time and over time, so you can detect disgruntled customers immediately and respond as soon as possible. In this way, the applications of sentiment analysis are endless. The major challenge is that people express opinions in complex ways, which makes understanding the subject of human opinions difficult. Rhetorical devices like sarcasm, irony and implied meaning can mislead sentiment analysis, which is why concise and focused opinions such as product, book, movie and music reviews are easier to analyze.

Sentiment analysis is extremely important because it allows businesses to understand the sentiment of their customers towards their brands. By leveraging sentiment analysis across social media conversations and reviews, businesses can make better-informed decisions. A good sentiment analysis engine can automatically transform raw, unstructured data into structured content, providing an overview of sentiments towards the products, services and brand.

So in this article, we will implement a simple sentiment classifier with reasonable baseline accuracy using a TensorFlow Hub (TF-Hub) text embedding module. The classifier is built on the tf.estimator.Estimator class; TensorFlow provides many ready-made estimator classes, such as LinearRegressor, that implement common and basic machine learning algorithms.

Implementing Sentiment Analysis using TF-HUB

Import all dependencies:
import tensorflow as tf
import tensorflow_hub as hub
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import re
import os

Getting data:

We are using the Large Movie Review Dataset, which contains IMDB movie reviews rated for positivity on a scale from 1 to 10. Our task is to label each review as having positive or negative sentiment.

Load all the files from a directory into a DataFrame:

def load_directory_and_data(directory):
  # Collect the review text and its rating for every file in the directory.
  data = {}
  data['sentence'] = []
  data['sentiment'] = []
  for file_path in os.listdir(directory):
    with tf.io.gfile.GFile(os.path.join(directory, file_path), 'r') as f:
      data['sentence'].append(f.read())
    # File names look like <id>_<rating>.txt; keep the rating part.
    data['sentiment'].append(re.match(r'\d+_(\d+)\.txt', file_path).group(1))
  return pd.DataFrame.from_dict(data)
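
The file names in the dataset follow an `<id>_<rating>.txt` pattern, which is what the regular expression above relies on. As a minimal sketch using only the standard library (the helper name `parse_review_filename` and the sample file names are made up for illustration), the rating can be pulled out like this:

```python
import re

# Pattern for names such as "123_8.txt": <review id>_<rating>.txt
FILENAME_PATTERN = re.compile(r'(\d+)_(\d+)\.txt')

def parse_review_filename(file_name):
    """Return (review_id, rating) as integers, or None if the name does not match."""
    match = FILENAME_PATTERN.fullmatch(file_name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

# Hypothetical file names, used only to exercise the helper.
print(parse_review_filename('123_8.txt'))   # (123, 8)
print(parse_review_filename('README'))      # None
```

Unlike the loader above, this helper returns None for names that do not match instead of raising an error, which may be convenient if the directory contains other files.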

Merge the positive and negative examples, add a polarity column, and shuffle:

def load_dataset(directory):
  pos_df = load_directory_and_data(os.path.join(directory,'pos'))
  neg_df = load_directory_and_data(os.path.join(directory,'neg'))
  pos_df['polarity'] = 1
  neg_df['polarity'] = 0
  return pd.concat([pos_df,neg_df]).sample(frac=1).reset_index(drop=True)

Download and process the files:

def download_load_dataset(force_download=False):
  # Note: force_download is currently unused.
  # Download and extract the archive (Keras caches it after the first call).
  dataset = tf.keras.utils.get_file(
      fname='aclImdb.tar.gz',
      origin='http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz',
      extract=True)
  train_df = load_dataset(os.path.join(os.path.dirname(dataset),
                                       'aclImdb', 'train'))
  test_df = load_dataset(os.path.join(os.path.dirname(dataset),
                                      'aclImdb', 'test'))
  return train_df, test_df

Load the train and test data:

train_df,test_df = download_load_dataset()
train_df.head(10)

Load the Module:

Before loading the module, we need to create input functions that take the pandas DataFrame and feed it to the model. Below, we create one input function for training and two for prediction (on the training and test sets):

train_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(
    train_df, train_df['polarity'], num_epochs=None, shuffle=True)
predict_train_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(
    train_df, train_df['polarity'], shuffle=False)
predict_test_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(
    test_df, test_df['polarity'], shuffle=False)
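
The difference between the training and prediction input functions comes down to how many passes are made over the data and whether the rows are shuffled. The generator below is a rough, pure-Python sketch of that behaviour, not TensorFlow's implementation; the name `batch_iterator` and the toy rows are assumptions made for illustration.

```python
import itertools
import random

def batch_iterator(rows, batch_size, num_epochs=1, shuffle=False, seed=0):
    """Yield lists of rows; num_epochs=None repeats the data indefinitely."""
    rng = random.Random(seed)
    epochs = itertools.count() if num_epochs is None else range(num_epochs)
    for _ in epochs:
        order = list(rows)
        if shuffle:
            rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            yield order[start:start + batch_size]

toy_rows = ['review a', 'review b', 'review c', 'review d', 'review e']

# Prediction-style input: one ordered pass, so the data eventually runs out.
print(list(batch_iterator(toy_rows, batch_size=2)))

# Training-style input: endless and shuffled, so the caller decides when to stop.
first_four = list(itertools.islice(
    batch_iterator(toy_rows, batch_size=2, num_epochs=None, shuffle=True), 4))
print(len(first_four))
```

Because the training input function never runs out of data, the number of training steps has to be passed to the estimator explicitly.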

TF-Hub provides a feature column, hub.text_embedding_column(), that applies a given module to a text feature. The module is responsible for preprocessing the raw text, such as removing punctuation and splitting on spaces.

embedded_text_feature_column = hub.text_embedding_column(
    key='sentence',
    module_spec='https://tfhub.dev/google/nnlm-en-dim128/1')
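
According to the module's documentation, the word embeddings are combined into a sentence embedding with the sqrtn combiner, that is, the sum of the word vectors divided by the square root of the number of words. The following is a toy sketch of that idea in plain Python; the three-dimensional vectors, the tiny vocabulary and the function names are invented for illustration (the real module produces 128-dimensional vectors and handles out-of-vocabulary words differently).

```python
import math
import string

# Invented 3-dimensional word vectors; the real module uses 128 dimensions.
TOY_EMBEDDINGS = {
    'great': [1.0, 0.0, 2.0],
    'movie': [0.0, 1.0, 0.0],
    'boring': [-1.0, 0.0, -2.0],
}

def tokenize(sentence):
    """Remove punctuation and split on whitespace."""
    cleaned = sentence.translate(str.maketrans('', '', string.punctuation))
    return cleaned.split()

def embed_sentence(sentence, dim=3):
    """Sum the known word vectors and divide by sqrt(number of known words)."""
    vectors = [TOY_EMBEDDINGS[t] for t in tokenize(sentence) if t in TOY_EMBEDDINGS]
    if not vectors:
        # Simplification: unknown-only input maps to a zero vector in this toy.
        return [0.0] * dim
    scale = math.sqrt(len(vectors))
    return [sum(component) / scale for component in zip(*vectors)]

print(embed_sentence('great movie!'))
```

Whatever the length of the review, the result is a single fixed-size vector, which is what allows it to be fed to a dense classifier.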

To classify the sentiment, we use DNNClassifier from the tf.estimator module:

estimator = tf.estimator.DNNClassifier(
    hidden_units=[500,100],
    feature_columns=[embedded_text_feature_column],
    n_classes=2,
    optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.0003))
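
To make the `hidden_units=[500,100]` argument concrete, the snippet below lists the weight-matrix shapes of a fully connected network with those hidden layers on top of a 128-dimensional embedding. It is only a back-of-the-envelope sketch: the function names are hypothetical, and the width of the output layer (`output_dim=1`, a single logit for the two classes) is an assumption rather than something stated in this article.

```python
def dense_layer_shapes(input_dim, hidden_units, output_dim):
    """Return (inputs, outputs) for each fully connected layer."""
    sizes = [input_dim] + list(hidden_units) + [output_dim]
    return list(zip(sizes[:-1], sizes[1:]))

def count_parameters(shapes):
    """Weights plus one bias per output unit."""
    return sum(n_in * n_out + n_out for n_in, n_out in shapes)

# output_dim=1 is an assumption about the final layer, see the note above.
shapes = dense_layer_shapes(input_dim=128, hidden_units=[500, 100], output_dim=1)
print(shapes)                    # [(128, 500), (500, 100), (100, 1)]
print(count_parameters(shapes))  # 114701
```

The count covers only the dense layers, not the weights of the embedding module itself.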

Now train the estimator for 5000 steps:

estimator.train(input_fn=train_input_fn, steps=5000)
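
To get a feel for how much training 5000 steps represents, the steps can be converted into passes over the training set. The calculation below assumes the 25,000 reviews of the IMDB training split and a batch size of 128, which, as far as we know, is the default of `pandas_input_fn` when no `batch_size` is passed. Both numbers are assumptions, so adjust them to your setup.

```python
def steps_to_epochs(steps, batch_size, num_examples):
    """Number of passes over the data made by `steps` batches."""
    return steps * batch_size / num_examples

# Assumed values: default batch size of 128 and 25,000 training reviews.
print(steps_to_epochs(steps=5000, batch_size=128, num_examples=25000))  # 25.6
```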

Loss at the final stage: [training output not shown]

Evaluate model:
train_eval_result = estimator.evaluate(input_fn=predict_train_input_fn)
test_eval_result = estimator.evaluate(input_fn=predict_test_input_fn)
print('Training set accuracy: {accuracy}'.format(**train_eval_result))
print('Test set accuracy: {accuracy}'.format(**test_eval_result))

We can also visualize the predictions using a confusion matrix, as below:

def get_predictions(estimator, input_fn):
  # Collect the predicted class id for every example.
  return [x['class_ids'][0] for x in estimator.predict(input_fn=input_fn)]

LABELS = ['negative', 'positive']

# create a confusion matrix on training data
cm = tf.math.confusion_matrix(train_df['polarity'],
                              get_predictions(estimator, predict_train_input_fn))

# normalize the confusion matrix so that each row sums to 1
cm = tf.cast(cm, dtype=tf.float32)
cm = cm / tf.math.reduce_sum(cm, axis=1)[:, np.newaxis]

sns.heatmap(cm, annot=True, xticklabels=LABELS, yticklabels=LABELS)
plt.xlabel('Predicted')
plt.ylabel('True')
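
The normalization step divides each row of the confusion matrix by its row total, so every cell becomes the fraction of examples of that true class. A small pure-Python sketch of the same computation, with made-up labels and predictions and hypothetical function names:

```python
def confusion_matrix(labels, predictions, num_classes=2):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_class, predicted_class in zip(labels, predictions):
        matrix[true_class][predicted_class] += 1
    return matrix

def normalize_rows(matrix):
    """Divide each row by its total so that every row sums to 1."""
    return [[cell / sum(row) for cell in row] for row in matrix]

# Made-up labels and predictions, only to show the arithmetic.
labels =      [0, 0, 0, 0, 1, 1, 1, 1]
predictions = [0, 0, 0, 1, 1, 1, 0, 0]

counts = confusion_matrix(labels, predictions)
print(counts)                  # [[3, 1], [2, 2]]
print(normalize_rows(counts))  # [[0.75, 0.25], [0.5, 0.5]]
```

The diagonal of the normalized matrix is the per-class recall: the share of truly negative (or positive) reviews that were predicted correctly.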

Conclusion

In this article, we have seen what sentiment analysis is and why it matters for improving businesses and products. Its main advantage is that it turns huge amounts of unstructured data into an organized format. We then worked through a practical example: we used a text embedding module from TF-Hub and built a classifier with the Estimator API that serves as a reasonable baseline.

Copyright Analytics India Magazine Pvt Ltd
