Guide To Pysentimiento Toolkit | Text Classification Using Transformers

As the Spanish word sentimiento means “feeling” in English, pysentimiento is a Python toolkit for sentiment analysis and text classification. Building a sentiment analysis model from scratch means choosing a model type, searching for good hyperparameters, fitting the data to the model, and training and testing it. Pysentimiento saves us from this work by providing ready-to-use pretrained models, which makes it a convenient way to perform text classification and sentiment analysis. Two of its features stand out: a single module can analyze text in two languages (English and Spanish), and it also offers text preprocessing. Since it is a Transformer-based library, let’s first take a quick look at Hugging Face and Transformers before moving on to the code.

Hugging Face

Hugging Face is one of the biggest startups building packages and modules for natural language processing (NLP). Many big companies, such as Apple, Monzo and Bing, use Hugging Face libraries to deliver features to their users. Hugging Face is an open-source community whose libraries, such as Transformers, are backed by PyTorch and TensorFlow. These libraries provide thousands of pretrained models that we can fine-tune to our requirements.

Transformers

Transformers is one of the Hugging Face repositories and one of the most widely used. It provides thousands of pretrained models, along with APIs to quickly download them and apply them to our own datasets. Pysentimiento is built on top of Transformers: its analyzers wrap pretrained transformer models for text classification. Transformers focuses mainly on natural language processing, and its models make it easy and reliable to perform NLP tasks such as classification, information extraction, question answering and translation, in more than 100 languages.

Now let’s move on to the pysentimiento toolkit. Using Google Colab, we will see how to perform some basic text classification and sentiment analysis with this transformer-based library.

Code Implementation of Pysentimiento 

The package can be installed with pip (the leading ! runs the command from a Colab notebook cell):

!pip install pysentimiento

Let’s try to do some sentiment analysis in English.

Importing the sentiment analyzer from pysentimiento:

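Input:

from pysentimiento import SentimentAnalyzer
# Load the English sentiment model; its weights are downloaded on first use.
# (Newer pysentimiento releases provide create_analyzer(task="sentiment", lang="en") instead.)
sentiment_analyzer_en = SentimentAnalyzer(lang="en")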

Predicting the sentiment of a positive sentence:

Input:

sentence = 'i love analytics india magazine'
sentiment_analyzer_en.predict(sentence)

Output:

The sentiment analyzer classifies the sentence as positive with a probability of 99.4%, which is quite satisfying. Let’s look at a few more examples.
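Where does a figure like 99.4% come from? A transformer classifier produces one raw score (a logit) per label, and a softmax converts those scores into probabilities that sum to 1; the predicted label is simply the most probable one. The sketch below illustrates this idea with the standard library only. The logits are made-up values, and the label names POS, NEU and NEG are assumed to match pysentimiento's sentiment labels.

```python
import math
from typing import Dict


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Turn raw per-label scores into probabilities that sum to 1."""
    # Subtracting the largest logit avoids overflow in math.exp without changing the result.
    largest = max(logits.values())
    exps = {label: math.exp(score - largest) for label, score in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Hypothetical logits for illustration only; they do not come from the real model.
probas = softmax({"POS": 4.0, "NEU": -0.5, "NEG": -1.5})
prediction = max(probas, key=probas.get)
print(prediction, {label: round(p, 3) for label, p in probas.items()})
```

Because the softmax exponentiates the scores, a moderate lead in raw score already translates into a very confident probability for the top label.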

Input:

sentence = 'i like analytics india magazine'
sentiment_analyzer_en.predict(sentence)

Output:

Here we replaced love with like, and the prediction shifts slightly: with love the toolkit predicted 99.4% positive, and with like it predicted 98.7% positive. The neutral probability changes as well, while the negative probability stays the same.
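To compare two predictions label by label, we can subtract their probability dictionaries. This is a minimal sketch assuming each prediction's probabilities are available as a plain dict keyed by label (pysentimiento's printed outputs show them under a probas field); probability_shift is a hypothetical helper, and only the positive scores quoted above are filled in, since the neutral and negative values are not reported in the text.

```python
from typing import Dict


def probability_shift(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Per-label change (after - before), rounded to 3 decimals, for labels present in both."""
    return {label: round(after[label] - before[label], 3) for label in before if label in after}


# Only the positive scores quoted in the text are used here.
love = {"POS": 0.994}
like = {"POS": 0.987}
print(probability_shift(love, like))  # {'POS': -0.007}
```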

Input:

sentence = 'i love pizza but i hate broccoli in pizza'
sentiment_analyzer_en.predict(sentence)

Output:

This sentence contains both a positive clause (love pizza) and a negative one (hate broccoli in pizza). Because the model reads the whole sequence in context rather than scoring words in isolation, it weighs love more heavily than hate in this sentence and predicts accordingly.
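To see why context matters, compare this with a naive bag-of-words baseline that gives each word a fixed score. In the sketch below, which uses a hypothetical three-word lexicon (not something pysentimiento uses), love and hate simply cancel out, so the mixed sentence scores 0 and would look neutral. A transformer model instead encodes the whole sequence, which is what lets it weigh one clause against the other.

```python
# A hypothetical toy lexicon; it is NOT part of pysentimiento.
LEXICON = {"love": 1, "like": 1, "hate": -1}


def lexicon_score(sentence: str) -> int:
    """Sum fixed word scores, ignoring word order and context entirely."""
    return sum(LEXICON.get(word, 0) for word in sentence.lower().split())


print(lexicon_score("i love pizza but i hate broccoli in pizza"))  # 0
```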

Let’s try the SentimentAnalyzer in the Spanish language.

Creating a Spanish sentiment analyzer object:

Input:

sentiment_analyzer_es = SentimentAnalyzer(lang="es")

Predicting the sentiment of a Spanish sentence built around “te amo” (“I love you”):

Input:

sentence = 'te amo oraciones'
sentiment_analyzer_es.predict(sentence)

Output:

Here we can see that the prediction for the Spanish language is also good. Let’s have some more tests.

Input:

sentence = 'te odio oraciones'
sentiment_analyzer_es.predict(sentence)

Output:

Here the input means “I hate you”, and the results show that the sentiment analyzer works well for Spanish too. Besides the sentiment analyzer, this toolkit also provides an emotion analyzer, which we can use in any project that needs emotion analysis of text. Let’s see how it works and what results it provides.

Importing the emotion analyzer:

Input:

from pysentimiento import EmotionAnalyzer
# English emotion model (joy, sadness, anger, fear, surprise, disgust, others)
emotion_analyzer_en = EmotionAnalyzer(lang="en")

Predicting the emotion of a short, informal text:

Input:

emotion = 'yessssss'
emotion_analyzer_en.predict(emotion)

Output:

EmotionOutput(output=joy, probas={joy: 0.515, surprise: 0.211, others: 0.210, anger: 0.019, fear: 0.018, sadness: 0.014, disgust: 0.013})

The results tell us that the emotion behind yessssss is most likely joy, while other emotions such as surprise remain possible. Ideally, emotion analysis would be performed on audio data, because written text does not tell us how a person is saying something. Even so, the toolkit’s performance is very satisfying. Let’s run a few more tests.
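The full probability distribution is what tells us the text can also express something else. A small helper that ranks the labels makes the runner-up emotions explicit. This sketch assumes the probabilities are available as a plain dict keyed by label name (mirroring the probas field in the printed output); top_k is a hypothetical helper, not part of pysentimiento.

```python
from typing import Dict, List, Tuple


def top_k(probas: Dict[str, float], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k most probable labels, highest first."""
    return sorted(probas.items(), key=lambda item: item[1], reverse=True)[:k]


# Probabilities copied from the EmotionOutput printed above for 'yessssss'.
yes_probas = {"joy": 0.515, "surprise": 0.211, "others": 0.210, "anger": 0.019,
              "fear": 0.018, "sadness": 0.014, "disgust": 0.013}
print(top_k(yes_probas, k=3))  # [('joy', 0.515), ('surprise', 0.211), ('others', 0.21)]
```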

Input:

emotion = 'yeah, we won the match have celebration'
emotion_analyzer_en.predict(emotion)

Output:

EmotionOutput(output=joy, probas={joy: 0.898, others: 0.066, surprise: 0.012, disgust: 0.006, sadness: 0.006, fear: 0.006, anger: 0.005})

For a whole sentence with a clearly joyful tone, the analyzer predicts joy, this time with a much higher probability (0.898).

These results suggest that the model is well trained and can be useful in practice.
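The two English examples also differ in confidence: joy wins with 0.515 for yessssss but with 0.898 for the full sentence. In an application, a common pattern is to accept a label only when its probability clears a threshold. The sketch below assumes a plain dict of probabilities and an arbitrary threshold of 0.6; both the confident_label helper and the threshold are illustrative choices, not pysentimiento features.

```python
from typing import Dict

# Hypothetical cut-off chosen for illustration; tune it on your own data.
THRESHOLD = 0.6


def confident_label(probas: Dict[str, float], threshold: float = THRESHOLD) -> str:
    """Return the top label, or 'uncertain' if its probability is below the threshold."""
    label, score = max(probas.items(), key=lambda item: item[1])
    return label if score >= threshold else "uncertain"


# Probabilities copied from the two English EmotionOutput results above.
yes_probas = {"joy": 0.515, "surprise": 0.211, "others": 0.210, "anger": 0.019,
              "fear": 0.018, "sadness": 0.014, "disgust": 0.013}
match_probas = {"joy": 0.898, "others": 0.066, "surprise": 0.012, "disgust": 0.006,
                "sadness": 0.006, "fear": 0.006, "anger": 0.005}
print(confident_label(yes_probas))    # uncertain
print(confident_label(match_probas))  # joy
```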

This feature can also be used for Spanish language texts. So let’s perform the emotion analysis for the Spanish language too.

Creating a Spanish emotion analyzer:

Input:

emotion_analyzer_es = EmotionAnalyzer(lang="es")

Predicting the emotion of “te amo” (“I love you”):

Input:

emotion_analyzer_es.predict('te amo')

Output:

EmotionOutput(output=joy, probas={joy: 0.982, others: 0.013, surprise: 0.002, sadness: 0.002, fear: 0.001, disgust: 0.001, anger: 0.000})

The result for this short phrase is satisfying. Next, we will use a full sentence for emotion analysis with the pysentimiento toolkit.

Input:

emotion_analyzer_es.predict('Nosotros somos asombrosos')

Output:

EmotionOutput(output=joy, probas={joy: 0.896, surprise: 0.061, others: 0.039, sadness: 0.001, fear: 0.001, disgust: 0.001, anger: 0.001})

The input sentence means “we are amazing” in English, and the analyzer predicts joy as its emotion.

Pysentimiento also includes a text preprocessing feature designed especially for tweets: it can handle usernames, URLs, repeated characters, hashtags and emojis in the text. This feature is called preprocess_tweet. Let’s try it on some tweets.
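Conceptually, replacing usernames and URLs is pattern substitution. Here is a stdlib-only sketch of that idea, not pysentimiento's actual implementation: the regular expressions, the mask_users_and_urls helper and the placeholder tokens @USER and URL are all assumptions made for illustration.

```python
import re

# Placeholder tokens are assumptions for this sketch; pysentimiento's own
# replacement tokens may differ (and can depend on the lang argument).
USER_TOKEN = "@USER"
URL_TOKEN = "URL"

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def mask_users_and_urls(text: str) -> str:
    """Replace http(s) links and @mentions with placeholder tokens."""
    # Replace URLs first, so an '@' inside a link is not treated as a mention.
    text = URL_RE.sub(URL_TOKEN, text)
    return MENTION_RE.sub(USER_TOKEN, text)


print(mask_users_and_urls("@yugeshverma i need to give a tweet https://www.google.com"))
# @USER i need to give a tweet URL
```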

Importing the preprocess_tweet function:

Input:

from pysentimiento.preprocessing import preprocess_tweet

Replacing the username and URL in a tweet:

Input:

tweet = "@yugeshverma i need to give a tweet https://www.google.com"
preprocess_tweet(tweet)

Output:

Here we can see how easily the tweet was processed. Next, let’s check how the preprocessor handles repeated characters and hashtags.

Input:

preprocess_tweet("yessssssss this is a tweet #AIM")

Output:

Here we can see how it handled the hashtag and shortened the repeated characters in the tweet.
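These two normalizations can also be approximated with regular expressions. This sketch caps character runs at three (an assumed limit) and rewrites a hashtag as the word hashtag followed by its lower-cased, camelCase-split text. The helper names and that output format are illustrative assumptions, not the exact behavior of preprocess_tweet.

```python
import re

HASHTAG_RE = re.compile(r"#(\w+)")


def shorten_repeats(text: str, max_run: int = 3) -> str:
    """Cap any run of the same character at max_run characters."""
    # (.)\1{max_run,} matches a character followed by at least max_run copies of itself,
    # i.e. a run longer than max_run.
    pattern = re.compile(r"(.)\1{%d,}" % max_run)
    return pattern.sub(lambda m: m.group(1) * max_run, text)


def expand_hashtag(match: re.Match) -> str:
    """Turn '#DataScience' into 'hashtag data science' by splitting on camelCase."""
    words = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", match.group(1))
    return "hashtag " + words.lower()


def normalize(text: str) -> str:
    return HASHTAG_RE.sub(expand_hashtag, shorten_repeats(text))


print(normalize("yessssssss this is a tweet #AIM"))  # yesss this is a tweet hashtag aim
```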

This feature can also handle the emojis in tweets or text. So let’s check how it can do that.

Input:

preprocess_tweet("happy birthday 🎉🎉", lang='en')

Output:

Here we can see that it marks where the emojis are in the text and replaces them with their names. Now let’s put everything together in one tweet and process it.

Input:

tweet = "@Analyticsindiamag we are having goooood #content in https://analyticsindiamag.com/ 🎉🎉"
preprocess_tweet(tweet, lang='en')

Output:

The output shows that the text preprocessing also gives good results.
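Of these steps, emoji handling is the least obvious. One way to name emoji using only Python's standard library is to look up each symbol's Unicode name. In this sketch, the demojize helper, the emoji marker words and the category-based heuristic are assumptions; a dedicated emoji library handles multi-codepoint emoji (skin tones, joined sequences) far more robustly.

```python
import unicodedata


def demojize(text: str, wrapper: str = "emoji") -> str:
    """Replace symbol characters (such as emoji) with their Unicode names, wrapped in marker words."""
    pieces = []
    for ch in text:
        # Rough heuristic: emoji fall in the 'So' (Symbol, other) Unicode category.
        # Multi-codepoint emoji are not treated specially here.
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "unknown symbol").lower()
            pieces.append(f" {wrapper} {name} {wrapper} ")
        else:
            pieces.append(ch)
    # Collapse the extra whitespace introduced around each replacement.
    return " ".join("".join(pieces).split())


print(demojize("happy birthday 🎉🎉"))
# happy birthday emoji party popper emoji emoji party popper emoji
```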

Overall, pysentimiento provides good pretrained models, and their results are good enough for many tasks; it makes text analysis very easy. Transfer learning is widely recommended for data science projects nowadays, and, as we have seen, this transformer-based toolkit performs well. We can use it in our text classification projects because it brings sentiment analysis, emotion analysis and tweet preprocessing together under one roof and is very easy to use. I encourage you to explore the package further to discover more of its features.

Copyright Analytics India Magazine Pvt Ltd
