Text processing is the analysis of textual data with various tools and techniques in order to extract useful information from it. Before text can be passed to a machine learning model, it has to be processed to extract important information and numerical features.
TextBlob is an open-source Python library for processing textual data. It supports operations such as noun phrase extraction, sentiment analysis, classification, and translation.
TextBlob is built on top of NLTK and Pattern. It is easy to use, can process text in a few lines of code, and is a good starting point for NLP tasks.
In this article, we will explore TextBlob and walk through its major features in a hands-on tutorial.
Implementation:
TextBlob requires certain resources from NLTK, so we will start by installing both libraries using pip install nltk and pip install textblob.
- Importing required libraries
We will import both NLTK and TextBlob, and then download the NLTK data that TextBlob depends on.
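from textblob import TextBlob
import nltk

# NLTK data used by TextBlob in this tutorial
nltk.download('punkt')                       # tokenizer models
nltk.download('averaged_perceptron_tagger')  # part-of-speech tagger
nltk.download('brown')                       # corpus used for noun phrase extraction
nltk.download('wordnet')                     # lexical database needed by lemmatize()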
- Text selection for Processing
We can use any text for this text processing tutorial. I have taken an article from today’s newspaper.
art = '''Among the 10 countries that have reported the highest number of case in the world, daily cases are still continuously rising in only two – India and Colombia. Other than the US and Brazil, daily cases also appear hitting a plateau in Mexico (7th spot, 480,278 cases). Russia (4th, 892,654 cases), South Africa (5th, 563,598 cases), and Chile (9th, 375,044 cases). The remaining two – Spain (10th, 370,060 cases) and Peru (7th, 483,133 cases) – managed to control outbreaks once, but are now seeing a resurgence of cases. All caseloads are from the worldometers.info dashboard. To be sure, the global Covid-19 curve has flattened twice before — first, when the Chinese outbreak peaked and the contagion was yet to reach the West; the second, when cases dropped in Europe — however, it has risen again with more ferocity both times as the virus has spread to new regions.'''
- Text Processing
Before applying any text processing techniques, we need to pass the text to the TextBlob constructor.
blob = TextBlob(art)
We will start with some basic text processing features, such as finding part-of-speech tags and noun phrases.
- Tags
The tags property returns a list of (word, tag) tuples, where each tag indicates the word's part of speech, such as noun or adjective.
blob.tags
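Because the tags come back as (word, tag) pairs, a common next step is to filter them by part of speech. The sketch below assumes Penn Treebank style tags, in which noun tags begin with NN. The sample pairs are hand-written for illustration rather than real TextBlob output, and the helper name words_with_tag_prefix is hypothetical.

```python
def words_with_tag_prefix(tagged_words, prefix):
    """Return the words whose part-of-speech tag starts with the given prefix."""
    return [word for word, tag in tagged_words if tag.startswith(prefix)]


# Hand-written sample in the same (word, tag) shape that blob.tags produces.
sample_tags = [
    ("daily", "JJ"),
    ("cases", "NNS"),
    ("are", "VBP"),
    ("rising", "VBG"),
    ("India", "NNP"),
]

# Keep only the nouns (tags starting with "NN").
nouns = words_with_tag_prefix(sample_tags, "NN")
print(nouns)
```

With the blob created earlier, the same helper could be called as words_with_tag_prefix(blob.tags, "NN").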
- Noun Phrases
The noun_phrases property returns the noun phrases found in the given text.
blob.noun_phrases
- Sentiments
The sentiment property returns the polarity and subjectivity of the text. Polarity ranges from -1.0 (negative) to 1.0 (positive), and subjectivity ranges from 0.0 (objective) to 1.0 (subjective).
blob.sentiment
We can also read each value individually through the polarity and subjectivity attributes, for example blob.sentiment.polarity and blob.sentiment.subjectivity.
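A polarity score is often easier to work with once it is mapped to a label. The following is a minimal sketch of that idea. The function name label_polarity and the 0.1 neutral band are assumptions chosen for illustration and are not part of TextBlob.

```python
def label_polarity(polarity, neutral_band=0.1):
    """Map a polarity score in [-1.0, 1.0] to a coarse sentiment label.

    neutral_band is an assumed cutoff: scores within +/- neutral_band
    of zero are treated as neutral.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


print(label_polarity(0.35))
print(label_polarity(-0.4))
print(label_polarity(0.05))
```

With the blob created earlier, this could be called as label_polarity(blob.sentiment.polarity).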
- Words
The words property splits the text into its individual words.
blob.words
- Sentences
The sentences property splits the text into its individual sentences.
blob.sentences
We can also find the polarity of each individual sentence using the polarity attribute mentioned above.
for sentence in blob.sentences:
    print(sentence.sentiment.polarity)
- Singularize & Pluralize words
We can select words from our text and singularize or pluralize them. Similarly, we can pass in any word and convert it to its singular or plural form.
word_text = blob.words
word_text[3]
word_text[3].singularize()
word_text[4].pluralize()
- Lemmatize
The lemmatize method returns the lemma (base form) of a word.
word_text[3].lemmatize()
- Spell Check
The correct method fixes spelling mistakes in a sentence or a whole article, while the spellcheck method of a Word returns candidate spellings together with a confidence score for each.
sent = TextBlob("Among the 10 countries that have reported the highest number of case in the world")
print(sent.correct())
from textblob import Word
w = Word('amog')
w.spellcheck()
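spellcheck returns a list of (candidate, confidence) tuples, so choosing a correction comes down to picking the highest-scoring candidate. The sketch below works on a hand-written list in that shape, not on real TextBlob output. The helper name best_suggestion and the min_confidence threshold are hypothetical.

```python
def best_suggestion(suggestions, min_confidence=0.5):
    """Return the most confident candidate, or None if none is confident enough."""
    if not suggestions:
        return None
    word, confidence = max(suggestions, key=lambda pair: pair[1])
    return word if confidence >= min_confidence else None


# Hand-written example in the (candidate, confidence) shape.
sample = [("among", 0.8), ("amok", 0.2)]
print(best_suggestion(sample))

# No candidate reaches the assumed threshold, so nothing is suggested.
print(best_suggestion([("among", 0.3), ("amok", 0.3)]))
```

With the Word created above, this could be called as best_suggestion(w.spellcheck()).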
- Parsing Text
By default, TextBlob uses Pattern’s parser. We will parse our text using the parse method.
blob.parse()
- N-Grams
The ngrams method returns a list of n-grams, each a sequence of n successive words from the text. Pass the value of n to set the number of words in each n-gram.
blob.ngrams(n=5)
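To make the sliding-window idea concrete, here is a minimal pure-Python sketch of n-gram extraction over a list of words. It only illustrates the concept and is not TextBlob's implementation. The name simple_ngrams is hypothetical, and whitespace splitting stands in for real tokenization.

```python
def simple_ngrams(words, n):
    """Return every run of n consecutive words as a tuple."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # The window starts at each index that still leaves room for n words.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


words = "daily cases are still continuously rising".split()
for gram in simple_ngrams(words, 3):
    print(gram)
```

A text with fewer than n words yields an empty list, since no full window fits.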
These are some of the text processing features that TextBlob provides. TextBlob is a convenient choice for text processing because it is easy to use and comes with many predefined functions.
Conclusion:
In this article, we have learned about TextBlob and how it is used for text processing. TextBlob provides a wide variety of functions for extracting properties of textual data, which helps us turn raw text into a form that can be passed to a machine learning model.