Text processing relies largely on Natural Language Processing (NLP): processing text so that a machine can understand human language within an application or product. Using NLP, we can derive information from textual data, such as sentiment and polarity, which is useful when building text-processing applications.
Python has several open-source libraries and modules for text processing with NLP functions, some of which build on NLTK. Different libraries offer different functionality for extracting meaningful results from data. One such library is Pattern.
Pattern is an open-source Python library that performs a range of NLP tasks. It is mostly used for text processing because of the functionality it provides. Beyond text processing, Pattern is also used for data mining, i.e., we can extract data from sources such as Twitter and Google using the data mining functions it provides.
In this article, we will try and cover the following points:
- NLP Functionalities of Pattern
- Data Mining Using Pattern
Implementation:
We will start by installing Pattern using pip install pattern.
- Importing required library
Different functionalities are defined under different functions, so we will import them as and when required as we move through this article. We will be working in the English language, so we will use ‘en’, the English module. Let us start with some basic functionalities of Pattern for NLP operations.
- NLP Operations using Pattern
We will go through some of the most used and most important functionalities provided by Pattern, starting with parsing a sentence.
a. Parsing
from pattern.en import parse
parse('Hello Everyone and Welcome to Analytics India Magazine')
Here we can see that the output of the parse function labels the words in the sentence as a noun, verb, subject, or object. We can also use the ‘pprint’ function defined in the Pattern library to display the parsed sentence more clearly. In addition, parse accepts parameters such as lemmata, tokenize, and encoding, so these properties can be requested in a single parse call instead of through separate functions.
from pattern.en import pprint
pprint(parse('Hello Everyone and Welcome to Analytics India Magazine', relations=True, tokenize=True, lemmata=True))
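Since parse returns its annotations as a tagged string, it can help to see how such a string might be split back into per-word fields. The sketch below is only an illustration: the sample string is hand-written in a slash-separated style (it is not captured output from Pattern), and split_tagged is a hypothetical helper name.

```python
# Hand-written sample in a slash-separated "word/tag/chunk" style.
# This is an assumed illustration, not actual output from Pattern.
tagged = "Hello/UH/B-NP Everyone/NN/I-NP and/CC/O"


def split_tagged(tagged_string):
    """Split a slash-separated tagged string into one list of fields per token."""
    return [token.split("/") for token in tagged_string.split()]


for fields in split_tagged(tagged):
    word, tags = fields[0], fields[1:]
    print(word, tags)
```

Each token becomes a list whose first element is the word and whose remaining elements are its annotations.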
b. N-Grams
The ngrams function finds all the n-grams (sequences of n consecutive words) in a given text string.
from pattern.en import ngrams
print(ngrams("Hello Everyone and Welcome to Analytics India Magazine", n=3))
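To make the idea of an n-gram concrete, here is a minimal plain-Python sketch of a sliding window over whitespace-separated words. It is not Pattern's implementation (which may handle punctuation and other details differently); ngrams_sketch is a hypothetical name used only for this illustration.

```python
def ngrams_sketch(text, n=3):
    """Return every run of n consecutive whitespace-separated words as a tuple."""
    words = text.split()
    # Slide a window of width n across the word list.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


trigrams = ngrams_sketch("Hello Everyone and Welcome to Analytics India Magazine", n=3)
for gram in trigrams:
    print(gram)
```

An eight-word sentence yields six trigrams, the first being ('Hello', 'Everyone', 'and').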
- Sentiment Analysis
The sentiment function tries to identify the opinion or view expressed in a text string. It returns both the polarity and the subjectivity of the given text. Polarity ranges from -1 (highly negative) to 1 (highly positive), and subjectivity ranges from 0 (objective) to 1 (subjective).
from pattern.en import sentiment
print(sentiment("He is a good boy but sometimes he behaves miserably"))
The sentiment analysis reports that the sentence is negative with high subjectivity.
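To make the two numbers easier to read, a small helper can turn a (polarity, subjectivity) pair into labels. This is only a sketch: the name describe_sentiment, the 0.1 neutral band, and the 0.5 subjectivity cut-off are assumptions chosen for illustration and are not part of Pattern. The input values below are made up, not the scores of the sentence above.

```python
def describe_sentiment(polarity, subjectivity, neutral_band=0.1):
    """Map a (polarity, subjectivity) pair to human-readable labels.

    The thresholds are illustrative assumptions, not values defined by Pattern.
    """
    if polarity > neutral_band:
        tone = "positive"
    elif polarity < -neutral_band:
        tone = "negative"
    else:
        tone = "neutral"
    stance = "subjective" if subjectivity >= 0.5 else "objective"
    return tone, stance


# Made-up scores, used only to exercise the helper.
print(describe_sentiment(-0.15, 0.8))  # ('negative', 'subjective')
```

In practice, the tuple returned by sentiment could be unpacked straight into this helper.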
- Modality
Modality is one of the functions that sets Pattern apart from other NLP-based Python libraries. The modality function measures the degree of certainty expressed in a sentence, with a value ranging from -1 to 1. According to the Pattern library, a sentence with a modality of 0.5 or above can be treated as a fact.
from pattern.en import parse, Sentence, modality
# Sentence wraps the parsed (tagged) string so that modality can inspect it
text = parse('He is a good boy but sometimes he behaves miserably')
text = Sentence(text)
print(modality(text))
The modality comes out to be zero, which means that the sentence is neutral.
- Suggest
The suggest function is used for spelling correction, but it does more than check spelling: it also returns suggestions for what the correct word might be, along with their probabilities. This function also distinguishes Pattern from other libraries.
from pattern.en import suggest
print(suggest("Heroi"))
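The idea of returning candidate corrections with probabilities can be sketched in plain Python, in the general style of edit-distance spelling correctors. This is not Pattern's implementation: the toy vocabulary, its frequencies, and the name suggest_sketch are all assumptions made up for this illustration.

```python
from collections import Counter

# Hypothetical toy vocabulary with made-up frequencies (illustration only).
VOCAB = Counter({"hero": 5, "heroic": 3, "heron": 2, "zero": 4})
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def suggest_sketch(word):
    """Return (candidate, probability) pairs, most likely first."""
    word = word.lower()
    if word in VOCAB:
        return [(word, 1.0)]
    candidates = {w for w in edits1(word) if w in VOCAB}
    total = sum(VOCAB[w] for w in candidates)
    # Probability is each candidate's share of the total candidate frequency.
    return sorted(((w, VOCAB[w] / total) for w in candidates), key=lambda p: -p[1])


print(suggest_sketch("Heroi"))
```

With this toy vocabulary, "Heroi" is one edit away from "hero", "heroic", and "heron", so those three are ranked by their relative frequency.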
- Quantify
The quantify function provides a word-based estimate of how many of each item appear in the given list.
from pattern.en import quantify
a = quantify(['Pencil', 'Pencil', 'Eraser', 'Sharpener', 'Sharpener', 'Sharpener', 'Scale', 'Compass'])
print(a)
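The counting step that underlies such an estimate can be sketched with the standard library. This sketch only produces exact counts; it does not reproduce how Pattern turns counts into approximate wording.

```python
from collections import Counter

items = ['Pencil', 'Pencil', 'Eraser', 'Sharpener', 'Sharpener', 'Sharpener', 'Scale', 'Compass']

# Tally how often each item occurs, then list the most frequent first.
counts = Counter(items)
for item, count in counts.most_common():
    print(item, count)
```

Here 'Sharpener' is counted three times and 'Pencil' twice, with every other item appearing once.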
- Data Mining using Pattern
One of the most important features of Pattern is that it can be used to mine data from different platforms such as Google, Twitter, and Wikipedia. Let us explore the data mining operations of the Pattern library and extract some data using it.
We will start by mining data from Google: we enter a keyword to search for and display the text and URL of each search result.
- Google Mining
from pattern.web import Google
google = Google()
for result in google.search('Analytics India Magazine'):
    print(result.url)
    print(result.text)
- Twitter Mining
We can also use Twitter to mine the data we require. Let us explore it through an example.
from pattern.web import Twitter
twitter = Twitter()
for result in twitter.search('Analytics India Magazine'):
    print(result.url)
    print(result.text)
- Flickr Mining
Flickr is an American image hosting and video hosting service, as well as an online community. Pattern can be used to extract data from Flickr.
from pattern.web import Flickr
flickr = Flickr(license=None)
for result in flickr.search('Analytics India Magazine'):
    print(result.url)
    print(result.text)
Similarly, Pattern provides data mining functions for a large number of online platforms, and we can use them as needed.
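Because each of the loops above reads the same two attributes (url and text) from every result, the results can be gathered into plain records with one small helper. This is a sketch under assumptions: collect_results and FakeResult are hypothetical names, and the stand-in results below are made up so that the example runs without any network access.

```python
from collections import namedtuple


def collect_results(results, limit=5):
    """Gather up to `limit` results into a list of {'url', 'text'} dictionaries."""
    records = []
    for result in results:
        if len(records) >= limit:
            break
        records.append({"url": result.url, "text": result.text})
    return records


# Made-up stand-ins for search results, exposing the same url and text attributes.
FakeResult = namedtuple("FakeResult", ["url", "text"])
sample = [
    FakeResult("https://example.com/a", "first"),
    FakeResult("https://example.com/b", "second"),
]

records = collect_results(sample, limit=1)
print(records)
```

The same helper could be handed the iterable returned by a search call in place of the sample list.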
Conclusion:
In this article, we started by installing Pattern, an open-source Python library for NLP, and explored its different functions. We saw how Pattern differs from other NLP-based Python libraries, and then explored how to use it for text mining and for extracting data from online sources. Along the way, we learned how to use Pattern for NLP operations and for data mining across different platforms.