Hands-on Guide to Pattern – A Python Tool for Effective Text Processing and Data Mining

Pattern is an open-source Python library that performs a range of NLP tasks. It is mostly used for text processing because of the many functions it provides.

Text processing relies mainly on Natural Language Processing (NLP), which processes text data so that a machine can understand human language within an application or product. Using NLP, we can derive information such as sentiment and polarity from textual data, which is useful when building text-processing applications.

Python has several open-source libraries and modules, some of them built on top of NLTK, that support text processing with NLP functions. Different libraries offer different functionality for extracting meaningful results from data. One such library is Pattern.

Pattern performs a range of NLP tasks and is mostly used for text processing. Beyond text processing, it is also used for data mining: its data mining functions let us extract data from sources such as Twitter and Google.

In this article, we will try and cover the following points:

  • NLP Functionalities of Pattern
  • Data Mining Using Pattern 

Implementation:

We will start by installing Pattern using pip install pattern.

  1. Importing required library

Pattern groups its functionality into separate functions, and we will import each one as we need it in this article. We will be working in English, so we will use ‘en’, Pattern's English module. Let us start with some basic NLP functionality of Pattern.

  2. NLP Operations using Pattern

We will go through some of the most used and most important functions that Pattern provides, starting with parsing a sentence.

a. Parsing

from pattern.en import parse

parse('Hello Everyone and Welcome to Analytics India Magazine')

The output of the parse function labels the words in the sentence as a noun, verb, subject, or object. We can also use the ‘pprint’ function defined in the Pattern library to display the parsed sentence in a clearer layout. In addition, parse accepts parameters such as lemmata, tokenize, and encoding. Because these are all options of parse itself, we do not need a separate function for each property.

from pattern.en import pprint

pprint(parse('Hello Everyone and Welcome to Analytics India Magazine', relations=True, tokenize=True, lemmata=True))
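The string returned by parse separates tokens with spaces and the tags of each token with slashes. As a rough illustration of how such a string can be taken apart, here is a minimal pure-Python sketch. The helper name split_tagged and the sample string are assumptions made for this example; the sample is hand-written in a word/POS/chunk/PNP layout and is not real Pattern output.

```python
def split_tagged(tagged):
    """Split a slash-delimited tagged string into one list of fields per token."""
    # Tokens are separated by whitespace; fields within a token by "/".
    return [token.split("/") for token in tagged.split()]


# Hand-written sample in a word/POS/chunk/PNP layout (not real Pattern output).
sample = "I/PRP/B-NP/O eat/VBP/B-VP/O pizza/NN/B-NP/O"
for word, pos, chunk, pnp in split_tagged(sample):
    print(word, pos, chunk)
```

The number of fields per token depends on the parameters passed to parse, so the four-way unpacking above only fits this sample.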

b. N-Grams

The ngrams function returns all the n-grams in a given text string, where an n-gram is a sequence of n consecutive words.

from pattern.en import ngrams

print(ngrams("Hello Everyone and Welcome to Analytics India Magazine", n=3))
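To make the idea concrete, here is a minimal pure-Python sketch of how n-grams can be built with a sliding window. The helper name simple_ngrams and the whitespace tokenization are assumptions for illustration. This is not Pattern's implementation, and Pattern's own ngrams function may treat punctuation and other details differently.

```python
def simple_ngrams(text, n=3):
    """Return every run of n consecutive whitespace-separated words as a tuple."""
    words = text.split()
    # Slide a window of width n across the word list.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "Hello Everyone and Welcome to Analytics India Magazine"
for gram in simple_ngrams(sentence, n=3):
    print(gram)
```

For a sentence of eight words and n=3, the window fits in six positions, so the sketch yields six trigrams.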

c. Sentiment Analysis

The sentiment function tries to identify the opinion or view expressed in a text string. It returns both the polarity and the subjectivity of the given text. Polarity ranges from -1 (highly negative) to 1 (highly positive), and subjectivity ranges from 0 (objective) to 1 (subjective).

from pattern.en import sentiment

print(sentiment("He is a good boy but sometimes he behaves miserably"))


We can see that the sentiment analysis says that the sentence is negative with high subjectivity.
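A pair of numbers is easier to act on once it is turned into labels. The following sketch shows one possible way to do that. The helper name describe_sentiment, the 0.1 polarity threshold, and the 0.5 subjectivity cut-off are assumptions chosen for illustration, and the input values are made up rather than taken from Pattern.

```python
def describe_sentiment(polarity, subjectivity, threshold=0.1):
    """Turn a (polarity, subjectivity) pair into two human-readable labels."""
    # Polarity lies in [-1, 1]; values near zero are treated as neutral.
    if polarity > threshold:
        tone = "positive"
    elif polarity < -threshold:
        tone = "negative"
    else:
        tone = "neutral"
    # Subjectivity lies in [0, 1]; the 0.5 cut-off is an assumption.
    style = "subjective" if subjectivity > 0.5 else "objective"
    return tone, style


# Made-up scores for illustration only.
print(describe_sentiment(-0.15, 0.8))
```

With Pattern installed, the pair returned by sentiment could presumably be passed in directly as describe_sentiment(*sentiment(text)).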

d. Modality

Modality is one of the functions that sets Pattern apart from other Python NLP libraries. The modality function measures the degree of certainty in a sentence and returns a value between -1 and 1. As defined in the Pattern library, a sentence with a modality of 0.5 and above can be treated as a fact.

from pattern.en import parse, Sentence, modality

text = parse('He is a good boy but sometimes he behaves miserably')

text = Sentence(text)

print(modality(text))

The modality comes out to be zero, which means that the sentence is neutral: it leans neither towards certainty nor towards uncertainty.
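The rule of thumb above can be written as a small check. This is only a sketch: the helper name describe_modality is hypothetical, and the 0.5 threshold is the one quoted in this article.

```python
def describe_modality(score, fact_threshold=0.5):
    """Label a modality score using the 0.5 rule of thumb from this article."""
    # Modality scores are expected to lie between -1 and 1.
    if not -1.0 <= score <= 1.0:
        raise ValueError("modality score must lie between -1 and 1")
    return "fact" if score >= fact_threshold else "not a clear fact"


print(describe_modality(0.0))
print(describe_modality(0.75))
```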

e. Suggest

The suggest function is used for spelling correction, but it does more than check the spelling: it also suggests what the correct word might be, along with a probability for each suggestion. This function also distinguishes Pattern from other libraries.

from pattern.en import suggest

print(suggest("Heroi"))
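Since each suggestion comes with a probability, a natural next step is to pick the most likely word. The sketch below assumes the suggestions arrive as a list of (word, probability) pairs. The helper name best_suggestion is hypothetical, and the sample list is made up for illustration rather than copied from Pattern's output.

```python
def best_suggestion(candidates):
    """Return the word with the highest probability, or None if there are none."""
    if not candidates:
        return None
    # Each candidate is a (word, probability) pair; compare on the probability.
    word, _probability = max(candidates, key=lambda pair: pair[1])
    return word


# Made-up suggestions for illustration only.
sample = [("hero", 0.7), ("heroic", 0.3)]
print(best_suggestion(sample))
```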

f. Quantify

The quantify function returns a word count approximation for the given words, that is, a rough description of how often each word occurs rather than exact counts.

from pattern.en import quantify

a = quantify(['Pencil', 'Pencil', 'Eraser', 'Sharpener', 'Sharpener', 'Sharpener', 'Scale', 'Compass'])

print(a)
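For comparison with the approximate description that quantify gives, the exact counts for the same list can be obtained with Python's standard library. This sketch uses collections.Counter and is independent of Pattern; the helper name count_words is an assumption for illustration.

```python
from collections import Counter


def count_words(words):
    """Return exact occurrence counts for each word in the list."""
    return Counter(words)


items = ['Pencil', 'Pencil', 'Eraser', 'Sharpener', 'Sharpener',
         'Sharpener', 'Scale', 'Compass']
counts = count_words(items)
# most_common() lists the words from most to least frequent.
for word, count in counts.most_common():
    print(word, count)
```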

  3. Data Mining using Pattern

One of the most important features of Pattern is that it can be used for data mining across different platforms such as Google, Twitter, and Wikipedia. Let us explore the data mining operations of the Pattern library and extract some data with it.

We will start by mining data from Google: we enter a keyword to search for and display the text and URL of each search result.

a. Google Mining

from pattern.web import Google

google = Google()

for results in google.search('Analytics India Magazine'):

    print(results.url)

    print(results.text)

b. Twitter Mining

We can also use Twitter to mine the data we need. Let us explore it through an example.

from pattern.web import Twitter

twitter = Twitter()

for results in twitter.search('Analytics India Magazine'):

    print(results.url)

    print(results.text)

c. Flickr Mining

Flickr is an American image hosting and video hosting service, as well as an online community. Pattern can be used to extract data from Flickr.

from pattern.web import Flickr

flickr = Flickr(license=None)

for result in flickr.search('Analytics India Magazine'):

    print(result.url)

    print(result.text)


Similarly, Pattern supports data mining from a large number of online platforms, and we can use them in the same way.
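Each of the three examples above loops over search results and reads their url and text attributes, so the same collection step can be shared between platforms. The following is a minimal sketch of that idea. The helper name collect_results and its limit parameter are hypothetical, and the stand-in result objects are fabricated with SimpleNamespace so that the sketch runs without network access or Pattern itself.

```python
from itertools import islice
from types import SimpleNamespace


def collect_results(results, limit=5):
    """Gather the url and text of at most `limit` results into a list of dicts."""
    return [{"url": result.url, "text": result.text}
            for result in islice(results, limit)]


# Stand-in results for illustration; real ones would come from a search call.
fake_results = [
    SimpleNamespace(url="https://example.com/%d" % i, text="Sample result %d" % i)
    for i in range(10)
]
for row in collect_results(fake_results, limit=3):
    print(row["url"], row["text"])
```

Any iterable of objects with url and text attributes can be passed in, which is the assumption this helper makes about the results returned by a search.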

Conclusion:

In this article, we started by installing Pattern, an open-source Python library for NLP, and explored its different functions. We saw how Pattern differs from other NLP-based Python libraries, and then explored how to use it for text mining and for extracting data from online sources. Along the way, we learned how to use Pattern for NLP operations and for data mining from different platforms.

Himanshu Sharma
An aspiring Data Scientist currently Pursuing MBA in Applied Data Science, with an Interest in the financial markets. I have experience in Data Analytics, Data Visualization, Machine Learning, Creating Dashboards and Writing articles related to Data Science.