
Hands-On Tutorial On Polyglot – Python Toolkit For Multilingual NLP Applications



Natural Language Processing (NLP) is the process of making human language understandable to machines and then performing operations on it to extract useful information. NLP is a branch of Artificial Intelligence that enables interaction between computers and human language.

There is a large variety of Python libraries that can help us perform NLP tasks. Each library has certain unique features that set it apart from the others. Generally, NLP libraries provide functions like tokenization, stemming, lemmatization, spell check, etc.

Polyglot is an open-source Python library used to perform different NLP operations. It is built on top of NumPy, which makes it fast. It has a large variety of dedicated commands, which makes it stand out from the crowd. It is similar to spaCy and can be used for languages that spaCy does not support.

In this article, we will explore different NLP operations and functions that can be performed using Polyglot.

Implementation:

Like any other Python library, we will install Polyglot using pip install polyglot.
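Polyglot also relies on a few companion packages (PyICU, pycld2 and Morfessor) and on per-language models that have to be downloaded separately. The snippet below is a minimal setup sketch using Polyglot's downloader API; the model names cover only the English tasks used in this article.

from polyglot.downloader import downloader

# Download the English models needed for the examples below
# (embeddings and models for POS tagging, NER, sentiment and morphology).
for package in ["embeddings2.en", "pos2.en", "ner2.en", "sentiment2.en", "morph2.en"]:
    downloader.download(package)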

  1. Importing Required Libraries

We will import Polyglot and explore its different functionalities. Each functionality will be imported as and when it is required.
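For reference, the classes used throughout this article come from just two Polyglot modules; a small sketch of the imports we will rely on:

from polyglot.detect import Detector   # language detection
from polyglot.text import Text, Word   # tokenization, tagging, NER, sentiment, morphology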

  2. Performing Operations on Data

Before performing different operations on our data, let us first initialize some text on which we will apply the different functions.

init = '''Analytics India Magazine chronicles technological progress in the space of  analytics, artificial intelligence, data science & big data by highlighting the innovations, players, and challenges shaping the future of India through promotion and discussion of ideas and thoughts by smart, ardent, action-oriented individuals who want to change the world.'''

  3. Language Detection

Polyglot can identify the language of the text passed to it using the Detector class and its language attribute. Let us see how to use it.

from polyglot.detect import Detector

detect = Detector(init)

print(detect.language)
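The Language object returned by Detector also exposes the language name and a confidence score. A small sketch with a hypothetical non-English sentence of my own:

# Detect the language of a short French sentence (illustrative example).
fr_detect = Detector("Bonjour, je m'appelle Pierre et j'habite à Paris.")
print(fr_detect.language.name, fr_detect.language.confidence)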

  4. Tokenize

Using tokenization, we can print the word list, i.e. the individual words present in the text, as well as the sentences present in the text.

from polyglot.text import Text

text = Text(init)

text.words


text.sentences
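Each detected sentence is itself tokenized, so we can also look at the words sentence by sentence; a small sketch (variable names are my own):

# Show how many words each sentence contains, plus its first few tokens.
for sentence in text.sentences:
    print(len(sentence.words), "words:", sentence.words[:5])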

  5. POS Tagging

Part-of-speech tagging is used to identify the syntactic role of each word occurrence. The pos_tags attribute works directly on the Text object we created earlier, so no additional import is needed.

text.pos_tags
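pos_tags returns a list of (word, tag) pairs, so it can also be iterated directly; a small sketch:

# Print each word together with its part-of-speech tag.
for word, tag in text.pos_tags:
    print(word, "->", tag)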

  6. Named Entity Extraction

It extracts phrases from plain text that represent entities such as locations, persons, and organizations.

text.entities


Let us try this with another piece of text.

init1 = '''Hello my name is Himanshu Sharma and I am from India'''

text = Text(init1)

text.entities
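Each extracted entity is a chunk of words carrying a tag such as I-PER, I-LOC or I-ORG, so we can inspect the results one by one; a small sketch:

# Print the tag of each entity along with the words it covers.
for entity in text.entities:
    print(entity.tag, list(entity))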

  7. Morphological Analysis

Morphological analysis captures the regularities behind word formation in human language by splitting words into their morphemes. Let us see how to use it.

from polyglot.text import Word

words = ["programming", "parallel", "inevitable", "beautiful"]

for w in words:
    w = Word(w, language="en")
    print(w, w.morphemes)

  8. Sentiment Analysis

It is used to find out the polarity of the text. Polyglot assigns each word a polarity score of -1 (negative), 0 (neutral) or +1 (positive).

text = Text("The new economic policies are quite good.")

for w in text.words:
    print(w, w.polarity)
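Since every word receives a score of -1, 0 or +1, one simple way to get an overall score for the sentence is to average the non-zero word polarities. This aggregation is a sketch of my own, not a built-in Polyglot call:

# Average the non-zero word polarities for a rough sentence-level score.
scores = [w.polarity for w in text.words if w.polarity != 0]
overall = sum(scores) / len(scores) if scores else 0.0
print("Overall polarity:", overall)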


These are some of the NLP operations that we can perform using Polyglot.

Conclusion:

In this article, we saw how Polyglot can be used to detect the language of a given text, followed by tokenization into words and sentences. We also saw how to use POS tagging, named entity recognition, morphological analysis, and sentiment analysis. Polyglot is easy to use and supports a wide variety of NLP operations.


Himanshu Sharma

An aspiring data scientist currently pursuing an MBA in Applied Data Science, with an interest in the financial markets. I have experience in data analytics, data visualization, machine learning, creating dashboards, and writing articles related to data science.