Natural Language Processing (NLP) is the practice of making human language understandable to machines and then performing different operations on it to extract useful information. NLP is a branch of Artificial Intelligence that deals with the interaction between computers and human language.
There is a large variety of Python libraries that can help us perform NLP tasks. Each library has certain unique features that set it apart from the others. Generally, NLP libraries offer functions such as tokenization, stemming, lemmatization, spell checking, etc.
Polyglot is an open-source Python library used to perform different NLP operations. It is based on NumPy, which is why it is fast. It has a large variety of dedicated commands, which makes it stand out from the crowd. It is similar to spaCy and can be used for languages that spaCy does not support.
In this article, we will explore different NLP operations and functions which can be performed using polyglot.
Implementation:
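Like any other Python library, polyglot can be installed using pip install polyglot. Note that language detection depends on the pycld2 and PyICU packages, and most of the features below need per-language model files, which can be fetched with the polyglot download command (for example, polyglot download embeddings2.en pos2.en ner2.en sentiment2.en morph2.en).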
- Importing Required Libraries
We will import polyglot and explore its different features. Each class will be imported as and when it is required.
- Performing Operation on Data
Before performing any operations, let us first define some sample text that we will use throughout the examples.
init = '''Analytics India Magazine chronicles technological progress in the space of analytics, artificial intelligence, data science & big data by highlighting the innovations, players, and challenges shaping the future of India through promotion and discussion of ideas and thoughts by smart, ardent, action-oriented individuals who want to change the world.'''
- Language Detection
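Polyglot can identify the language of the text passed to it through the language attribute of its Detector class. Let us see how to use it.
from polyglot.detect import Detector
# Detect the language of the sample text
detect = Detector(init)
print(detect.language)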
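To make the idea of language identification more concrete, here is a minimal sketch that uses only the standard library and guesses a language by counting overlaps with tiny hand-picked stopword lists. It is an illustration of the concept only: the function guess_language and the STOPWORDS table are hypothetical names introduced here, and this is not how polyglot's Detector works internally.

```python
import re

# Hypothetical, tiny stopword lists; a real detector uses far richer statistics.
STOPWORDS = {
    "en": {"the", "and", "of", "in", "is", "to", "by"},
    "fr": {"le", "la", "et", "les", "des", "est", "un"},
    "de": {"der", "die", "und", "das", "ist", "nicht", "ein"},
}


def guess_language(text):
    """Return the language code whose stopwords overlap most with the text."""
    tokens = set(re.findall(r"\w+", text.lower()))
    scores = {lang: len(tokens & words) for lang, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    # Report None when no stopword matched at all.
    return best if scores[best] > 0 else None


print(guess_language("Analytics India Magazine chronicles progress in the space of analytics"))
```

Such a toy approach breaks down on short or mixed-language text, which is one reason to rely on a dedicated detector instead.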
- Tokenize
Tokenization splits the text into smaller units. With polyglot, we can get the list of words in the text as well as the list of sentences.
from polyglot.text import Text
text = Text(init)
text.words
text.sentences
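To see what tokenization does in principle, the following is a rough sketch built on regular expressions from the standard library. The helper names split_sentences and split_words are assumptions made for this example; polyglot's own tokenizers are language-aware and considerably more robust than these heuristics.

```python
import re


def split_sentences(text):
    """Split after ., ! or ? when followed by whitespace (a rough heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_words(text):
    """Return runs of word characters, plus each punctuation mark as its own token."""
    return re.findall(r"\w+|[^\w\s]", text)


sample = "Polyglot is fun. It supports many languages!"
print(split_sentences(sample))
print(split_words(sample))
```

A heuristic like this will wrongly split on abbreviations such as "Dr. Smith", so it is meant only to show the shape of the result: one list of sentences and one list of word tokens.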
- POS Tagging
Part-of-speech (POS) tagging identifies the syntactic role of each word occurrence, such as noun, verb, or adjective.
# pos_tags returns a list of (word, tag) tuples
text.pos_tags
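Once we have a list of (word, tag) pairs, it is often useful to summarize it. The sketch below counts tags and filters words by tag. The sample_tags list is hand-written for illustration in the same (word, tag) shape and is not output copied from polyglot; tag_counts and words_with_tag are hypothetical helper names.

```python
from collections import Counter

# Hand-written sample in the (word, tag) shape; not actual polyglot output.
sample_tags = [
    ("Analytics", "PROPN"), ("India", "PROPN"), ("Magazine", "PROPN"),
    ("chronicles", "VERB"), ("technological", "ADJ"), ("progress", "NOUN"),
]


def tag_counts(pos_tags):
    """Count how many words received each POS tag."""
    return Counter(tag for _word, tag in pos_tags)


def words_with_tag(pos_tags, wanted):
    """Return the words labelled with the given tag, in their original order."""
    return [word for word, tag in pos_tags if tag == wanted]


print(tag_counts(sample_tags).most_common())
print(words_with_tag(sample_tags, "PROPN"))
```

With polyglot and its models installed, the same helpers should work when given text.pos_tags instead of the hand-written list.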
- Named Entity Extraction
Named entity extraction pulls out phrases from plain text that refer to entities such as locations, people, and organizations.
text.entities
Let us try this with another piece of text.
init1 = '''Hello my name is Himanshu Sharma and I am from India'''
text = Text(init1)
text.entities
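Extracted entities are easier to work with when grouped by their type. The following sketch shows one way to do that, assuming each entity can be reduced to a (tag, words) pair. The sample_entities list is a hand-written stand-in that uses tags in the I-PER / I-LOC / I-ORG style; it is not output copied from polyglot, and group_entities is a hypothetical helper.

```python
from collections import defaultdict

# Hand-written stand-ins for entity chunks: (tag, list of words).
sample_entities = [
    ("I-PER", ["Himanshu", "Sharma"]),
    ("I-LOC", ["India"]),
]


def group_entities(entities):
    """Group entity phrases by their tag, joining the words of each entity."""
    grouped = defaultdict(list)
    for tag, words in entities:
        grouped[tag].append(" ".join(words))
    return dict(grouped)


print(group_entities(sample_entities))
```

With polyglot installed, the pairs could presumably be built with [(e.tag, list(e)) for e in text.entities], since each entity carries a tag and behaves like a list of words.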
- Morphological analysis
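Morphological analysis studies the regularities behind word formation in human language. Polyglot uses it to split words into morphemes, their smallest meaningful units. Let us see how to use it.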
from polyglot.text import Word
words = ["programming", "parallel", "inevitable", "beautiful"]
for w in words:
    w = Word(w, language="en")
    print(w, w.morphemes)
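To give a feel for what splitting a word into morphemes means, here is a deliberately naive sketch that peels off one known suffix. The SUFFIXES tuple and the split_suffix function are hypothetical and written only for this illustration; polyglot relies on trained morphological models rather than a fixed suffix list, so its segmentations will generally differ.

```python
# Hypothetical suffix list; real morphological models are learned from data.
SUFFIXES = ("ing", "able", "ful", "ly")


def split_suffix(word):
    """Split off the longest matching suffix, keeping a stem of at least 3 letters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return [word[: -len(suffix)], suffix]
    # No known suffix: treat the whole word as a single unit.
    return [word]


for w in ["programming", "parallel", "inevitable", "beautiful"]:
    print(w, split_suffix(w))
```

The stem-length check is a design choice that keeps short words such as "ring" from being split into meaningless pieces.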
- Sentiment Analysis
Sentiment analysis is used to find the polarity of the text. Polyglot assigns each word a polarity of +1 (positive), -1 (negative), or 0 (neutral).
text = Text("The new economic policies are quite good.")
for w in text.words:
    print(w, w.polarity)
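Word-level polarity of this kind can be understood as a dictionary lookup. The sketch below imitates the idea with a hypothetical mini-lexicon on a +1 / -1 scale, with unknown words scored as 0. LEXICON, word_polarities, and text_polarity are names made up for this example, and the averaging rule is just one simple way to aggregate scores, not necessarily what polyglot does.

```python
import re

# Hypothetical mini-lexicon; unknown words are treated as neutral (0).
LEXICON = {"good": 1, "great": 1, "excellent": 1, "bad": -1, "poor": -1, "terrible": -1}


def word_polarities(text):
    """Return (word, polarity) pairs for every word in the text."""
    words = re.findall(r"\w+", text.lower())
    return [(w, LEXICON.get(w, 0)) for w in words]


def text_polarity(text):
    """Average the polarity of the non-neutral words; 0.0 if there are none."""
    scores = [p for _w, p in word_polarities(text) if p != 0]
    return sum(scores) / len(scores) if scores else 0.0


sentence = "The new economic policies are quite good."
print(word_polarities(sentence))
print(text_polarity(sentence))
```

A lexicon lookup like this ignores negation and context ("not good" still scores as positive), which is worth keeping in mind when interpreting word-level polarity.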
These are some of the NLP operations which we can perform using polyglot.
Conclusion:
In this article, we saw how polyglot can be used to detect the language of a particular text, followed by tokenization into words and sentences. We also saw how to use POS tagging, named entity recognition, morphological analysis, and sentiment analysis. Polyglot is easy to use and supports a variety of NLP operations.