
Hands-on Guide to StanfordNLP – A Python Wrapper For Popular NLP Library CoreNLP

Himanshu Sharma

Natural Language Processing is a part of Artificial Intelligence that aims at manipulating human/natural language. It is used for extracting meaningful insights from textual datasets. NLP is mainly used for text analysis, text mining, sentiment analysis, speech recognition, machine translation, etc. Python provides different modules/packages for working on NLP operations.

CoreNLP is a one-stop solution for NLP operations like stemming, lemmatization, tokenization, part-of-speech tagging, sentiment analysis, etc. It is written in the Java programming language but supports many human languages. CoreNLP is a framework that makes it easy to apply different language processing tools to a particular text.



StanfordNLP is a Python wrapper for CoreNLP; it provides all the functionalities of CoreNLP to Python users. The Stanford NLP Group consists of faculty, postdocs, programmers, and students who work together on algorithms that allow computers to process and understand human languages.

In this article, we will explore StanfordNLP and see what types of natural language processing functionalities it provides. 

Implementation:

We will start exploring StanfordNLP, but before that, we need to install it using pip install stanfordnlp.

  1. Importing required libraries

We will be exploring stanfordnlp, so we need to import it. We also need to download the English models of StanfordNLP, as we will be working with the English language.

import stanfordnlp



stanfordnlp.download('en')      # Downloading the English models

This command will download the English models for stanfordnlp.

  2. NLP Operations using StanfordNLP

For exploring different NLP operations, we first need to create a default pipeline for the English language. Let us also define a text/sentence to work on.

pipe = stanfordnlp.Pipeline()

text = pipe("This article will tell you how to use StanfordNLP. Let us start.")

  3. Printing Dependencies

The print_dependencies function displays each word in the sentence along with the index of its head word and its Universal Dependencies relation.

text.sentences[0].print_dependencies()

  4. Tokenization

Tokenization is separating the text into smaller units, which can be words, characters, or subwords. The print_tokens function also provides the lemmatization of all the words in the sentence, along with the verb form, dependency relation, etc.

text.sentences[1].print_tokens()
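To make the idea of tokenization granularity concrete, here is a minimal, library-free sketch of word-level versus character-level tokens (the splitting rules are purely illustrative and not what StanfordNLP's trained tokenizer does internally):

```python
# Toy illustration of tokenization at two granularities.
sentence = "Let us start."

# Word-level tokens: split on whitespace, peeling off trailing punctuation.
word_tokens = []
for chunk in sentence.split():
    if chunk[-1] in ".,!?":
        word_tokens.extend([chunk[:-1], chunk[-1]])
    else:
        word_tokens.append(chunk)

# Character-level tokens: every non-space character is its own unit.
char_tokens = [c for c in sentence if c != " "]

print(word_tokens)   # ['Let', 'us', 'start', '.']
print(char_tokens)
```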

  5. Lemmatization

It is the process of grouping together the different inflected forms of a word so they can be analyzed as a single item. Lemmatization is an effortless task when we use StanfordNLP: we just need to iterate over the words of each sentence and read each word's lemma attribute.

for i in text.sentences:
    for j in i.words:
        print(j.lemma)
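As a rough intuition for what a lemma is, here is a toy lookup-table lemmatizer (the table is invented for illustration; StanfordNLP's lemmatizer is a trained neural model, not a dictionary):

```python
# Toy lemmatizer: map inflected forms back to a shared base form (the lemma).
LEMMA_TABLE = {
    "am": "be", "is": "be", "are": "be", "was": "be",
    "running": "run", "ran": "run", "runs": "run",
}

def toy_lemma(word):
    # Fall back to the lowercased word itself when no entry exists.
    return LEMMA_TABLE.get(word.lower(), word.lower())

print([toy_lemma(w) for w in ["Is", "was", "running", "start"]])
# ['be', 'be', 'run', 'start']
```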

  6. POS Tagging

Part of speech tagging assigns each word with the parts of speech such as nouns, verbs, adjectives, etc.

for i in text.sentences:
    for j in i.words:
        print(j.pos)

Similarly, we can also find treebank-specific POS (XPOS) tags, and universal morphological features (UFeats).

  7. Dependency Parsing

It determines the syntactic head of each word in the sentence and the dependency relation between the two words. Each word exposes two attributes, governor and dependency_relation.

for i in text.sentences:
    for j in i.words:
        print(j.governor, j.dependency_relation)
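To see what the governor index means, here is a small sketch that resolves each word's head from a hand-written parse (the tuples are invented for illustration; following the Universal Dependencies convention, indices are 1-based positions in the sentence and 0 denotes the root):

```python
# Each entry: (word, governor_index, dependency_relation).
parse = [
    ("Let",   0, "root"),
    ("us",    1, "obj"),
    ("start", 1, "xcomp"),
    (".",     1, "punct"),
]

for word, gov, rel in parse:
    # Governor 0 means the word hangs off the artificial ROOT node.
    head = "ROOT" if gov == 0 else parse[gov - 1][0]
    print(f"{word} --{rel}--> {head}")
```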

These are some of the basic NLP techniques that we can apply to our textual data. StanfordNLP is fast and effective. Beyond these basic functions, the StanfordNLP module contains different packages for different uses.

  8. stanfordnlp.models

The models package contains the different language packs. StanfordNLP currently supports around 53 languages, and we can download the models for any of them, as we did for English at the start of the article.

  9. stanfordnlp.pipeline

The pipeline package is used to process textual data by building pipelines from the language, processors, and models of our choice; we can also attempt to use the GPU via the use_gpu option when a GPU is available.
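As a configuration sketch (assuming the English models have already been downloaded), a pipeline restricted to a few processors and forced onto the CPU might look like this:

```python
import stanfordnlp

# Run only tokenization, POS tagging, and lemmatization, on the CPU.
nlp = stanfordnlp.Pipeline(lang='en',
                           processors='tokenize,pos,lemma',
                           use_gpu=False)
doc = nlp("StanfordNLP pipelines are configurable.")
```

Dropping processors we do not need keeps the pipeline lighter and faster to load.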

  10. stanfordnlp.server

The server package provides a simple web API server for serving NLP operations. It can be used to directly access the functionality provided by CoreNLP; to connect to the CoreNLP client API, Java must be installed on your system.
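Assuming Java and the CoreNLP distribution are installed (with the CORENLP_HOME environment variable pointing at the distribution directory), a minimal client session might look like this sketch:

```python
from stanfordnlp.server import CoreNLPClient

# Start a background CoreNLP server and annotate one piece of text.
with CoreNLPClient(annotators=['tokenize', 'ssplit', 'pos'],
                   timeout=30000, memory='4G') as client:
    ann = client.annotate("StanfordNLP can talk to CoreNLP.")
    for sentence in ann.sentence:
        for token in sentence.token:
            print(token.word, token.pos)
```

The with-block shuts the server down cleanly when annotation finishes.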

Conclusion:

In this article, we saw how we can use StanfordNLP for textual data processing, with around 53 languages supported. StanfordNLP is easy to use and blazingly fast at performing different NLP tasks. We saw how its different functionalities can help us perform operations on large datasets, and in different languages too.


Copyright Analytics India Magazine Pvt Ltd
