Hands-on Guide to StanfordNLP – A Python Wrapper For Popular NLP Library CoreNLP

StanfordNLP is a Python wrapper for CoreNLP that gives Python users access to CoreNLP's functionality.

Natural Language Processing (NLP) is a branch of Artificial Intelligence concerned with processing and analyzing human (natural) language. It is used to extract meaningful insights from textual data, with applications such as text analysis, text mining, sentiment analysis, speech recognition, and machine translation. Python offers several modules and packages for NLP work.

CoreNLP is a one-stop toolkit for NLP operations such as stemming, lemmatization, tokenization, part-of-speech tagging, and sentiment analysis. It is written in Java, but it can be used from other programming languages as well. CoreNLP is a framework that makes it easy to apply a range of language processing tools to a piece of text.

StanfordNLP is a Python wrapper for CoreNLP that gives Python users access to CoreNLP's functionality. It is developed by the Stanford NLP Group, which consists of faculty, postdocs, programmers, and students who work together on algorithms that allow computers to process and understand human languages.

In this article, we will explore StanfordNLP and see what types of natural language processing functionalities it provides. 

Implementation:

Before exploring StanfordNLP, we need to install it using pip install stanfordnlp.

  1. Importing required libraries

We will be exploring stanfordnlp, so we need to import it. Also, we need to download the English models of StanfordNLP as we will be working with the English language.

import stanfordnlp

stanfordnlp.download('en')      # Download the English models

This command will download the English models for stanfordnlp.

  2. NLP Operations using StanfordNLP

To explore the different NLP operations, we first need to create a default pipeline for the English language. Let us also define the text that we will be working on.

pipe = stanfordnlp.Pipeline()

text = pipe("This article will tell you How to use StanfordNLP. Let us start.")

  3. Printing Dependencies

The print_dependencies function prints each word in the sentence together with the index of its governor (head word) and the Universal Dependencies relation between the two.

text.sentences[0].print_dependencies()
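
The same information can also be collected as data rather than printed. The following is a minimal sketch, assuming each word exposes the text, governor, and dependency_relation attributes used later in this article; dependency_triples and words_with_relation are hypothetical helper names, not part of StanfordNLP.

```python
def dependency_triples(sentence):
    """Return (word text, governor index, relation) for every word in a sentence."""
    return [(w.text, w.governor, w.dependency_relation) for w in sentence.words]


def words_with_relation(sentence, relation):
    """Return the text of every word attached with the given relation, e.g. 'nsubj'."""
    return [text for text, _, rel in dependency_triples(sentence) if rel == relation]


# Usage with the document built above (needs the downloaded English models):
# print(dependency_triples(text.sentences[0]))
# print(words_with_relation(text.sentences[0], "nsubj"))
```

Returning plain tuples makes it easy to filter or count relations afterwards.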

  4. Tokenization

Tokenization separates text into smaller units, which can be words, characters, or subwords. The print_tokens function also shows the lemma of each word in the sentence, along with details such as its verb form and dependency relation. Here we print the tokens of the second sentence.

text.sentences[1].print_tokens()
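
If only the token strings are needed, they can be read from the document directly. This is a small sketch, assuming each sentence has a tokens list whose items carry a text attribute; token_texts is a hypothetical helper name introduced here for illustration.

```python
def token_texts(doc):
    """Return one list of token strings per sentence in the document."""
    return [[token.text for token in sentence.tokens] for sentence in doc.sentences]


# Usage with the document built above (needs the downloaded English models):
# for tokens in token_texts(text):
#     print(tokens)
```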

  5. Lemmatization

Lemmatization is the process of grouping together the different inflected forms of a word so they can be analyzed as a single item. It is straightforward with StanfordNLP: we iterate over the words of each sentence and read the lemma attribute of each word.

for i in text.sentences:
    for j in i.words:
        print(j.lemma)
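
To see what the lemmatizer actually changed, it helps to pair each word with its lemma. The sketch below assumes the text and lemma attributes shown above; lemma_pairs and changed_lemmas are hypothetical helper names, and the case-insensitive comparison is a design choice made for this example.

```python
def lemma_pairs(doc):
    """Return (surface form, lemma) for every word in the document."""
    return [(w.text, w.lemma) for s in doc.sentences for w in s.words]


def changed_lemmas(doc):
    """Keep only the pairs where the lemma differs from the surface form."""
    return [
        (form, lemma)
        for form, lemma in lemma_pairs(doc)
        if lemma is not None and form.lower() != lemma.lower()
    ]


# Usage with the document built above (needs the downloaded English models):
# print(changed_lemmas(text))
```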

  6. POS Tagging

Part-of-speech tagging assigns each word a part of speech, such as noun, verb, or adjective.

for i in text.sentences:
    for j in i.words:
        print(j.pos)

Similarly, we can also find treebank-specific POS (XPOS) tags, and universal morphological features (UFeats).
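
The tags mentioned above can be gathered side by side. This is a minimal sketch, assuming each word exposes upos, xpos, and feats attributes; tag_rows and upos_counts are hypothetical helper names used only for illustration.

```python
from collections import Counter


def tag_rows(doc):
    """Return (text, universal POS, treebank POS, morphological features) per word."""
    return [(w.text, w.upos, w.xpos, w.feats) for s in doc.sentences for w in s.words]


def upos_counts(doc):
    """Count how often each universal POS tag occurs in the document."""
    return Counter(upos for _, upos, _, _ in tag_rows(doc))


# Usage with the document built above (needs the downloaded English models):
# for row in tag_rows(text):
#     print(row)
# print(upos_counts(text))
```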

  7. Dependency Parsing

Dependency parsing determines the syntactic head of each word in the sentence and the dependency relation between the two words. Each word exposes two attributes for this, ‘governor’ and ‘dependency_relation’.

for i in text.sentences:
    for j in i.words:
        print(j.governor, j.dependency_relation)
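
A governor index is easier to read once it is resolved to the head word itself. The sketch below assumes that governor is a 1-based position within the sentence and that 0 marks the root, following the CoNLL-U convention; head_pairs is a hypothetical helper name.

```python
def head_pairs(sentence):
    """Return (word, relation, head word) with governor indices resolved to text."""
    pairs = []
    for w in sentence.words:
        governor = int(w.governor)
        # Assumption: 0 means the word is the root; otherwise the index is 1-based.
        head = "ROOT" if governor == 0 else sentence.words[governor - 1].text
        pairs.append((w.text, w.dependency_relation, head))
    return pairs


# Usage with the document built above (needs the downloaded English models):
# for pair in head_pairs(text.sentences[1]):
#     print(pair)
```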

These are some of the basic NLP techniques that we can apply to textual data. StanfordNLP is fast and effective. Beyond these basic functions, the StanfordNLP module contains several packages for different uses.

  1. stanfordnlp.models

The models package contains the different language packs. StanfordNLP currently supports around 53 languages, and we can download the models for other languages in the same way that we downloaded the English models at the start of the article.

  2. stanfordnlp.pipeline

The pipeline package is used to process textual data by building pipelines with the language, processors, and models of our choice. We can also ask it to use a GPU, if one is available, through the ‘use_gpu’ option.
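
A custom pipeline might be configured as in the sketch below. It assumes the Pipeline constructor accepts lang, processors (a comma-separated string), and use_gpu keyword arguments, and that the models for the chosen language are already downloaded; pipeline_options and build_pipeline are hypothetical helper names.

```python
def pipeline_options(lang="en",
                     processors=("tokenize", "pos", "lemma", "depparse"),
                     use_gpu=False):
    """Build keyword arguments for stanfordnlp.Pipeline."""
    return {"lang": lang, "processors": ",".join(processors), "use_gpu": use_gpu}


def build_pipeline(**overrides):
    """Create a pipeline from the options above."""
    # Imported lazily so pipeline_options can be used without the package installed.
    import stanfordnlp
    return stanfordnlp.Pipeline(**pipeline_options(**overrides))


# Usage (needs stanfordnlp and the downloaded English models):
# pipe = build_pipeline(processors=("tokenize", "pos"))
# doc = pipe("Let us start.")
```

Leaving out processors that are not needed, such as the dependency parser, reduces the work done for each document.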

  3. stanfordnlp.server

The server package provides access to a simple web API server for serving NLP operations. It can be used to work directly with the functionality provided by CoreNLP. To connect through the CoreNLP client API, Java must be installed on your system.
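
The client might be used roughly as follows. This is a sketch under several assumptions: CoreNLP itself has been downloaded separately, the CORENLP_HOME environment variable points to it, and the annotator list, timeout, and memory values are example settings only. tokens_with_pos and annotate_with_corenlp are hypothetical helper names.

```python
def tokens_with_pos(annotation):
    """Flatten a CoreNLP annotation into (word, POS tag) pairs."""
    return [(tok.word, tok.pos) for sent in annotation.sentence for tok in sent.token]


def annotate_with_corenlp(text):
    """Start a CoreNLP server, annotate the text, and shut the server down again."""
    # Imported lazily: needs stanfordnlp, Java, and CORENLP_HOME to be set.
    from stanfordnlp.server import CoreNLPClient
    with CoreNLPClient(annotators=["tokenize", "ssplit", "pos"],
                       timeout=30000, memory="4G") as client:
        return client.annotate(text)


# Usage (needs a local CoreNLP installation):
# ann = annotate_with_corenlp("Let us start.")
# print(tokens_with_pos(ann))
```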

Conclusion:

In this article, we saw how to use StanfordNLP to process textual data, with around 53 languages supported. StanfordNLP offers ease of use and speed for a range of NLP tasks. We also saw how its different functions can help us perform operations on large text datasets, and in different languages.


Himanshu Sharma
An aspiring Data Scientist currently Pursuing MBA in Applied Data Science, with an Interest in the financial markets. I have experience in Data Analytics, Data Visualization, Machine Learning, Creating Dashboards and Writing articles related to Data Science.
