Top NLP Open Source Projects For Developers In 2020

The year 2019 was an excellent year for developers, as almost all industry leaders open-sourced their machine learning toolkits. Open-sourcing not only helps users but also helps the tool itself, as developers can contribute and add customisations that serve a few complex applications. The benefit is mutual and also helps accelerate the democratisation of ML. Here we have compiled a few open-source NLP projects that would be exciting for developers as well as users:

LIGHT

LIGHT (Learning in Interactive Games with Humans and Text) is a large-scale fantasy text adventure game and research platform for training agents that can both talk and act, interacting either with other models or with humans. The game uses natural language that is entirely written by the people playing it, which enables researchers to study language and actions jointly in the game world. The complete setup has been open-sourced and is available to other researchers.

Dialogflow

Dialogflow is a Google service, built on Google infrastructure, that runs on the Google Cloud Platform. It incorporates Google's machine learning expertise and lets clients scale to hundreds of millions of users. Dialogflow has been optimised for the Google Assistant and is the most widely used tool for building Actions for more than 400 million Google Assistant devices.

Microsoft Icecaps

Icecaps provides an array of capabilities thanks to Microsoft's work on personalisation embeddings, maximum-mutual-information-based decoding, knowledge grounding, and shared feature representations, enabling conversational AI that gives more diverse and relevant responses. Most importantly, Microsoft's library leverages TensorFlow, which makes it easy for users to construct sophisticated training configurations using multi-task learning.

AllenNLP

AllenNLP is an open-source NLP research library built on PyTorch. It makes designing and evaluating new deep learning models easy for any NLP problem, and it runs efficiently in the cloud or on a laptop. AllenNLP is built and maintained by the Allen Institute for AI, in close collaboration with researchers at the University of Washington, and its users include Facebook Research, Airbnb, Amazon Alexa and other top players in the industry.

Rasa Open Source

Rasa is an open-source framework for building high-performing, resilient, proprietary contextual assistants. It provides the infrastructure needed to create assistants that can understand messages and hold meaningful conversations, employ machine learning to improve those conversations, and integrate seamlessly with existing systems and channels.

Apache OpenNLP

OpenNLP supports the most common NLP tasks, such as tokenisation, sentence segmentation, part-of-speech tagging, entity extraction, chunking, parsing, language detection, and coreference resolution. The Apache OpenNLP website says that they are always looking for new contributors to work on all parts of the project to make it better. 
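As a rough illustration of the first two of those tasks, here is a naive sketch of sentence segmentation and tokenisation in Python. This is not OpenNLP's own API (OpenNLP is a Java library whose detectors use trained models that handle abbreviations and edge cases); it only shows what the tasks mean:

```python
import re

def segment_sentences(text):
    # Naive sentence segmentation: split after ., !, or ? followed by
    # whitespace. Trained models (as in OpenNLP) handle abbreviations etc.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    # Naive tokenisation: runs of word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)

sentences = segment_sentences("OpenNLP is a toolkit. It supports many tasks!")
tokens = [tokenize(s) for s in sentences]
```

Real segmenters are model-based precisely because rules like "split after a full stop" break on inputs such as "Dr. Smith".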

BERT

BERT, or Bidirectional Encoder Representations from Transformers, released by Google in the latter half of 2018, is a new method of pre-training language representations which obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks.

BERT was the most talked-about NLP project for most of 2019, a trend further propelled when Google announced that it uses BERT in its search engine. Getting acquainted with BERT is worthwhile, as it has spawned many variants and keeps growing in popularity in spite of rivals like XLNet and ERNIE.
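The core of BERT's pre-training is masked language modelling: a fraction of tokens is hidden, and the model must predict them from context on both sides. The toy sketch below illustrates only the masking step, not Google's implementation (the mask rate and `[MASK]` token follow the paper; everything else is simplified):

```python
import random

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    # BERT-style masking: replace a random fraction of tokens with [MASK]
    # and remember the originals as prediction targets.
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            targets[i] = tok          # original token the model must recover
            masked.append("[MASK]")
        else:
            masked.append(tok)
    return masked, targets

masked, targets = mask_tokens("the cat sat on the mat".split(), mask_rate=0.3)
```

Because the objective looks at context to the left and right of each mask, the learned representations are bidirectional, which is the key difference from earlier left-to-right language models.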

fastText Hyperparameter Autotuning

This fastText feature automatically determines the best hyperparameters for your dataset to build an efficient text classifier. To use autotuning, a researcher provides the training data, a validation set and a time constraint. fastText then uses the allotted time to search for the hyperparameters that give the best performance on the validation set. Optionally, the researcher can also constrain the size of the final model, in which case fastText uses compression techniques to reduce it.
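The idea of searching under a wall-clock budget can be sketched as a generic random search; fastText's actual autotune strategy is more sophisticated, and the `train_fn`/`score_fn` names below are hypothetical stand-ins for training a model and scoring it on the validation set:

```python
import random
import time

def autotune(train_fn, score_fn, search_space, time_budget_s, seed=0):
    # Random search under a time budget: sample hyperparameters, train,
    # and keep the configuration that scores best on the validation set.
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    deadline = time.monotonic() + time_budget_s
    while time.monotonic() < deadline:
        params = {name: rng.choice(values) for name, values in search_space.items()}
        model = train_fn(params)
        score = score_fn(model)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy usage: the "model" is just its params, and the objective peaks at lr=0.1.
space = {"lr": [0.05, 0.1, 0.5], "dim": [50, 100, 300]}
best, score = autotune(
    train_fn=lambda p: p,
    score_fn=lambda m: -abs(m["lr"] - 0.1),
    search_space=space,
    time_budget_s=0.1,
)
```

The time budget, rather than a fixed trial count, is what makes this practical: the search simply does as much exploration as the allotted time permits.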

With an efficient text classifier built in a single command line, researchers can now create memory-efficient classifiers for various tasks, including sentiment analysis, language identification, spam detection, tag prediction and topic classification.

These are a few of the widely popular NLP projects out there today, along with projects like spaCy, Gensim and Hugging Face.


Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
