The year 2019 was an excellent year for developers, as almost all industry leaders open-sourced their machine learning tool kits. Open-sourcing not only helps users but also helps the tool itself, as developers can contribute and add customisations that serve more specialised applications. The benefit is mutual, and it also accelerates the democratisation of ML. Here we have compiled a few open-source NLP projects that should be exciting for developers and users alike:
LIGHT (Learning in Interactive Games with Humans and Text) — a large-scale fantasy text adventure game and research platform for training agents that can both talk and act, interacting either with other models or humans. The game uses natural language that is entirely written by the people playing the game, which enables researchers to study language and actions jointly in the game world. The complete setup has been open-sourced and is available to other researchers.
Built on Google infrastructure, Dialogflow is a Google service that runs on the Google Cloud Platform. Dialogflow incorporates Google's machine learning expertise, which lets clients scale to hundreds of millions of users. It has been optimised for the Google Assistant and is the most widely used tool to build Actions for more than 400 million Google Assistant devices.
Icecaps provides an array of capabilities thanks to Microsoft's work on personalisation embeddings, maximum mutual information (MMI)-based decoding, knowledge grounding, and shared feature representations, which together enable conversational AI that gives more diverse and relevant responses. Most importantly, Microsoft's library leverages TensorFlow, which makes it easy for users to construct sophisticated training configurations using multi-task learning.
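MMI-based decoding reranks a generator's candidate responses by penalising replies that are likely regardless of the input, favouring more specific answers. A minimal pure-Python sketch of the idea (this is not Icecaps' actual API, and the candidate probabilities below are made up for illustration):

```python
import math

def mmi_rerank(candidates, lam=0.5):
    """Rerank (response, log_p_response_given_source, log_p_response)
    tuples by the MMI objective: log p(T|S) - lambda * log p(T)."""
    return sorted(
        candidates,
        key=lambda c: c[1] - lam * c[2],
        reverse=True,
    )

# Hypothetical candidates: a generic reply is probable both conditionally
# and unconditionally; a specific reply is rarer overall, so the MMI
# penalty on log p(T) promotes it.
candidates = [
    ("i don't know",     math.log(0.30), math.log(0.20)),  # generic
    ("try the new cafe", math.log(0.25), math.log(0.01)),  # specific
]
print(mmi_rerank(candidates)[0][0])  # → try the new cafe
```

The weight `lam` trades off fluency against specificity; `lam=0` reduces to ordinary likelihood ranking.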
AllenNLP is an open-source NLP research library built on PyTorch. AllenNLP makes designing and evaluating new deep learning models easy for any NLP problem, and it can be run efficiently in the cloud or on a laptop. AllenNLP is built and maintained by the Allen Institute for AI, in close collaboration with researchers at the University of Washington; its users include Facebook AI Research, Airbnb, Amazon Alexa and other top players in the industry.
Rasa Open Source
Rasa is an open-source framework for building high-performing, resilient, proprietary contextual assistants. It provides the necessary infrastructure to create great assistants that can understand messages and hold meaningful conversations; employ machine learning to improve those conversations; and integrate seamlessly with existing systems and channels.
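The "understand messages" step above is intent classification: mapping a free-text message onto one of the intents the assistant knows. The toy classifier below illustrates the idea with simple word overlap; a real Rasa assistant uses trained NLU pipelines, and the intents and examples here are invented for illustration:

```python
# Hypothetical training examples per intent (a real assistant would have
# many more, and would learn a statistical model from them).
TRAINING = {
    "greet":      ["hello there", "hi", "good morning"],
    "book_table": ["book a table", "reserve a table for two"],
    "goodbye":    ["bye", "see you later"],
}

def classify_intent(message):
    """Pick the intent whose best example shares the most words
    with the incoming message."""
    words = set(message.lower().split())
    scores = {
        intent: max(len(words & set(ex.split())) for ex in examples)
        for intent, examples in TRAINING.items()
    }
    return max(scores, key=scores.get)

print(classify_intent("can you book a table tonight"))  # → book_table
```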
OpenNLP supports the most common NLP tasks, such as tokenisation, sentence segmentation, part-of-speech tagging, entity extraction, chunking, parsing, language detection, and coreference resolution. The Apache OpenNLP website notes that the project is always looking for new contributors to work on all parts of the project.
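To make two of those tasks concrete, here is a naive pure-Python sketch of sentence segmentation and tokenisation. OpenNLP itself is a Java library with trained models for these tasks; the regex rules below are a simplification for illustration only:

```python
import re

def sentence_split(text):
    # Naive rule: split after . ! or ? when followed by whitespace
    # and a capital letter. Trained models handle abbreviations etc.
    return re.split(r'(?<=[.!?])\s+(?=[A-Z])', text.strip())

def tokenize(sentence):
    # Runs of word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)

text = "OpenNLP supports many tasks. Tokenisation is one of them!"
sentences = sentence_split(text)
print(sentences)
print(tokenize(sentences[0]))  # → ['OpenNLP', 'supports', 'many', 'tasks', '.']
```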
BERT, or Bidirectional Encoder Representations from Transformers, released by Google in the latter half of 2018, is a new method of pre-training language representations which obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks.
BERT was the most talked-about NLP project for most of 2019, and interest surged further when Google announced that it uses BERT in its search engine. Getting acquainted with BERT is worthwhile, as it has spawned many variants and keeps growing in spite of rivals like XLNet and ERNIE.
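The core of BERT's pre-training is the masked language model objective: a fraction of input tokens is hidden and the model must predict the originals from both left and right context. The sketch below only prepares such masked inputs in pure Python; it is not BERT's implementation (BERT also sometimes substitutes a random word or keeps the original token, which is omitted here for brevity):

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Replace each token with [MASK] with probability mask_prob and
    record the original as the prediction target."""
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets.append(tok)      # the model must recover this
        else:
            masked.append(tok)
            targets.append(None)     # no prediction needed here
    return masked, targets

tokens = "the quick brown fox jumps over the lazy dog".split()
# mask_prob raised above BERT's usual 15% so this short example
# is likely to actually mask something.
masked, targets = mask_tokens(tokens, mask_prob=0.3)
print(masked)
```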
fastText, Facebook's library for text classification and word representation, introduced automatic hyperparameter optimisation (autotune) in 2019. This feature automatically determines the best hyperparameters for your data set to build an efficient text classifier. To use autotuning, a researcher provides the training data along with a validation set and a time constraint. fastText then uses the allotted time to search for the hyperparameters that give the best performance on the validation set. Optionally, the researcher can also constrain the size of the final model; in that case, fastText uses compression techniques to reduce the model's size.
Because an efficient text classifier can now be built with a single command, researchers can create memory-efficient classifiers for various tasks, including sentiment analysis, language identification, spam detection, tag prediction, and topic classification.
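The time-budgeted search described above can be sketched in a few lines of pure Python. This is only the general idea — keep sampling hyperparameters and keep the best validation score until the budget runs out — not fastText's actual strategy, which is more sophisticated than plain random sampling; the search space and scoring function below are invented for illustration:

```python
import random
import time

def autotune(train_fn, score_fn, search_space, time_budget_s=1.0, seed=0):
    """Time-budgeted random search over a discrete hyperparameter space."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    deadline = time.monotonic() + time_budget_s
    while time.monotonic() < deadline:
        params = {k: rng.choice(v) for k, v in search_space.items()}
        score = score_fn(train_fn(params))   # validation performance
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical stand-ins: the "model" is just its params, and the
# score peaks at lr=0.1 with as many n-grams as possible.
space = {"lr": [0.05, 0.1, 0.5], "ngrams": [1, 2, 3]}
train = lambda p: p
score = lambda m: -abs(m["lr"] - 0.1) + m["ngrams"] * 0.01
params, best = autotune(train, score, space, time_budget_s=0.2)
print(params)  # → {'lr': 0.1, 'ngrams': 3}
```

Constraining the final model size, as fastText optionally does, would amount to rejecting sampled configurations whose resulting model exceeds the limit.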
These are a few of the most popular NLP projects out there today, alongside others like spaCy, Gensim and Hugging Face.
For more resources on NLP, check this.