Microsoft co-founder Paul Allen founded The Allen Institute for Artificial Intelligence in 2014 to achieve scientific breakthroughs by building AI systems with reasoning, learning, and reading capabilities. Over the years, the private research institute and startup incubator has pushed the frontiers of AI and machine learning.
Here are its major innovations.
AllenNLP
Built on PyTorch, AllenNLP is an open-source deep learning library for natural language processing. It helps researchers manage experiments and evaluate models after development. It provides high-level abstractions and APIs for NLP models, along with an extensible framework to run and manage experiments. It is used by organisations such as Facebook Research, Airbnb, and Amazon Alexa.
AllenNLP offers the following features:
- A command-line tool for training PyTorch models
- Collection of pre-trained models to make predictions
- Experiment framework to do replicable science
- Readable reference implementations of common NLP models
Aristo
Aristo is a pet project of the Allen Institute for Artificial Intelligence. It aims to build systems that integrate technologies for reading, learning, explanation, and reasoning to demonstrate a deep understanding of the world. In 2019, the Aristo software answered more than 90 percent of the questions correctly on an eighth-grade multiple-choice science test and more than 80 percent on a test for high school seniors. Considering the abysmal performance of AI programs in the $80,000 Allen AI Science Challenge in 2016, Aristo’s achievement just three years later is indeed a tour de force.
The Aristo project combines natural language processing, information extraction, knowledge representation, machine reasoning, and commonsense knowledge. So far, the project has been applied in research areas such as multihop reasoning, reasoning about actions, and probing reasoning with language models.
MERLOT
The Allen Institute for Artificial Intelligence has developed Multimodal Neural Script Knowledge Models (MERLOT) in collaboration with the University of Washington. The system is trained on millions of YouTube videos with transcribed speech, which helps it learn to match images in videos with words and to follow events globally over time. The model is trained without supervision: the videos are neither labelled nor categorised.
GENIE
The Allen Institute for Artificial Intelligence, the Hebrew University of Jerusalem, and the University of Washington created GENIE, a leaderboard for human-in-the-loop evaluation of text generation. The adoption of leaderboards has so far been limited to setups with automatic evaluation. Open-ended tasks that require natural language generation, such as machine translation, lack techniques that can reliably and automatically evaluate a model’s quality.
GENIE addresses these problems by posting model predictions to a crowdsourcing platform, where human annotators evaluate them against predetermined parameters. GENIE also incorporates popular automatic metrics such as BLEU and ROUGE to show how well they correlate with human assessment scores.
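As a rough illustration of what "correlating with human assessment" means, the sketch below computes a Pearson correlation between an automatic metric and human ratings. The scores are made-up numbers for five hypothetical systems, and the `pearson` helper is our own; neither is GENIE data or GENIE code, and the leaderboard may use a different correlation measure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for five text-generation systems (illustrative only).
human_scores = [3.1, 3.8, 4.2, 2.5, 4.6]        # e.g. mean annotator rating
metric_scores = [0.21, 0.27, 0.30, 0.18, 0.29]  # e.g. an automatic metric

correlation = pearson(metric_scores, human_scores)
print(f"Pearson r = {correlation:.3f}")
```

A value close to 1 would suggest the automatic metric ranks systems much as human annotators do, while a value near 0 would suggest it tells us little about human judgement.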
S2ORC and TLDR
The Semantic Scholar Open Research Corpus (S2ORC) is a collection of 81.1 million English-language academic papers. According to the team behind it, the corpus is one of the largest publicly available collections of machine-readable academic text, with rich metadata, paper abstracts, bibliographic references, and full text for 8.1 million open access papers. In addition, the full text is annotated with automatically detected inline mentions of figures, citations, tables, etc., each linked to its corresponding paper object.
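To make the idea of linked inline mentions concrete, here is a minimal sketch of how a citation marker in the full text could point at a bibliography entry. The class names, field names, and sample paragraph are hypothetical and chosen for illustration; they are not the actual S2ORC schema.

```python
from dataclasses import dataclass


@dataclass
class BibEntry:
    """A bibliography entry that an inline mention can link to (hypothetical)."""
    ref_id: str
    title: str


@dataclass
class CiteSpan:
    """Character span of an inline mention plus the id of its target."""
    start: int   # inclusive character offset into the paragraph
    end: int     # exclusive character offset
    ref_id: str


def make_span(text, marker, ref_id):
    """Build a span for the first occurrence of `marker` in `text`."""
    start = text.index(marker)
    return CiteSpan(start=start, end=start + len(marker), ref_id=ref_id)


def resolve_mentions(text, spans, bib_entries):
    """Return (mention text, linked entry) pairs; entry is None if unresolved."""
    by_id = {entry.ref_id: entry for entry in bib_entries}
    return [(text[s.start:s.end], by_id.get(s.ref_id)) for s in spans]


# Illustrative data only.
paragraph = "Transformers [1] improved on recurrent models [2]."
bibliography = [
    BibEntry("BIBREF0", "Hypothetical paper A"),
    BibEntry("BIBREF1", "Hypothetical paper B"),
]
spans = [
    make_span(paragraph, "[1]", "BIBREF0"),
    make_span(paragraph, "[2]", "BIBREF1"),
]

mentions = resolve_mentions(paragraph, spans, bibliography)
for mention, entry in mentions:
    print(mention, "->", entry.title if entry else "unresolved")
```

Storing offsets rather than rewriting the text keeps the original paragraph intact while still letting a reader or a program jump from each mention to the object it refers to.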
In 2020, a free tool called TLDR (the common internet acronym for ‘Too long, didn’t read’) was enabled for search results built on S2ORC. The tool can summarise documents of over 5,000 words in just 21 words on average, a compression ratio of about 238.
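The compression ratio quoted above is simply the source length divided by the summary length. A quick check, using the article's round figures of 5,000 and 21 words (the function name is ours, for illustration):

```python
def compression_ratio(source_words: int, summary_words: int) -> float:
    """Ratio of source length to summary length, both measured in words."""
    if summary_words <= 0:
        raise ValueError("summary_words must be positive")
    return source_words / summary_words


ratio = compression_ratio(5000, 21)
print(round(ratio))  # prints 238
```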
AllenAct
AllenAct is a platform for reproducible research in embodied AI, with a focus on modularity and flexibility. It supports multiple training environments and algorithms, and ships with pretrained models, real-time visualisations, and tutorials. It addresses challenges in embodied AI such as data replication, ramp-up time, and training costs by decoupling tasks from environments. It is also compatible with specialised algorithms that involve sequences of training routines. AllenAct’s visualisations can be integrated with TensorBoard.