AI21 Labs, an Israeli AI company specialising in NLP, has released a new language model, Jurassic-1 Jumbo. The tool is positioned to challenge OpenAI's dominance in the "natural language processing-as-a-service" field.
Jurassic-1 is offered via AI21 Studio, the company's new NLP-as-a-Service developer platform, a website and API where developers can build text-based applications such as virtual assistants and chatbots, services for text simplification, content moderation, and creative writing, and many new products besides.
AI21 Studio makes the tool available to anyone interested in prototyping custom text-based AI applications, and lets developers easily customise a private version of the Jurassic-1 models.
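As a rough illustration of what using the Studio API looks like, the sketch below constructs a completion request to a Jurassic-1 model. The endpoint path, parameter names (`maxTokens`, `temperature`), and authentication scheme are assumptions based on AI21's public documentation at the time of writing; check the current docs before relying on them.

```python
# Minimal sketch of calling Jurassic-1 via the AI21 Studio REST API.
# Endpoint and parameter names are assumptions; verify against AI21's docs.
import json
import os
import urllib.request

API_URL = "https://api.ai21.com/studio/v1/j1-jumbo/complete"  # assumed endpoint

def build_request(prompt, max_tokens=32, temperature=0.7, api_key=None):
    """Construct (but do not send) the HTTP request for a completion call."""
    payload = {
        "prompt": prompt,
        "maxTokens": max_tokens,   # assumed parameter name
        "temperature": temperature,
    }
    headers = {
        "Authorization": f"Bearer {api_key or os.environ.get('AI21_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        API_URL, data=json.dumps(payload).encode(), headers=headers, method="POST"
    )

req = build_request("The three largest language models are")
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` would return a JSON body containing the model's completions, assuming a valid API key.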
ML researchers and developers posit that larger models, with more parameters and trained on more data, produce better outcomes. In this article, we compare Jurassic-1 to other large language models currently leading the market.
With its 178 billion parameters, Jurassic-1 is slightly bigger (3 billion more) than GPT-3. AI21 claims this to be 'the largest and most sophisticated language model ever released for general use by developers.'
The researchers also claim that Jurassic-1 can recognise 250,000 lexical items, five times more than comparable language models. Moreover, since these items include multi-word units such as expressions, phrases, and named entities, the tool has a richer semantic representation of human concepts and reduced latency, because a given text breaks down into fewer tokens.
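A toy example can make the multi-word idea concrete. The greedy longest-match tokenizer below is not AI21's actual tokenizer, just a sketch showing how a vocabulary containing multi-word entries lets a named entity like "New York Yankees" map to a single lexical item instead of three.

```python
# Toy illustration (not AI21's tokenizer): greedy longest-match tokenization
# over a vocabulary that includes multi-word lexical items.
def tokenize(text, vocab):
    """Split text into tokens, preferring the longest multi-word match."""
    words = text.split()
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate span first; fall back to a single word.
        for span in range(len(words) - i, 0, -1):
            candidate = " ".join(words[i:i + span])
            if span == 1 or candidate in vocab:
                tokens.append(candidate)
                i += span
                break
    return tokens

vocab = {"New York Yankees", "Xi Jinping"}  # multi-word lexical items
print(tokenize("the New York Yankees won again", vocab))
# → ['the', 'New York Yankees', 'won', 'again']
```

Four tokens instead of six: fewer tokens per input means fewer decoding steps, which is the latency argument in miniature.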
The training dataset for Jurassic-1 Jumbo contained 300 billion tokens from English-language websites including Wikipedia, news publications, StackExchange, and OpenSubtitles. AI21 also says that potential users can train a custom model for exclusive use with only 50-100 training examples.
AI21 Labs says that the Jurassic-1 models performed at par with or better than GPT-3 in a test on a benchmark suite, across a range of tasks including answering academic and legal questions. Jurassic-1 covers traditional language model vocabulary with single words like 'potato' while also recognising multi-word expressions and named entities like 'New York Yankees' or 'Xi Jinping.'
GPT-3

For the better part of a year, OpenAI's GPT-3 has remained among the most significant AI language models ever created, if not the largest of its kind. Released in May 2020 by OpenAI, an AI research company backed by Peter Thiel and Elon Musk, GPT-3 (Generative Pre-trained Transformer 3) is the third generation of the model, as the moniker '3' suggests, and is capable of generating unique human-like text on demand. GPT-3 was trained on 570 GB worth of data crawled from the internet, including all of Wikipedia.
At release it was by far the largest known neural net ever created, with the essential capability to generate text given limited context; this 'text' can be anything with a language structure, spanning essays, tweets, memos, translations and even computer code. It is unique in its scale: its earlier version GPT-2 had 1.5 billion parameters, and the largest language model Microsoft built preceding it had 17 billion; both are dwarfed by GPT-3's 175 billion parameters.
Turing NLG

In 2020, Microsoft's Turing NLG (T-NLG) held the distinction of being the largest model ever published. A Transformer-based generative language model, Turing NLG was created with 17 billion parameters.
T-NLG can generate words to complete open-ended textual tasks and unfinished sentences. Microsoft claims that the model can generate direct answers to questions and summarise documents. The team behind T-NLG believes that the bigger the model, the better it performs with fewer training examples, and that it is more efficient to train one large centralised multi-task model than a new model for every task individually.
Wu Dao 2.0
The latest offering from the Chinese government-backed Beijing Academy of Artificial Intelligence (BAAI), Wu Dao 2.0, is claimed to be the most extensive language model to date, with 1.75 trillion parameters. It has surpassed models such as GPT-3 and Google's Switch Transformer in size. However, unlike GPT-3, Wu Dao 2.0 covers both Chinese and English, with skills acquired by studying 4.9 terabytes of texts and images, including 1.2 terabytes of Chinese and English texts.
It can perform tasks like simulating conversational speech, writing poetry, understanding pictures, and even generating recipes. It can also predict the 3D structures of proteins like DeepMind’s AlphaFold. China’s first virtual student Hua Zhibing was built on Wu Dao 2.0.
PanGu Alpha

Chinese company Huawei has developed PanGu Alpha, a 750-gigabyte model that contains up to 200 billion parameters. Touted as the Chinese equivalent of GPT-3, it is trained on 1.1 terabytes of Chinese-language ebooks, encyclopedias, news, social media posts, and websites.
The team claims the model achieves "superior" performance in Chinese-language tasks spanning text summarisation, question answering, and dialogue generation. However, while experts note that PanGu Alpha's essential feature is its availability in Chinese, the project doesn't seem to offer anything new in terms of model architecture.
With language models increasing in size, and with the assertion that bigger models take us a step closer to artificial general intelligence, questions arise about the risks of large language models.
Former Google AI researcher Timnit Gebru co-authored the paper "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?", arguing that while these models produce good results, they carry risks such as substantial carbon footprints.
Here's a table outlining the major differences between Jurassic-1, GPT-3 and the other language models in the race:

Model            | Developer | Parameters         | Training data
Jurassic-1 Jumbo | AI21 Labs | 178 billion        | 300 billion tokens from English-language websites
GPT-3            | OpenAI    | 175 billion        | 570 GB crawled from the internet
Turing NLG       | Microsoft | 17 billion         | -
Wu Dao 2.0       | BAAI      | 1.75 trillion      | 4.9 TB of Chinese and English texts and images
PanGu Alpha      | Huawei    | up to 200 billion  | 1.1 TB of Chinese-language text