It seems like everyone is obsessed with the latest craze: large language models (LLMs). The appetite for these data-devouring behemoths just keeps growing. From GPT-3 to Megatron, the quest for bigger and better models is far from over. So whether you’re a language processing newbie or a seasoned pro, here’s a rundown of the open-source LLMs that have hit the scene so far. Get ready to geek out!
Within weeks of releasing Dolly, Databricks unveiled Dolly 2.0, a model licensed for commercial use that requires neither payment for API access nor data sharing with third parties. The model is a potential answer to the legal ambiguity surrounding large language models that were previously fine-tuned on ChatGPT output.
BLOOM, presented by the Hugging Face-led BigScience project, is the world’s largest open-source large language model. It was birthed through the collaborative efforts of a thousand brilliant minds from across the globe.
The model impressively surpasses GPT-3 and the largest Chinese language model on various benchmarks, making it a true game-changer. But that’s not all – it also boasts a unique scaling property that allows for efficient inference on affordable GPUs. The best part? The model weights, code, and training logs are all available to the public. Say goodbye to language processing limitations and hello to GLM-130B!
In the NLP realm, the GPT-Neo, GPT-J, and GPT-NeoX models shine, providing powerful tools for few-shot learning.
Thanks to the minds at EleutherAI, these models have been crafted and made available to the public as open-source alternatives to GPT-3, which has been kept under lock and key by OpenAI. GPT-J and GPT-Neo were trained on the mighty Pile dataset, a collection of linguistic data sources spanning many domains, making them versatile and adaptable to various natural language processing tasks.
But the crown jewel of this trio is GPT-NeoX, a model built on the foundation of Megatron-LM and Microsoft’s DeepSpeed, and designed to shine on the stage of GPUs. At its release, its massive 20 billion parameters made it the largest publicly available model of its kind. GPT-NeoX is the proof-of-concept that pushes the boundaries of few-shot learning even further.
After initially withholding GPT-2 for nine months over concerns about its potential for spreading disinformation, spam, and fake news, OpenAI released smaller, less complex versions for testing purposes. In a November 2019 blog post, OpenAI reported that it had witnessed “no strong evidence of misuse,” and as a result made the full GPT-2 model available for use.
Google AI weighed in on the ‘bigger the better’ assumption in the LLM race, where the size of models has been the attention-grabbing factor. Its research found that bigger language models perform better in part because they can transfer learning from previous tasks more effectively. Based on this, Google created PaLM, or Pathways Language Model, a decoder-only Transformer with 540 billion parameters.
Meta made a big splash in May 2022 with the release of its OPT (Open Pre-trained Transformer) models. Ranging from 125 million to a whopping 175 billion parameters, these transformers can handle language tasks on an unprecedented scale.
You can download the smaller variants from GitHub, but the biggest one is only accessible upon request.
Cerebras, an AI infrastructure firm, made a bold move with the release of seven open-source GPT models. These models, including weights and training recipes, are available to the public free of charge under the Apache 2.0 license, challenging the proprietary systems of the current closed-door industry.
Google AI launched an open-source language model – Flan-T5 that can tackle more than 1,800 diverse tasks. Researchers claimed that the Flan-T5 model’s advanced prompting and multi-step reasoning capabilities could lead to significant improvements.
Meta announced LLaMA at the end of February 2023. Unlike its counterparts, OpenAI’s ChatGPT and Microsoft’s Bing, LLaMA was not opened up to the general public; instead, Meta offered it as an open-source package that members of the AI community could request access to.
But, just one week after Meta began accepting requests to access LLaMA, the model was leaked online, sending shockwaves through the tech community.
From the halls of Stanford University emerged Alpaca. The model was created by fine-tuning LLaMA 7B on over 50,000 instruction-following demonstrations generated with GPT-3.5. It was trained and tested for a mere $600, instead of the millions typically required.
Since its release, Alpaca has been hailed as a breakthrough. Though it started small, with a Homer Simpson bot, the model quickly proved its versatility.