OpenAI’s co-founder, Ilya Sutskever, recently admitted that open-sourcing AI was a mistake, stating that in a few years it will be “completely obvious” that it’s not wise. He also commented on the competitive nature of developing GPT-4, saying that it took all of OpenAI’s resources and that many other companies are vying to achieve the same thing.
OpenAI is not the only one keeping its technology under wraps. Here are 15 closed-source LLMs that their creators refuse to hand over to the public.
OpenAI’s GPT-3 caused quite a stir in May 2020 with its 175B parameters, promising to outperform its predecessor, GPT-2. After an initial limited-beta release, the model generated a lengthy waiting list of eager developers looking to tap into its advanced capabilities. Some even wondered whether GPT-3 was self-aware; in reality, unlike its predecessors, the model was trained to exhibit a degree of common sense.
OpenAI did it again with the latest GPT-4. Boasting significant improvements, the model has astounded many with its uncanny ability to generate human-like text and code, and even interpret images, from mere prompts. The catch is that, as of now, only paying ChatGPT subscribers have access to the technology.
Megatron Turing NLG
In October 2021, NVIDIA and Microsoft announced the Megatron-Turing NLG with 530 billion parameters, three times more than its closest competitors. The model is powered by the DeepSpeed library and the Megatron transformer framework.
Baidu and the Peng Cheng Laboratory developed ERNIE 3.0 Titan, a pre-training language model that boasts 260 billion parameters. With its vast knowledge graph and unstructured data training, the model has achieved state-of-the-art results in over 60 NLP tasks. Baidu claimed that this is the world’s first knowledge-enhanced multi-hundred billion parameter model and the largest Chinese singleton model.
At release, AI21 Labs claimed that Jurassic-1 is “the largest and most sophisticated” LLM ever released for general use by developers. With 178 billion parameters, it is slightly bigger than GPT-3 and has the capacity to recognise 250,000 lexical items, five times more than other language models. The model was trained on Jumbo, consisting of 300 billion tokens collected from English-language websites.
China’s latest masterpiece Wu Dao 2.0 is a language model built by the Beijing Academy of Artificial Intelligence (BAAI) with a staggering 1.75 trillion parameters, surpassing the capacities of GPT-3 and Google’s Switch Transformer. Wu Dao 2.0 covers both English and Chinese, and its abilities range from simulating conversational speech to writing poetry and generating recipes.
Naver Corp’s HyperCLOVA, the South Korean-language AI model, was released in May 2021. The company is all set to launch an upgraded version this July, called HyperCLOVA X, which can understand images and speech in a multimodal format. Trained on a massive corpus of 560B tokens, the “Korean GPT-3”, as it is called, could be a game-changer in the world of natural language processing, according to Kim Yu-won, CEO of Naver Cloud Corp.
DeepMind’s Gopher is a 280 billion parameter transformer language model. The researchers claimed that the model almost halves the accuracy gap from GPT-3 to human expert performance, exceeding forecaster expectations and lifting performance over current state-of-the-art language models across roughly 81% of tasks.
Another addition to DeepMind’s animal-inspired lineup is Chinchilla, a 70B-parameter model designed to be compute-optimal. Trained on 1.4 trillion tokens, Chinchilla demonstrated that models are optimally trained by scaling model size and training tokens in equal proportion. Despite using the same compute budget as Gopher, Chinchilla consumed 4x more training data, making it a formidable contender in the language model space.
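The equal-scaling idea behind Chinchilla can be sketched numerically. The snippet below is a rough illustration, not taken from this article: it assumes the commonly cited approximations that training compute is C ≈ 6·N·D FLOPs (N parameters, D tokens) and that the compute-optimal ratio works out to roughly 20 tokens per parameter.

```python
def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Return a (params, tokens) split that roughly balances a compute budget.

    Assumes C ~ 6 * N * D and D ~ k * N, so N ~ sqrt(C / (6 * k)).
    The ratio k ~ 20 tokens/parameter is an approximation drawn from
    published scaling-law results, not a figure stated in this article.
    """
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example: a Gopher-like budget (280B params trained on 300B tokens, ~5e23 FLOPs)
params, tokens = chinchilla_optimal(6 * 280e9 * 300e9)
print(f"{params / 1e9:.0f}B params, {tokens / 1e12:.2f}T tokens")
# With the same compute, the heuristic favors a smaller model on far more data,
# landing near Chinchilla's actual 70B-parameter / 1.4T-token configuration.
```

Under these assumptions, the same budget that produced the 280B-parameter Gopher points toward a model of Chinchilla’s rough size trained on several times more tokens.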
The avalanche of scientific data has made it tough to find valuable insights. While search engines can help, they alone can’t organise scientific knowledge. Enter Galactica, Meta’s large language model that can store, combine, and reason about scientific data. Trained on a large corpus of scientific papers, it outperformed existing models in a range of scientific tasks. However, the model was removed three days after its launch.
LaMDA, the problem child of the LLM world, caught quite a lot of attention after a former Google engineer, Blake Lemoine, claimed it was sentient while testing the family of language models. Developed by Google with 137 billion parameters, LaMDA was created by fine-tuning a group of transformer-based neural language models and was pre-trained on a dataset of 1.5 trillion words, 40 times larger than those of previous models.
Amazon’s AlexaTM is the company’s 20-billion-parameter large language model. Its encoder-decoder setup improves its performance on machine translation, making it stand out from the competition. Despite having only 1/8 the parameters of GPT-3, AlexaTM outperformed its rival on both the SQuADv2 and SuperGLUE benchmarks.
Last month, Bloomberg unveiled BloombergGPT, a new large-scale generative AI model, specifically designed to tackle the complex landscape of the financial industry. This highly trained language model, optimised to parse and process vast quantities of financial data, seems promising in the NLP domain.
Last month, researchers at Huawei announced PanGu-Σ, trained on Ascend 910 AI processors with MindSpore 5 as the framework. The model underwent rigorous training on a whopping 329 billion tokens over a hundred days.
Microsoft released a paper, “Language Is Not All You Need: Aligning Perception with Language Models,” featuring the remarkable Kosmos-1 multimodal large language model (MLLM). The tech giant’s team argues that MLLMs represent a critical leap forward in unlocking unprecedented capabilities and opportunities in language comprehension, far surpassing those of traditional LLMs.