
15 Closed-Source LLMs You Must Know About

GPT-4 and the other models that researchers don’t trust the public with.

OpenAI’s co-founder, Ilya Sutskever, recently admitted that open-sourcing AI was a mistake, stating that in a few years it will be “completely obvious” that it’s not wise. He also commented on the competitive nature of developing GPT-4, saying that it took all of OpenAI’s resources and that many other companies are vying to achieve the same thing.

OpenAI is not the only one keeping its technology locked in. Here are 15 closed-source LLMs that researchers refuse to hand over to the public.

GPT-3 

OpenAI’s GPT-3 caused quite a stir in May 2020 with its 175 billion parameters, promising to outperform its predecessor, GPT-2. After an initial limited-beta release, the model generated a lengthy waiting list of developers eager to tap into its advanced capabilities. Some even wondered whether GPT-3 was self-aware; in reality, unlike its predecessors, the model was trained to exhibit a degree of common sense.

GPT-4

OpenAI did it again with GPT-4. Boasting significant improvements, the model has astounded many with its uncanny ability to generate human-like text, write code from mere prompts, and even interpret images. The catch is that, as of now, only ChatGPT Plus subscribers have access to the technology.

Megatron-Turing NLG

In October 2021, NVIDIA and Microsoft announced Megatron-Turing NLG with 530 billion parameters, three times more than its closest competitors at the time. The model was trained using Microsoft’s DeepSpeed library and NVIDIA’s Megatron-LM framework.

ERNIE 3.0 Titan

Baidu and the Peng Cheng Laboratory developed ERNIE 3.0 Titan, a pre-trained language model with 260 billion parameters. Trained on a vast knowledge graph alongside unstructured data, the model has achieved state-of-the-art results in over 60 NLP tasks. Baidu claimed it is the world’s first knowledge-enhanced multi-hundred-billion-parameter model and the largest Chinese singleton model.

Jurassic-1

At release, AI21 Labs claimed that Jurassic-1 was “the largest and most sophisticated” LLM ever released for general use by developers. With 178 billion parameters, its largest version, Jurassic-1 Jumbo, is slightly bigger than GPT-3 and can recognise 250,000 lexical items, five times more than other language models. The model was trained on 300 billion tokens collected from English-language websites.

Wu Dao 2.0

China’s latest masterpiece, Wu Dao 2.0, is a language model built by the Beijing Academy of Artificial Intelligence (BAAI) with a staggering 1.75 trillion parameters, surpassing GPT-3 and Google’s Switch Transformer in parameter count. Wu Dao 2.0 covers both English and Chinese, and its abilities range from simulating conversational speech to writing poetry and generating recipes.

HyperCLOVA

Naver Corp’s HyperCLOVA, a Korean-language AI model, was released in May 2021. The company is all set to launch an upgraded version this July, called HyperCLOVA X, which can understand images and speech in a multimodal format. Trained on a massive corpus of 560 billion tokens, the “Korean GPT-3”, as it is called, can be a game-changer in the world of natural language processing, according to Kim Yu-won, CEO of Naver Cloud Corp.

Gopher

DeepMind’s Gopher is a 280 billion parameter transformer language model. The researchers claimed that the model almost halves the accuracy gap from GPT-3 to human expert performance, exceeding forecaster expectations and lifting performance over current state-of-the-art language models across roughly 81% of tasks.

Chinchilla 

Another addition to DeepMind’s animal-inspired lineup is Chinchilla, a 70-billion-parameter model designed to be compute-optimal. Trained on 1.4 trillion tokens, Chinchilla showed that, for a fixed compute budget, model size and training tokens should be scaled up in roughly equal proportion. Despite using the same compute budget as Gopher, Chinchilla was trained on roughly four times more data, making it a formidable contender in the language model space.
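To put that trade-off in perspective, here is a minimal back-of-the-envelope sketch. It assumes the commonly used C ≈ 6·N·D approximation for training compute (FLOPs ≈ 6 × parameters × tokens), which is not from the article itself, and plugs in the publicly reported parameter and token counts:

```python
# Back-of-the-envelope comparison of Gopher and Chinchilla, assuming the
# common approximation: training compute C ≈ 6 * N (parameters) * D (tokens).

def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute in FLOPs."""
    return 6 * params * tokens

gopher = training_flops(params=280e9, tokens=300e9)      # ~5.0e23 FLOPs
chinchilla = training_flops(params=70e9, tokens=1.4e12)  # ~5.9e23 FLOPs

print(f"Gopher:     {gopher:.1e} FLOPs")
print(f"Chinchilla: {chinchilla:.1e} FLOPs (a comparable budget)")
print(f"Chinchilla sees {1.4e12 / 300e9:.1f}x more tokens with 4x fewer parameters")
```

Under this rough approximation the two budgets come out comparable, which is the essence of the Chinchilla result: spend a similar amount of compute on a smaller model and far more data.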

Galactica 

The avalanche of scientific data has made it tough to find valuable insights. While search engines can help, they alone can’t organise scientific knowledge. Enter Galactica, Meta’s large language model that can store, combine, and reason about scientific data. Trained on a large corpus of scientific papers, it outperformed existing models on a range of scientific tasks. However, the public demo was pulled three days after launch, following criticism that the model could generate authoritative-sounding but inaccurate scientific text.

LaMDA

The problem child of the LLM world caught quite a lot of attention after former Google engineer Blake Lemoine claimed it was sentient while testing the family of language models. Developed by Google with 137 billion parameters, LaMDA was created by fine-tuning a group of transformer-based neural language models and was pre-trained on a dataset of 1.5 trillion words, nearly 40 times larger than those used for previous models.


AlexaTM

Amazon’s AlexaTM is the company’s 20-billion-parameter large language model. Its sequence-to-sequence (encoder-decoder) architecture improves its performance on machine translation, making it stand out from its decoder-only competition. Despite having roughly 1/8 the parameters of its rival GPT-3, it outperformed it on both the SQuADv2 and SuperGLUE benchmarks.

BloombergGPT

Last month, Bloomberg unveiled BloombergGPT, a 50-billion-parameter generative AI model designed specifically for the complex landscape of the financial industry. Optimised to parse and process vast quantities of financial data, the model looks promising for finance-focused NLP tasks.

PanGu

Last month, researchers at Huawei announced PanGu-Σ, trained on Ascend 910 AI processors with MindSpore 5 as the framework. The model underwent rigorous training on a whopping 329 billion tokens over 100 days.

Kosmos-1

Microsoft released a paper, “Language Is Not All You Need: Aligning Perception with Language Models”, introducing Kosmos-1, a multimodal large language model (MLLM). The tech giant’s team argues that MLLMs represent a critical leap forward, unlocking capabilities and opportunities in language comprehension that far surpass those of traditional LLMs.

PS: The story was written using a keyboard.