ChatGPT has been trending for a while now, with many calling it the 'Google killer'. The hero of the hour is built on GPT-3, one of the most well-known large language models (LLMs) developed by OpenAI. GPT-3 has 175 billion parameters, making it one of the largest language models ever created. It can generate human-like text and perform a wide range of tasks, including translation, summarisation, and even writing code.
While OpenAI hit that sweet spot with GPT-3, DeepMind, Google, Meta, and other players too have developed their own language models, some with 10 times more parameters than GPT-3.
Here is a list of the top alternatives to GPT-3 that you can try for your own natural language processing tasks, such as building chatbots.
Developed by a group of over 1,000 AI researchers, Bloom is an open-source multilingual language model that is considered the best alternative to GPT-3. It is trained on 176 billion parameters, a billion more than GPT-3, and required 384 graphics cards for training, each with more than 80 gigabytes of memory.
Developed through the BigScience Workshop by Hugging Face, the language model has been trained on 46 natural languages and 13 programming languages, and is also available in smaller versions with fewer parameters.
Developed by Google, GLaM is a mixture of experts (MoE) model, which means it consists of different submodels specialising in different inputs. It is one of the largest available models with 1.2 trillion parameters across 64 experts per MoE layer. During inference, the model only activates 97 billion parameters per token prediction.
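The sparse-activation idea behind GLaM can be sketched in a few lines: a small gating function scores every expert for each token, but only the top-k experts actually run. This is a toy illustration of top-k MoE routing, not GLaM's actual implementation; the expert functions and gate weights here are made-up placeholders.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of gate logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(token_vec, experts, gate_weights, k=2):
    """Route one token through only the top-k experts (sparse activation).

    experts:      list of callables, one submodel per expert (toy stand-ins here)
    gate_weights: one weight vector per expert for the gating scores
    """
    # Score each expert for this token (a simple dot-product gate).
    logits = [sum(w * x for w, x in zip(gw, token_vec)) for gw in gate_weights]
    probs = softmax(logits)
    top_k = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    # Only the selected experts compute; mix their outputs by renormalised gate weight.
    norm = sum(probs[i] for i in top_k)
    out = [0.0] * len(token_vec)
    for i in top_k:
        y = experts[i](token_vec)
        out = [o + (probs[i] / norm) * yi for o, yi in zip(out, y)]
    return out, top_k
```

With 64 experts per layer and k=2, most of the network sits idle for any given token, which is how a 1.2-trillion-parameter model can activate only 97 billion parameters per prediction.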
DeepMind developed Gopher with 280 billion parameters, and it is specialised in answering science and humanities questions much better than other language models. DeepMind claims that the model can beat language models 25 times its size and compete with GPT-3 on logical reasoning problems. Smaller versions, down to 44 million parameters, are available as well for easier research.
NVIDIA and Microsoft collaborated to create one of the largest language models, with 530 billion parameters. The model was trained on the NVIDIA DGX SuperPOD-based Selene supercomputer and is one of the most powerful English language models. Megatron-Turing Natural Language Generation (NLG) is a 105-layer, transformer-based LLM that outperforms state-of-the-art models in zero-, one-, and few-shot settings.
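The zero-, one-, and few-shot settings mentioned above differ only in how many worked examples are placed in the prompt before the query. A minimal sketch of building such prompts (the instruction text and "Input:/Output:" layout are illustrative choices, not MT-NLG's actual evaluation format):

```python
def build_prompt(instruction, examples, query):
    """Build an evaluation prompt.

    examples = []            -> zero-shot (instruction and query only)
    examples = [one pair]    -> one-shot
    examples = [many pairs]  -> few-shot
    """
    parts = [instruction]
    for text, label in examples:
        parts.append(f"Input: {text}\nOutput: {label}")
    # The model is asked to continue from the final "Output:".
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

few_shot = build_prompt(
    "Classify the sentiment as positive or negative.",
    [("The plot dragged badly.", "negative"), ("A superb performance.", "positive")],
    "I loved every minute.",
)
```

The model itself is never fine-tuned in these settings; all the "learning" happens from the examples in the context window.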
Another model developed by DeepMind, and touted as the GPT-3 killer, Chinchilla is a compute-optimal model built on 70 billion parameters but with four times more training data. The model outperformed Gopher, GPT-3, Jurassic-1, and Megatron-Turing NLG on several downstream evaluation tasks, and it requires far less compute for fine-tuning and inference. The researchers found that instead of increasing the number of parameters, scaling the number of training tokens, i.e. the text data, is the key to better-performing language models.
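The "compute-optimal" claim is easy to check with the common rule of thumb that training compute is roughly 6 × parameters × tokens FLOPs. Under that approximation (a back-of-the-envelope sketch, not DeepMind's exact accounting), Chinchilla and Gopher sit in a similar compute budget, but Chinchilla spends it on far more data:

```python
def training_flops(params, tokens):
    """Rule-of-thumb training compute: ~6 * N * D FLOPs."""
    return 6 * params * tokens

# Chinchilla: 70B parameters trained on ~1.4T tokens
chinchilla = training_flops(70e9, 1.4e12)   # ~5.9e23 FLOPs
# Gopher: 280B parameters trained on ~300B tokens
gopher = training_flops(280e9, 300e9)       # ~5.0e23 FLOPs

# Chinchilla's ratio of training tokens to parameters
tokens_per_param = 1.4e12 / 70e9            # = 20.0
```

Roughly 20 training tokens per parameter is the ratio the Chinchilla work popularised, versus barely 1 token per parameter for Gopher; that rebalancing, not extra compute, is what drove the gains.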
Another language model developed by Google, PaLM, with 540 billion parameters, is a dense decoder-only transformer model trained with the Pathways system. It was the first model to use the Pathways system to train at large scale, across 6,144 TPU chips, the largest TPU-based configuration at the time. The model outperformed other models on 28 of 29 English NLP tasks.
Google took a neural network-based technique for NLP pre-training and developed BERT (Bidirectional Encoder Representations from Transformers). The model has two versions: BERT Base uses 12 transformer layers and 110 million trainable parameters, while BERT Large uses 24 layers and 340 million trainable parameters.
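The 110 million figure for BERT Base can be reproduced from its published architecture (vocabulary 30,522; hidden size 768; 12 layers; feed-forward size 3,072). A quick tally, counting each weight matrix and bias:

```python
# BERT Base hyperparameters: vocab, hidden size, layers, FFN size, positions, segment types
V, H, L, I, P, T = 30522, 768, 12, 3072, 512, 2

# Token, position, and segment embedding tables, plus their LayerNorm
embeddings = (V + P + T) * H + 2 * H

per_layer = (
    4 * (H * H + H)    # Q, K, V and attention-output projections (weights + biases)
    + 2 * H            # attention LayerNorm
    + (H * I + I)      # feed-forward up-projection
    + (I * H + H)      # feed-forward down-projection
    + 2 * H            # feed-forward LayerNorm
)

pooler = H * H + H     # final pooling layer over the [CLS] token

total = embeddings + L * per_layer + pooler
print(f"{total:,}")    # 109,482,240 -- the "110 million" in the text
```

Doubling the depth to 24 layers and widening the hidden size to 1,024 (with 16 attention heads) gives BERT Large's roughly 340 million parameters by the same arithmetic.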
Developed by Google with 137 billion parameters, LaMDA was a revolution in the natural language processing world. It was built by fine-tuning a group of Transformer-based neural language models. For pre-training, the team created a dataset of 1.5 trillion words, 40 times larger than those used for previously developed models. LaMDA has already been used for zero-shot learning, program synthesis, and the BIG-bench workshop.
Built by Meta, Open Pretrained Transformer (OPT) is a language model with 175 billion parameters. It is trained on openly available datasets, allowing more community engagement. The release includes the pretrained models along with the code for training. The model is currently under a noncommercial licence and available for research use only. Notably, the model can be deployed for inference on just 16 NVIDIA V100 GPUs, significantly less hardware than comparable models require.
Amazon also unveiled its own large language model, with 20 billion parameters. Alexa Teacher Model (AlexaTM 20B) is a sequence-to-sequence (seq2seq) language model with state-of-the-art few-shot learning capabilities. What makes it different from the others is that it pairs an encoder with its decoder, which improves performance on machine translation. With 1/8th the parameters, the language model by Amazon outperformed GPT-3 on the SQuADv2 and SuperGLUE benchmarks.