Last updated February 15, 2022

EleutherAI launches GPT-NeoX-20B, the biggest public-access language model

EleutherAI hopes that increased access to language models of this size will act as a catalyst for the development of safe use of AI systems.

Published on February 14, 2022
by Kartik Wali

The world of AI is buzzing as EleutherAI launches its latest large language model (LLM), GPT-NeoX-20B, which has 20 billion parameters. The model was trained on CoreWeave GPUs with the GPT-NeoX framework and is released pre-trained. It enters a field of heavy hitters: Microsoft and NVIDIA’s Megatron-Turing Natural Language Generation model (MT-NLG) with 530 billion parameters, OpenAI’s GPT-3 with 175 billion parameters, and Google’s Switch Transformer, a technique for training models with over a trillion parameters. EleutherAI claims that GPT-NeoX-20B is the largest language model available for public access and that it can perform an array of tasks.

With the release of its 20-billion-parameter model, EleutherAI aims to make models of this size accessible to everyone and to aid research into the safe use of AI systems. It encourages anyone working in this area to reach out. Let us now look at the details of GPT-NeoX-20B and how it works.

GPT-NeoX-20B, the new kid on the block

GPT-NeoX-20B is an autoregressive transformer decoder model designed along the lines of GPT-3. The table below gives the model’s specifications. “Params” is the total parameter count, and “Non-embedding” is the parameter count excluding embeddings, the figure used in scaling-laws research.

Figure: A basic specification table for GPT-NeoX-20B by EleutherAI

The architecture employs rotary embeddings, which are a form of static relative positional embedding. In short, they rotate the embedding space so that the attention of a token at position m to a token at position n is linearly dependent on m − n. Formally, they modify the standard multi-headed attention equation from

$$\mathrm{softmax}\left(\frac{1}{\sqrt{d}} \sum_{n,m} x_m^T W_q^T W_k x_n\right)$$

where x_m and x_n are the embeddings of the tokens at positions m and n, respectively, and W_q^T and W_k are the query and key weights, respectively, to

$$\mathrm{softmax}\left(\frac{1}{\sqrt{d}} \sum_{n,m} x_m^T W_q^T R^d_{\Theta,(n-m)} W_k x_n\right)$$

Here R^d_{Θ,x} is a d × d block-diagonal matrix for hyperparameters Θ. The equations above are represented visually in the figure below:

Figure: Pictorial representation of rotary embeddings from EleutherAI
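The relative-position property of rotary embeddings can be checked with a small, pure-Python sketch. It is an illustration only, not EleutherAI's implementation: the function names (`rotate`, `attention_score`), the interleaved pairing of dimensions, and the base of 10000 for the rotation frequencies are assumptions made for this example, and the query and key vectors are arbitrary toy values.

```python
import math


def rotate(vec, pos, base=10000.0):
    """Rotate each pair (vec[2i], vec[2i+1]) by the angle pos * theta_i.

    theta_i = base ** (-2i / d) is an assumed frequency schedule
    chosen for this sketch.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector dimension must be even")
    out = []
    for i in range(d // 2):
        theta = base ** (-2.0 * i / d)
        angle = pos * theta
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[2 * i], vec[2 * i + 1]
        # Standard 2-D rotation applied to one pair of coordinates.
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def attention_score(q, k, m, n):
    """Unnormalised attention logit for a query at position m and a key at position n."""
    return dot(rotate(q, m), rotate(k, n))


# Toy query and key vectors (hypothetical values, already projected).
q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# Both position pairs have the same offset n - m = 3.
score_a = attention_score(q, k, m=2, n=5)
score_b = attention_score(q, k, m=10, n=13)
print(math.isclose(score_a, score_b, abs_tol=1e-9))
```

The two scores agree up to floating-point error because rotating the query by m and the key by n is equivalent to rotating the key alone by n − m, so only the offset between the positions affects the logit.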

Training – The model was trained with a custom codebase built on Megatron and DeepSpeed, which makes it straightforward to train LLMs with tens of billions of parameters. Training used the official PyTorch v1.10.0 release binary package compiled with CUDA 11.1.

The significance of GPT-NeoX-20B

GPT-NeoX-20B could spur rapid development, as EleutherAI has made it publicly accessible for free. Two of the biggest challenges with its predecessors were restricted access and high training costs. By releasing GPT-NeoX-20B, EleutherAI removes these hurdles and makes the benefits of a large language model available to all.

Moreover, the GPT-NeoX codebase offers simple and robust configuration through YAML files, which lets users launch training runs across a variety of GPUs with a single line of bash.

Trained on a cluster of 96 state-of-the-art NVIDIA A100 Tensor Core GPUs for distributed training, GPT-NeoX-20B performs quite well in comparison with other publicly available models.

Figure: Accuracy task table on standard language models by substack.com

The performance of GPT-NeoX-20B on standard accuracy tasks reflects its custom tokenisation and its training on the Pile, a curated 825 GB dataset.

The language model also holds up well when tested on factual knowledge across various subject groups.

Figure: Subject group comparison table by substack.com

The future of NLP models

The release of GPT-NeoX-20B marks the emergence of a new generation of language models that demonstrate what powerful AI models could look like. To help researchers understand the safety of such rapidly evolving models, EleutherAI strives to remove the conventional barriers to access and accelerate safety research. Connor Leahy, the co-founder of EleutherAI, states: “From spam and astroturfing to chatbot addiction, there are clear harms that can manifest from the use of these models already today, and we expect the alignment of future models to be of critical importance. We think the acceleration of safety research is extremely important, and the benefits of having an open-source model of this size and quality available for that research outweigh the risks.”

EleutherAI is also planning to open up a channel called #20b on Discord for discussions on this model.


Kartik Wali

A writer by passion, Kartik strives to gain a deep understanding of AI and data analytics and their application in all walks of life. As a Senior Technology Journalist, Kartik looks forward to writing about the latest technological trends that transform the way we live.
