Andrej Karpathy Launches Advanced NanoGPT

Built on minGPT, NanoGPT is a new repository for training and fine-tuning medium-sized GPTs.

Former Tesla AI head Andrej Karpathy recently released NanoGPT, an updated, faster version of minGPT for training and fine-tuning medium-sized GPTs. Prior to this, in 2020, he unveiled minGPT, a minimal PyTorch re-implementation of the GPT language model, as a cleaner, more readable alternative to the existing GPT implementations in PyTorch.

Check out the GitHub repository: https://github.com/karpathy/nanoGPT

Currently focused on reproducing GPT-2 on the OpenWebText dataset, NanoGPT strives to stay plain and readable:

  • train.py is a ~300-line boilerplate training loop
  • model.py is a ~300-line GPT model definition, which can optionally load the GPT-2 weights from OpenAI (see the sketch below)
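
As a rough illustration of that weight loading, the snippet below shows how it is typically invoked; the from_pretrained entry point is an assumption based on the repo's description, so check model.py in your checkout for the exact call.

# Hedged sketch: loading OpenAI's GPT-2 weights through nanoGPT's model.py.
# GPT.from_pretrained is assumed here; verify the actual API in model.py.
from model import GPT

model = GPT.from_pretrained('gpt2')  # the 124M GPT-2 checkpoint
model.eval()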

Dependencies

  • PyTorch
  • pip install datasets for HuggingFace datasets
  • pip install tiktoken for OpenAI’s fast BPE code
  • pip install wandb for optional logging
  • pip install tqdm for progress bars
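
PyTorch itself is best installed following the instructions on pytorch.org for your platform; the remaining extras can be pulled in with a single command, for example:

$ pip install datasets tiktoken wandb tqdm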

Usage

To build a dataset, documents are tokenised into one long 1D array of token indices. For example, for OpenWebText:

$ cd data/openwebtext

$ python prepare.py

This will generate two files, train.bin and val.bin, each holding a raw sequence of uint16 values, where every value is the id of a GPT-2 BPE token. The training script currently attempts to replicate the smallest GPT-2 version made available by OpenAI, the 124M-parameter model. To train with PyTorch Distributed Data-Parallel (DDP), the script has to be launched with torchrun.
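
A typical multi-GPU launch looks something like the following; --standalone and --nproc_per_node are standard torchrun options, and the process count should match the number of available GPUs:

$ torchrun --standalone --nproc_per_node=4 train.py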

Fine Tuning

To finetune a GPT on new text, go to data/shakespeare and run prepare.py to download the Shakespeare dataset and render it into train.bin and val.bin. This is far quicker than preparing OpenWebText, and the finetuning itself takes only a few minutes on a single GPU. Run the following example finetuning:

$ python train.py finetune_shakespeare

This will load the config parameter overrides from config/finetune_shakespeare.py.
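
The override files are plain Python files that reassign the training variables defined in train.py. A minimal sketch of what such a file might contain follows; the parameter names and values are illustrative assumptions, not copied from the repo:

# Hypothetical contents of a config override such as config/finetune_shakespeare.py.
# init_from, dataset, learning_rate and max_iters are assumed names; check the
# repo's config/ directory for the parameters it actually exposes.
init_from = 'gpt2'        # start from OpenAI's 124M GPT-2 weights
dataset = 'shakespeare'   # read data/shakespeare/{train,val}.bin
learning_rate = 3e-5      # small learning rate for finetuning
max_iters = 1000          # keep the finetuning run short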

Baselines

The OpenAI GPT-2 checkpoints let us establish some baselines on OpenWebText. The numbers can be obtained as follows:

$ python train.py eval_gpt2

$ python train.py eval_gpt2_medium

$ python train.py eval_gpt2_large

$ python train.py eval_gpt2_xl

and observe the resulting losses on train and val.

Benchmarking

For model benchmarking, bench.py might be useful. It is identical to the core of the training loop in train.py, but strips away the other complexities.
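
Like the other scripts, it can be run directly, for example:

$ python bench.py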

Efficiency Notes

The code, by default, now targets PyTorch 2.0, whose release makes torch.compile() available. The speed-up from that single line of code is noticeable.
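
torch.compile() is the standard PyTorch 2.0 API; a minimal, self-contained sketch of how it is applied to a model (the tiny network here is only a stand-in for the real GPT):

import torch
import torch.nn as nn

# torch.compile() wraps an nn.Module and compiles it for faster execution;
# it requires PyTorch 2.0 or later.
model = nn.Sequential(nn.Linear(16, 16), nn.GELU(), nn.Linear(16, 16))
model = torch.compile(model)       # the single line that delivers the speed-up
out = model(torch.randn(4, 16))    # first call triggers compilation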

NanoGPT is set to be updated with new developments that include the following:

  • Additional optimizations to the running time
  • Report and track other metrics, such as perplexity (PPL)
  • Evaluate zero-shot perplexities on PTB, WikiText, and other related benchmarks
  • Add some finetuning datasets and a guide for demonstration
  • Reproduce the GPT-2 results; it was estimated ~3 years ago that training the 1.5B model cost ~$50K

minGPT

Two files make up the “library” of minGPT:

  • mingpt/model.py contains the actual Transformer model definition and 
  • mingpt/trainer.py is the (GPT-independent) PyTorch boilerplate that trains the model. 

A Jupyter notebook is also included in the repo, showing how the “library” can be used to train sequence models.
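
Roughly, the two files are used together as in the sketch below; the configuration calls are paraphrased and may not match the current minGPT API exactly, so treat the notebook in the repo as the authoritative reference.

import torch
from torch.utils.data import Dataset
from mingpt.model import GPT
from mingpt.trainer import Trainer

# Toy dataset of random token sequences, only to exercise the API;
# a real use case would tokenise actual text instead.
class RandomTokens(Dataset):
    def __init__(self, length=64, block_size=32, vocab_size=100):
        self.data = torch.randint(vocab_size, (length, block_size + 1))
    def __len__(self):
        return len(self.data)
    def __getitem__(self, i):
        chunk = self.data[i]
        return chunk[:-1], chunk[1:]          # inputs and shifted targets

model_config = GPT.get_default_config()       # assumed helper, as used in the demo notebook
model_config.model_type = 'gpt-nano'          # smallest preset, for illustration
model_config.vocab_size = 100
model_config.block_size = 32
model = GPT(model_config)                     # mingpt/model.py: the Transformer itself

train_config = Trainer.get_default_config()
train_config.max_iters = 100
trainer = Trainer(train_config, model, RandomTokens())  # mingpt/trainer.py: training loop
trainer.run()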

Shritama Saha
Shritama Saha is a technology journalist keen to learn about the AI and analytics space. A graduate in mass communication, she is passionate about exploring the influence of data science on fashion, drug development, films, and art.
