Tesla AI Head Andrej Karpathy Creates His Own Mini GPT

“minGPT tries to be small, clean, interpretable and educational, as most of the currently available ones are a bit sprawling.” 

On Monday, Andrej Karpathy, senior director of AI at Tesla, released minGPT, a library for training GPT language models. Written in PyTorch, it is a re-implementation of GPT training that Karpathy put together over a weekend. He built it as a small, clean and interpretable alternative to the existing PyTorch implementations of GPT, which he finds sprawling. He also warned users to be wary of this ‘quick’ weekend project, as it may contain sharp edges.

“All that’s going on is that a sequence of indices goes into a sequence of transformer blocks, and a probability distribution of the next index comes out. The rest of the complexity is just being clever with batching (both across examples and over sequence length) so that training is efficient,” wrote Karpathy.
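
To make the "batching over sequence length" idea concrete, here is a minimal sketch in plain Python. It is not taken from minGPT: the function name next_token_pairs and the toy character vocabulary are hypothetical and used only for illustration. It shows how one window of indices yields a next-index target at every position, so a single window trains many predictions at once.

```python
def next_token_pairs(indices, block_size):
    """Split a sequence of token indices into (input, target) training windows.

    Hypothetical helper for illustration; not part of minGPT.
    """
    pairs = []
    for start in range(len(indices) - block_size):
        chunk = indices[start:start + block_size + 1]
        x = chunk[:-1]  # the indices the model sees
        y = chunk[1:]   # the next index at every position of x
        pairs.append((x, y))
    return pairs


# Toy character-level vocabulary (an assumption for this example).
text = "hello world"
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}
indices = [stoi[ch] for ch in text]

pairs = next_token_pairs(indices, block_size=4)
x, y = pairs[0]
print(x, y)
```

Each target list is simply the input list shifted by one position, so a window of block_size + 1 indices supplies block_size training signals.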

About minGPT

“GPT is not a complicated model.”

This PyTorch re-implementation is around 300 lines of code, including boilerplate and a custom causal self-attention module that Karpathy himself considers totally unnecessary.
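
For readers unfamiliar with the term, the sketch below illustrates what "causal" self-attention means, using plain Python lists. It is an assumption-laden toy, not minGPT's module: it has a single head, no learned projections and no batching, and the function name causal_self_attention is chosen here for illustration. The only point it demonstrates is that position t attends to positions 0..t and never to later ones.

```python
import math


def causal_self_attention(q, k, v):
    """Single-head causal attention over lists of vectors (illustrative only).

    q, k, v are lists of T vectors (lists of floats). Position t only sees
    positions 0..t, which is what makes the attention causal.
    """
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Scaled dot-product scores against the current and earlier positions.
        scores = [
            sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        # Numerically stable softmax over the visible positions.
        m = max(scores)
        exps = [math.exp(score - m) for score in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted average of the visible value vectors.
        out.append([
            sum(w * v[s][j] for s, w in enumerate(weights))
            for j in range(len(v[0]))
        ])
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = causal_self_attention(x, x, x)

# Changing a later position must leave earlier outputs untouched.
x_changed = x[:2] + [[5.0, -3.0]]
y_changed = causal_self_attention(x_changed, x_changed, x_changed)
print(y[:2] == y_changed[:2])
```

A real implementation would typically compute all positions at once with matrix operations and a lower-triangular mask; the explicit loop here trades speed for readability.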

The core minGPT “library” consists of two files: 

  • mingpt/model.py contains the actual Transformer model definition.
  • mingpt/trainer.py is (GPT-independent) PyTorch boilerplate that trains the model.

A sample execution of minGPT:

# vocab_size, block_size, train_dataset and test_dataset must be defined beforehand

# construct a GPT model
from mingpt.model import GPT, GPTConfig
mconf = GPTConfig(vocab_size, block_size, n_layer=12, n_head=12, n_embd=768) # a GPT-1
model = GPT(mconf)

# construct a trainer
from mingpt.trainer import Trainer, TrainerConfig
tconf = TrainerConfig(max_epochs=10, batch_size=256)
trainer = Trainer(model, train_dataset, test_dataset, tconf)

# start training
trainer.train()
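
As a rough sanity check on the "a GPT-1" comment, the sketch below estimates how many weights a configuration with n_layer=12 and n_embd=768 implies. It rests on several assumptions that are not from the article: a vocabulary of about 40,000 entries and a block size of 512 (approximating GPT-1's published setup), a feed-forward layer four times the embedding width, and no separate count for biases, LayerNorm or the output head. The function name approx_gpt_params is hypothetical.

```python
def approx_gpt_params(vocab_size, block_size, n_layer, n_embd):
    """Rough weight count for a GPT-style decoder.

    Biases, LayerNorm and the output head are ignored (an assumption).
    """
    token_emb = vocab_size * n_embd      # one vector per vocabulary entry
    pos_emb = block_size * n_embd        # one vector per position
    attn = 4 * n_embd * n_embd           # query, key, value and output projections
    mlp = 2 * n_embd * (4 * n_embd)      # two linear layers, assumed 4x hidden width
    return token_emb + pos_emb + n_layer * (attn + mlp)


# vocab_size and block_size are assumed values, not taken from the article.
total = approx_gpt_params(vocab_size=40_000, block_size=512, n_layer=12, n_embd=768)
print(f"about {total / 1e6:.0f}M parameters")
```

Under these assumptions the estimate comes to roughly 116 million weights, in the same range as the figure of a little over 100 million usually quoted for GPT-1.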


Karpathy believes that developers can use this library to replicate the results of GPT-1 and GPT-2. He did not, however, target the highly popular GPT-3 with this weekend project. “It [GPT-3] is likely out of reach as my understanding is that it does not fit into GPU memory and requires a more careful model-parallel treatment,” wrote Karpathy.
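
A back-of-the-envelope calculation suggests why GPT-3 does not fit on one device. The sketch below uses the 175 billion parameter count reported for the largest GPT-3 model; the 32 GB GPU capacity is an assumed figure for a single high-end card, and the estimate covers only the weights, not activations, gradients or optimizer state.

```python
import math

N_PARAMS = 175_000_000_000   # largest GPT-3 model, as reported in its paper
GPU_MEMORY_GB = 32           # assumed capacity of a single high-end GPU


def weight_memory_gb(n_params, bytes_per_param):
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


fp16_gb = weight_memory_gb(N_PARAMS, 2)   # half precision
fp32_gb = weight_memory_gb(N_PARAMS, 4)   # single precision
min_gpus = math.ceil(fp16_gb / GPU_MEMORY_GB)

print(f"fp16: {fp16_gb:.0f} GB, fp32: {fp32_gb:.0f} GB, at least {min_gpus} GPUs")
```

Even in half precision the weights alone come to about 350 GB under these assumptions, so the model has to be split across many devices, which is the model-parallel treatment Karpathy refers to.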

There is also a Jupyter notebook in the repo that shows how the “library” can be used to train sequence models.

Learn more about minGPT in the project’s repo.
