Now Reading
Tesla AI Head Andrej Karpathy Creates His Own Mini GPT

Tesla AI Head Andrej Karpathy Creates His Own Mini GPT

Ram Sagar

“minGPT tries to be small, clean, interpretable and educational, as most of the currently available ones are a bit sprawling.” 

On Monday, Andrej Karpathy, senior director of AI at Tesla, released a library for GPT language model called minGPT. This library written for PyTorch is a re-implementation of GPT training. Karpathy created this clean, interpretable library over the weekend. This library aims to address the existing implementations of GPT on PyTorch, which Karpathy finds to be sprawling. He also warned the users to be wary of this ‘quick’ weekend project as it may contain sharp edges!

“All that’s going on is that a sequence of indices goes into a sequence of transformer blocks, and a probability distribution of the next index comes out. The rest of the complexity is just being clever with batching (both across examples and over sequence length) so that training is efficient,” wrote Karpathy

Register for Analytics Olympiad 2021>>

About minGPT

“GPT is not a complicated model.”

This PyTorch re-implementation is around 300 lines of code that includes boilerplate and a custom causal self-attention module, which Karpathy considers to be totally unnecessary! 

The core minGPT “library” consists of two files: 

  • mingpt/ contains the actual Transformer model definition and 
  • mingpt/ is (GPT-independent) PyTorch boilerplate that trains the model. 

A sample execution of minGPT:

from mingpt.model import GPT, GPTConfig

mconf = GPTConfig(vocab_size, block_size, n_layer=12, n_head=12, n_embd=768) # a GPT-1

model = GPT(mconf)

# construct a trainer

from mingpt.trainer import Trainer, TrainerConfig

See Also

tconf = TrainerConfig(max_epochs=10, batch_size=256)

trainer = Trainer(model, train_dataset, test_dataset, tconf)


Using this library, Karpathy believes that developers can replicate the results of GPT-1 and GPT-2. However, Karpathy didn’t consider the highly popular GPT-3 for this weekend project. “It[GPT-3] is likely out of reach as my understanding is that it does not fit into GPU memory and requires a more careful model-parallel treatment,” wrote Karpathy.

There is also a Jupyter notebook attached to this repo that shows how the “library”  can be used to train sequence models.

Know more about minGPT here.

What Do You Think?

Subscribe to our Newsletter

Get the latest updates and relevant offers by sharing your email.
Join our Telegram Group. Be part of an engaging community

Copyright Analytics India Magazine Pvt Ltd

Scroll To Top