Tesla AI Head Andrej Karpathy Creates His Own Mini GPT

“minGPT tries to be small, clean, interpretable and educational, as most of the currently available ones are a bit sprawling.” 

On Monday, Andrej Karpathy, senior director of AI at Tesla, released minGPT, a PyTorch re-implementation of GPT training. Karpathy built this clean, interpretable library over a weekend because he finds the existing PyTorch implementations of GPT to be sprawling. He also warned users to be wary of this ‘quick’ weekend project, as it may contain sharp edges.

“All that’s going on is that a sequence of indices goes into a sequence of transformer blocks, and a probability distribution of the next index comes out. The rest of the complexity is just being clever with batching (both across examples and over sequence length) so that training is efficient,” wrote Karpathy.
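To make that quote concrete, here is a minimal PyTorch sketch of the interface Karpathy describes. This is our own illustration built from stock nn.TransformerEncoder blocks, not minGPT’s code: token indices go in, and logits that softmax into a distribution over the next index come out.

import torch
import torch.nn as nn

# Illustrative sketch, not minGPT's code: indices in, next-index distribution out.
class TinyGPT(nn.Module):
    def __init__(self, vocab_size, block_size, n_embd=768, n_layer=12, n_head=12):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)                  # index -> vector
        self.pos_emb = nn.Parameter(torch.zeros(1, block_size, n_embd))  # learned positions
        layer = nn.TransformerEncoderLayer(n_embd, n_head, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layer)              # the transformer blocks
        self.head = nn.Linear(n_embd, vocab_size)                        # vector -> logits

    def forward(self, idx):  # idx: (batch, seq_len) of token indices
        t = idx.size(1)
        x = self.tok_emb(idx) + self.pos_emb[:, :t]
        mask = torch.full((t, t), float('-inf'), device=idx.device).triu(1)  # hide the future
        x = self.blocks(x, mask=mask)
        return self.head(x)  # softmax over the last dim gives P(next index)

model = TinyGPT(vocab_size=100, block_size=64)
probs = model(torch.randint(0, 100, (2, 16))).softmax(dim=-1)  # next-index distribution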

About minGPT

“GPT is not a complicated model.”

This PyTorch re-implementation is around 300 lines of code, including boilerplate and a custom causal self-attention module that Karpathy himself calls “totally unnecessary”.
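For readers unfamiliar with the term, causal self-attention is the mechanism that lets each position attend only to itself and earlier positions, which is what makes next-token prediction well-posed. A minimal illustrative sketch (ours, not minGPT’s module):

import math
import torch
import torch.nn.functional as F

# Illustrative sketch of causal self-attention (not minGPT's module).
def causal_self_attention(q, k, v):
    # q, k, v: (batch, seq_len, head_dim)
    t = q.size(1)
    att = q @ k.transpose(-2, -1) / math.sqrt(k.size(-1))  # pairwise attention scores
    mask = torch.tril(torch.ones(t, t, dtype=torch.bool, device=q.device))
    att = att.masked_fill(~mask, float('-inf'))            # block attention to future tokens
    return F.softmax(att, dim=-1) @ v                      # weighted sum of value vectors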


The core minGPT “library” consists of two files: 

  • mingpt/model.py contains the actual Transformer model definition, and 
  • mingpt/trainer.py is (GPT-independent) PyTorch boilerplate that trains the model. 

A sample execution of minGPT:

# construct a GPT model (vocab_size and block_size come from your dataset)
from mingpt.model import GPT, GPTConfig
mconf = GPTConfig(vocab_size, block_size, n_layer=12, n_head=12, n_embd=768) # a GPT-1
model = GPT(mconf)

# construct a trainer (train_dataset and test_dataset are user-supplied PyTorch Datasets)
from mingpt.trainer import Trainer, TrainerConfig
tconf = TrainerConfig(max_epochs=10, batch_size=256)
trainer = Trainer(model, train_dataset, test_dataset, tconf)
trainer.train()
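At the time of release, the repo also shipped a small sampling helper in mingpt/utils.py; per the README, generating from a trained model looks roughly like this (the exact signature may have changed since):

import torch
from mingpt.utils import sample

x = torch.tensor([1, 2, 3], dtype=torch.long)[None, ...]  # conditioning context, with batch dim
y = sample(model, x, 30, temperature=1.0, sample=True, top_k=5)[0]  # generate 30 more indices
print(y)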

Karpathy believes developers can use this library to replicate the results of GPT-1 and GPT-2. The far larger GPT-3, however, was left out of this weekend project. “It [GPT-3] is likely out of reach as my understanding is that it does not fit into GPU memory and requires a more careful model-parallel treatment,” wrote Karpathy.
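A back-of-envelope calculation (our arithmetic, not Karpathy’s) shows why: GPT-3 has 175 billion parameters, and in 32-bit floats the weights alone need roughly 700 GB, far beyond the 16-40 GB of memory on a single 2020-era GPU.

params = 175e9             # GPT-3 parameter count
weight_bytes = params * 4  # 4 bytes per float32 parameter
print(weight_bytes / 1e9)  # ~700 GB for weights alone, before activations and optimizer state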

There is also a Jupyter notebook in the repo that shows how the “library” can be used to train sequence models.
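The Trainer expects standard PyTorch Datasets that yield (input, target) index tensors offset by one position. A minimal character-level sketch in that spirit (our illustration, modeled loosely on the repo’s demo notebook):

import torch
from torch.utils.data import Dataset

# Minimal character-level dataset (our illustration): each example is a block of
# indices plus the same block shifted one position left, so the model learns to
# predict the next index at every step.
class CharDataset(Dataset):
    def __init__(self, text, block_size):
        chars = sorted(set(text))
        self.stoi = {ch: i for i, ch in enumerate(chars)}  # char -> index
        self.block_size = block_size
        self.vocab_size = len(chars)
        self.data = text

    def __len__(self):
        return len(self.data) - self.block_size

    def __getitem__(self, i):
        chunk = self.data[i : i + self.block_size + 1]
        idx = [self.stoi[ch] for ch in chunk]
        x = torch.tensor(idx[:-1], dtype=torch.long)  # input indices
        y = torch.tensor(idx[1:], dtype=torch.long)   # targets: next index at each position
        return x, y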

Know more about minGPT here: https://github.com/karpathy/minGPT.
