“minGPT tries to be small, clean, interpretable and educational, as most of the currently available ones are a bit sprawling.”
On Monday, Andrej Karpathy, senior director of AI at Tesla, released minGPT, a small PyTorch library for training GPT language models. Karpathy put this clean, interpretable re-implementation of GPT training together over the weekend, aiming to counter the existing GPT implementations on PyTorch, which he finds a bit sprawling. He also warned users to be wary of this ‘quick’ weekend project, as it may contain sharp edges.
I wrote a minimal/educational GPT training library in PyTorch, am calling it minGPT as it is only around ~300 lines of code: https://t.co/79S9lShJRN +demos for addition and character-level language model. (quick weekend project, may contain sharp edges)
— Andrej Karpathy (@karpathy) August 17, 2020
“All that’s going on is that a sequence of indices goes into a sequence of transformer blocks, and a probability distribution of the next index comes out. The rest of the complexity is just being clever with batching (both across examples and over sequence length) so that training is efficient,” wrote Karpathy.
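In code, that data flow looks roughly like the sketch below. This is not minGPT’s model.py, just a minimal illustration using PyTorch’s built-in transformer layers; all sizes are placeholders:

import torch
import torch.nn as nn

vocab_size, block_size, n_embd = 100, 32, 64                    # placeholder sizes, not minGPT's defaults

tok_emb = nn.Embedding(vocab_size, n_embd)                      # map token indices to vectors
pos_emb = nn.Parameter(torch.zeros(1, block_size, n_embd))      # learned positional embeddings
layer = nn.TransformerEncoderLayer(n_embd, nhead=4, batch_first=True)
blocks = nn.TransformerEncoder(layer, num_layers=2)             # "a sequence of transformer blocks"
head = nn.Linear(n_embd, vocab_size)                            # project back to vocabulary logits

idx = torch.randint(0, vocab_size, (1, block_size))             # a sequence of indices goes in
x = tok_emb(idx) + pos_emb[:, :idx.size(1), :]
mask = torch.triu(torch.full((block_size, block_size), float('-inf')), diagonal=1)  # causal mask: no peeking ahead
logits = head(blocks(x, mask=mask))                             # (batch, sequence, vocab_size)
probs = torch.softmax(logits[:, -1, :], dim=-1)                 # probability distribution over the next index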
About minGPT
“GPT is not a complicated model.”
This PyTorch re-implementation is around 300 lines of code, including boilerplate and a custom causal self-attention module that Karpathy himself calls totally unnecessary.
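The ‘causal’ part simply means each position can only attend to itself and earlier positions. The function below is a generic sketch of that masking, not minGPT’s own module; the shapes and names are illustrative:

import math
import torch
import torch.nn.functional as F

def causal_self_attention(x, n_head):
    """Generic causal self-attention sketch: each position attends only to itself
    and earlier positions. x has shape (batch, seq_len, n_embd)."""
    B, T, C = x.size()
    # for brevity, reuse x as queries, keys and values (a real module projects them first)
    q = k = v = x.view(B, T, n_head, C // n_head).transpose(1, 2)   # (B, n_head, T, head_dim)
    att = (q @ k.transpose(-2, -1)) / math.sqrt(C // n_head)        # attention scores
    mask = torch.tril(torch.ones(T, T, dtype=torch.bool))           # lower-triangular causal mask
    att = att.masked_fill(~mask, float('-inf'))                     # block attention to future tokens
    att = F.softmax(att, dim=-1)
    y = att @ v                                                     # weighted sum of values
    return y.transpose(1, 2).reshape(B, T, C)

x = torch.randn(2, 8, 64)            # (batch, sequence length, embedding size)
out = causal_self_attention(x, n_head=4)
print(out.shape)                     # torch.Size([2, 8, 64])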
The core minGPT “library” consists of two files:
- mingpt/model.py contains the actual Transformer model definition and
- mingpt/trainer.py is (GPT-independent) PyTorch boilerplate that trains the model.
A sample execution of minGPT:
# the data is up to you: define vocab_size, block_size and torch Datasets that
# return individual examples as PyTorch LongTensors

# construct a GPT model
from mingpt.model import GPT, GPTConfig
mconf = GPTConfig(vocab_size, block_size, n_layer=12, n_head=12, n_embd=768) # a GPT-1
model = GPT(mconf)

# construct a trainer
from mingpt.trainer import Trainer, TrainerConfig
tconf = TrainerConfig(max_epochs=10, batch_size=256)
trainer = Trainer(model, train_dataset, test_dataset, tconf)
trainer.train()
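At release, the repo also shipped a small sampling helper in mingpt/utils.py. The lines below follow the usage shown in the README at the time; treat the exact signature as approximate, since it may have changed in later versions:

import torch
from mingpt.utils import sample

# condition on a short context and ask for 30 more indices; [None, ...] adds the
# batch dimension the model expects, and [0] removes it again from the output
x = torch.tensor([1, 2, 3], dtype=torch.long)[None, ...]
y = sample(model, x, 30, temperature=1.0, sample=True, top_k=5)[0]
print(y)  # the model continues the sequence with 30 more predicted indices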
Using this library, Karpathy believes that developers can replicate the results of GPT-1 and GPT-2. The highly popular GPT-3, however, was out of scope for this weekend project: “It [GPT-3] is likely out of reach as my understanding is that it does not fit into GPU memory and requires a more careful model-parallel treatment,” wrote Karpathy.
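For a sense of scale: the published GPT-2 (1.5B parameters) uses 48 layers, 25 attention heads and an embedding size of 1600, while GPT-3 (175B) goes to 96 layers and an embedding size of 12288. Assuming GPTConfig accepts the same keyword arguments as in the snippet above, a GPT-2-sized configuration would look roughly like this (whether it actually trains depends on available GPUs and compute):

from mingpt.model import GPT, GPTConfig

# GPT-2 (1.5B) scale, assuming GPTConfig takes the same keyword arguments as above;
# GPT-3 (96 layers, n_embd=12288, ~175B parameters) would not fit on a single GPU
vocab_size, block_size = 50257, 1024   # GPT-2's BPE vocabulary size and context length
mconf = GPTConfig(vocab_size, block_size, n_layer=48, n_head=25, n_embd=1600)
model = GPT(mconf)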
There is also a Jupyter notebook attached to this repo that shows how the “library” can be used to train sequence models.
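The character-level demo, for example, needs a torch Dataset that serves fixed-length chunks of text as LongTensors of character indices. The class below is only a sketch of that idea, not the notebook’s exact code; the name CharDataset and the file input.txt are illustrative:

import torch
from torch.utils.data import Dataset

class CharDataset(Dataset):
    """Illustrative character-level dataset: yields (input, target) pairs of
    character indices, with the target shifted one position ahead."""
    def __init__(self, text, block_size):
        chars = sorted(set(text))
        self.stoi = {ch: i for i, ch in enumerate(chars)}   # character -> index
        self.vocab_size = len(chars)
        self.block_size = block_size
        self.data = text

    def __len__(self):
        return len(self.data) - self.block_size

    def __getitem__(self, idx):
        chunk = self.data[idx:idx + self.block_size + 1]     # block_size + 1 characters
        ix = [self.stoi[ch] for ch in chunk]
        x = torch.tensor(ix[:-1], dtype=torch.long)          # current characters
        y = torch.tensor(ix[1:], dtype=torch.long)           # next characters (the targets)
        return x, y

train_dataset = CharDataset(open('input.txt').read(), block_size=128)  # any plain-text file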
Know more about minGPT on GitHub: https://github.com/karpathy/minGPT