
6 Brilliant Video Resources on Generative AI by Andrej Karpathy

The videos are very detailed and take you through the step-by-step process of creating different generative AI applications


Andrej Karpathy, former senior director of AI at Tesla, recently returned to OpenAI, where he had been a founding member. At Tesla he led the computer vision team behind Autopilot, and the AI organisation he headed also contributed to Optimus, the company’s humanoid robot. He released nanoGPT, a fast repository for training and fine-tuning medium-sized GPTs, building on his earlier minGPT library for GPT language models. His latest project is baby Llama (llama2.c), which he made by adapting nanoGPT to the Llama 2 architecture instead of GPT-2.

Beyond these contributions to generative AI, Karpathy has been a prolific contributor to the open-source community through mini projects, educational resources, coding tutorials on YouTube and more. He is also known for his courses on building deep neural networks from scratch, including a walkthrough of nanoGPT based on GPT-2/GPT-3 and the ‘Attention is All You Need’ paper.

Here are six of his best free resources.

Let’s build GPT from Scratch

In this two-hour YouTube video, Karpathy walks you through building a GPT model from scratch, following Google’s research paper “Attention is All You Need” and OpenAI’s GPT-2 and GPT-3. To help viewers grasp the concepts, he recommends watching his earlier videos first, which cover the autoregressive language-modelling framework and the basics of tensors and PyTorch’s nn module, knowledge the current video assumes.

The video is a great resource for anyone who wants to learn more about how GPT works or how to build their own GPT model. It is also a good introduction to the attention mechanism, which is a powerful tool for natural language processing.
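To give a flavour of what the video builds up to, here is a minimal NumPy sketch of the causal self-attention step at the heart of GPT. For clarity this sketch replaces the learned query/key/value projections with identity maps; a real model learns them as linear layers.

```python
import numpy as np

def causal_self_attention(x):
    """x: (T, C) array of T token vectors; returns (T, C) attended values."""
    T, C = x.shape
    q, k, v = x, x, x                         # identity projections, for clarity only
    scores = q @ k.T / np.sqrt(C)             # (T, T) scaled dot-product affinities
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)  # block attention to future tokens
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ v                        # weighted average of value vectors
```

Because of the causal mask, the first token can only attend to itself, so its output equals its own value vector; later tokens mix information from everything before them.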

State of GPT

If you want to learn more about how GPT assistants like ChatGPT are trained, this video is the one for you. It covers tokenization, pretraining, supervised finetuning, and Reinforcement Learning from Human Feedback (RLHF). You will also learn practical approaches and conceptual frameworks for using these models effectively, including prompting strategies, finetuning techniques, the ever-expanding toolkit available, and potential future advancements in the field.
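As a taste of the tokenization stage, here is a hedged sketch of a single byte-pair-encoding (BPE) merge step, the operation GPT tokenizers repeat thousands of times to build their vocabulary. The toy string and helper names here are illustrative, not from the video.

```python
from collections import Counter

def most_common_pair(ids):
    """Return the most frequent adjacent pair of token ids."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` with the single token `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = list("aaabdaaabac".encode("utf-8"))  # start from raw bytes
pair = most_common_pair(ids)               # the byte pair ('a', 'a') is most frequent
ids = merge(ids, pair, 256)                # mint a new token id, shrinking the sequence
```

Repeating this merge step grows the vocabulary while shortening sequences, which is how models trade vocabulary size against context length.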

Intro to Neural Networks and Backpropagation: Building Micrograd

In one of his most admired videos, Karpathy presents a highly detailed yet easily understandable guide to backpropagation and neural network training, built around micrograd, his tiny autograd engine. The tutorial assumes minimal prerequisites: a fundamental understanding of Python and high school-level calculus. By breaking complex concepts into step-by-step instructions, Karpathy ensures that you can follow the subject without feeling overwhelmed.
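In the spirit of that video, here is a condensed sketch of a micrograd-style scalar autograd engine supporting only addition and multiplication (the real micrograd covers more operations): each `Value` remembers how it was produced, and `backward()` replays the chain rule over the computation graph.

```python
class Value:
    """A scalar that tracks its own gradient through + and *."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():           # d(a+b)/da = 1, d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():           # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topological order ensures each node's gradient is complete
        # before it is propagated to its children
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()
```

For example, with `c = a * b + a`, calling `c.backward()` fills in `a.grad = b + 1` and `b.grad = a`, exactly what hand-derived calculus gives.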

The Spelled-Out Intro to Language Modelling: Building Makemore

Starting from a bigram character-level language model, Karpathy gradually advances it into a contemporary Transformer language model similar to GPT. The video has two main objectives: first, to introduce torch.Tensor and its nuances, showing its role in efficiently evaluating neural networks; second, to give an overview of the language-modelling framework, covering model training, sampling, and evaluating the loss (the negative log likelihood used for classification). He explains the process across five detailed videos.
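A count-based bigram character model, the series’ starting point, fits in a few lines. This sketch uses a three-name toy dataset of my own choosing (makemore trains on a long list of real names) and reports the average negative log likelihood as the loss.

```python
import numpy as np

words = ["emma", "olivia", "ava"]            # toy data; makemore uses a long names list
chars = ["."] + sorted(set("".join(words)))  # '.' marks the start/end of a name
stoi = {c: i for i, c in enumerate(chars)}

# count how often each character follows each other character
N = np.zeros((len(chars), len(chars)), dtype=np.int64)
for w in words:
    seq = ["."] + list(w) + ["."]
    for a, b in zip(seq, seq[1:]):
        N[stoi[a], stoi[b]] += 1

# add-one smoothing, then normalise so each row is a probability distribution
P = (N + 1) / (N + 1).sum(axis=1, keepdims=True)

# average negative log likelihood of the training data (lower is better)
log_likelihood, n = 0.0, 0
for w in words:
    seq = ["."] + list(w) + ["."]
    for a, b in zip(seq, seq[1:]):
        log_likelihood += np.log(P[stoi[a], stoi[b]])
        n += 1
nll = -log_likelihood / n
```

Sampling a new name is then just repeatedly drawing the next character from the row of `P` indexed by the current one, until ‘.’ is drawn again.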

Building Makemore: Activations & Gradients, BatchNorm

This video takes you through the internals of multi-layer perceptrons (MLPs) with several layers, focusing on what goes wrong when activations and gradients are improperly scaled. It also covers the diagnostic tools and visualisations that are crucial for understanding how deep networks behave during training. You will learn why training deep neural networks can be fragile, and discover Batch Normalisation, a technique that greatly stabilises the process.
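The core of Batch Normalisation is a short computation. Here is a minimal forward-pass sketch in training mode (a real layer also tracks running statistics for use at inference time, which this omits):

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """x: (batch, features). Normalise each feature across the batch,
    then apply the learnable scale (gamma) and shift (beta)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    xhat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance per feature
    return gamma * xhat + beta
```

Whatever scale the incoming activations have, each feature leaves the layer with (roughly) zero mean and unit variance, which is what makes deeper stacks far less sensitive to initialisation.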

Building a WaveNet

Starting from a 2-layer MLP (multi-layer perceptron), Karpathy shows you how to turn it into a deeper neural network with a tree-like structure, similar to the architecture of DeepMind’s WaveNet (2016). The WaveNet paper implements a more efficient version of this hierarchy using causal dilated convolutions, which the video does not cover. Along the way, viewers get a better feel for torch.nn, how it works behind the scenes, and what a typical deep learning development process involves: reading documentation, keeping track of tensor shapes, and switching between Jupyter notebooks and repository code.
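The tree-like idea can be sketched by repeatedly concatenating adjacent pairs of vectors and mixing them with a linear layer, halving the sequence length at each level. The shapes below are toy values and the weights are random stand-ins for what a real model would learn.

```python
import numpy as np

rng = np.random.default_rng(0)
B, T, C, H = 4, 8, 10, 16        # batch, sequence length, embedding dim, hidden width
x = rng.normal(size=(B, T, C))   # a batch of 8-character contexts

def fuse_pairs(h, W):
    """Concatenate adjacent pairs of vectors, then mix with a linear layer + tanh."""
    B, T, C = h.shape
    pairs = h.reshape(B, T // 2, 2 * C)   # (B, T/2, 2C): neighbours side by side
    return np.tanh(pairs @ W)

h = fuse_pairs(x, rng.normal(size=(2 * C, H)))  # (4, 4, 16): pairs of characters
h = fuse_pairs(h, rng.normal(size=(2 * H, H)))  # (4, 2, 16): groups of four
h = fuse_pairs(h, rng.normal(size=(2 * H, H)))  # (4, 1, 16): whole context fused
```

Instead of squashing all eight characters into one linear layer at once, information is combined gradually, two neighbours at a time, which is the hierarchy the WaveNet architecture exploits.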

Shritama Saha

Shritama (she/her) is a technology journalist at AIM who is passionate to explore the influence of AI on different domains including fashion, healthcare and banks.