
Andrej Karpathy recreates a 33-year-old deep learning paper


In 1989, Yann LeCun published the paper “Backpropagation Applied to Handwritten Zip Code Recognition”. This paper demonstrated how constraints could be built into a backpropagation network through its architecture to improve the network's ability to generalise. In this research, the authors showed how a single network could learn the entire recognition task, from the normalised image of a character to the final classification.

It has been 33 years since the paper was first published. But according to a fun experiment conducted by Tesla's director of AI, Andrej Karpathy, the paper holds up even now. What's more, he concluded that it would hold up just as well 33 years from now, in 2055.

According to Karpathy, the main restrictions of the 1989 paper were its small dataset, consisting of 7,291 16×16 grayscale images of digits, and its tiny neural network of roughly 1,000 neurons. Beyond these limitations, everything else, including the neural network architecture, loss function and optimisation, reads like a modern deep learning paper.


The recreation

Karpathy wrote in his blog that he re-implemented the whole procedure in PyTorch. The original network was implemented in Lisp using the backpropagation simulator SN (developed by Léon Bottou and Yann LeCun, and later renamed Lush). On the software design side, Karpathy notes that such a library has three main components: a fast general tensor library for implementing basic mathematical operations; an autograd engine for tracking the forward compute graph and generating operations for the backward pass; and a scriptable, high-level API of common deep learning operations.
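To make the three components concrete, here is a minimal PyTorch sketch (illustrative only, not code from Karpathy's repository) of tensor maths, autograd tracking and the high-level nn API:

```python
import torch
import torch.nn as nn

# 1) Tensor library: basic mathematical operations on a general tensor type
x = torch.randn(1, 1, 16, 16)           # a dummy 16x16 grayscale "digit"
y = (x * 2.0 + 1.0).mean()              # elementwise maths and a reduction

# 2) Autograd engine: the forward compute graph is tracked so the
#    backward pass can be generated automatically
w = torch.randn(16 * 16, requires_grad=True)
loss = (x.flatten() @ w).tanh()
loss.backward()                          # gradients appear in w.grad

# 3) High-level API: common deep learning operations composed as modules
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(16 * 16, 12),
    nn.Tanh(),
    nn.Linear(12, 10),
)
logits = model(x)                        # forward pass through the stack
```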


While the original network trained for three days on a SUN-4/260 workstation, Karpathy ran his implementation on a MacBook Air (M1) CPU, where training took just 90 seconds, a roughly 3,000x naive speedup. The training process made 23 passes over the training set of 7,291 examples, for a total of 167,693 presentations to the neural network. Karpathy suggests the process could be sped up further if full-batch training were used instead of per-example SGD to maximise GPU utilisation, which could yield roughly another 100x reduction in training latency.
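As a rough sketch of what per-example SGD over 23 passes looks like in PyTorch, consider the loop below; the model, tensors and learning rate are placeholders standing in for the 7,291-example digit set, not Karpathy's actual setup:

```python
import torch
import torch.nn.functional as F

# Placeholder data and model; shapes follow the article
# (16x16 grayscale inputs, 10 digit classes).
X = torch.randn(7291, 1, 16, 16)
Y = torch.randint(0, 10, (7291,))
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(256, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.03)

# 23 passes x 7,291 examples = 167,693 presentations to the network
for epoch in range(23):
    for i in torch.randperm(X.size(0)).tolist():
        logits = model(X[i : i + 1])                 # one example at a time (per-example SGD)
        loss = F.cross_entropy(logits, Y[i : i + 1])
        opt.zero_grad()
        loss.backward()
        opt.step()

# The full-batch alternative would do one forward/backward over all 7,291
# examples per step, which keeps a GPU busy and is where the extra ~100x
# speedup in training latency would come from.
```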

Challenges faced

Karpathy said that he was able to reproduce the numbers only roughly, not exactly. One reason was that the original dataset was no longer available, so he had to simulate it from the larger MNIST dataset: he took the 28×28 digits and scaled them down to the original 16×16 pixels using bilinear interpolation.
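The downscaling step itself is straightforward in PyTorch; a minimal sketch, assuming the MNIST images are already loaded as a float tensor of shape (N, 1, 28, 28):

```python
import torch
import torch.nn.functional as F

# Stand-in for a batch of real 28x28 MNIST digits
mnist_batch = torch.rand(8, 1, 28, 28)

# Downscale to the 16x16 resolution used in the 1989 paper
# via bilinear interpolation.
small = F.interpolate(mnist_batch, size=(16, 16),
                      mode="bilinear", align_corners=False)
print(small.shape)  # torch.Size([8, 1, 16, 16])
```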

Karpathy also pointed out that the paper is too abstract in its description of the weight initialisation scheme. The specific sparse connectivity between the H1 and H2 layers of the network was chosen by a scheme not disclosed in the original 1989 paper, so Karpathy had to take a ‘sensible guess’ and use an overlapping block-sparse structure. He also expressed doubt about the paper's claim of using a plain tanh non-linearity rather than the ‘normalised tanh’ that was popular at the time. Other challenges included formatting errors in the PDF file. “I suspect that there are some formatting errors in the PDF file that, for example, erase dots “.”, making “2.5” look like “2 5”, and potentially (I think?) erasing square roots,” he wrote.
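On the non-linearity question, one common form of the ‘normalised’ (scaled) tanh associated with LeCun's later writing is f(x) = 1.7159 * tanh(2x/3); whether the 1989 network used exactly this variant is precisely what Karpathy was unsure about, so the snippet below is only an illustration of the idea:

```python
import torch

def scaled_tanh(x: torch.Tensor) -> torch.Tensor:
    # A common "normalised" tanh: f(x) = 1.7159 * tanh(2x/3),
    # scaled so that f(1) is approximately 1 and f(-1) approximately -1.
    # Whether the 1989 paper used this exact variant is unclear.
    return 1.7159 * torch.tanh(2.0 / 3.0 * x)

x = torch.linspace(-3, 3, 7)
print(torch.tanh(x))     # plain tanh, saturates near +/-1
print(scaled_tanh(x))    # scaled variant, roughly +/-1 at inputs of +/-1
```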

Lessons learnt

2022 version:

Karpathy concludes that not much has changed in the last 33 years, at least at the macro level: we are still using differentiable neural net architectures made up of layers of neurons and optimised end-to-end with backpropagation and stochastic gradient descent. However, datasets and neural networks have grown considerably in size.

Karpathy also managed to improve on both speed and error rate. He mentioned that he was able to cut the error rate by 60 per cent without changing the dataset or the model's test-time latency. “In particular, if I was transported to 1989, I would have ultimately become upper-bounded in my ability to further improve the system without a bigger computer,” he wrote.

2055 version:

Karpathy predicts that 2055 neural networks will look much the same as 2022 ones at the macro level; the main observable difference will be scale, with datasets and models expected to be as much as 10,000,000x larger. Since today's models are not optimally formulated, the error rate could be halved just by changing details of the model, the loss function, the augmentation and so on, with further gains to be had by scaling up the dataset.

“In its most extreme extrapolation, you will not want to train any neural networks at all. In 2055, you will ask a 10,000,000X-sized neural net mega brain to perform some task by speaking (or thinking) to it in English. And if you ask nicely enough, it will oblige. Yes, you could train a neural net too… but why would you?” he concluded.
