
David vs. Goliath: Does Chinchilla fare well against Google AI’s PaLM?

DeepMind’s claim that large language models were being trained with a suboptimal use of compute was also verified independently later by Google AI’s research.


In 2020, OpenAI published a study titled ‘Scaling Laws for Neural Language Models’, which demonstrated that increasing model size improved performance. Larger models were found to be far more sample-efficient, so compute-optimal training meant training very large models on a comparatively small amount of data and stopping before convergence. In the years that followed, the major tech companies raced to build ever-bigger large language models. The trend culminated in dense models such as GPT-3, with 175 billion parameters, LaMDA, with 137 billion parameters, and Megatron-Turing NLG, with 530 billion parameters.
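
For intuition, scaling-law studies of this kind model test loss as a smooth power law in parameters and training tokens. Below is a minimal sketch of that idea in Python; the exponents and constants are illustrative placeholders rather than the paper’s exact fitted values.

```python
# Illustrative power-law loss surface of the kind fitted in scaling-law work.
# Constants below are placeholders, not the exact values from the 2020 paper.
def approx_loss(n_params: float, n_tokens: float,
                alpha_n: float = 0.076, alpha_d: float = 0.095,
                n_c: float = 8.8e13, d_c: float = 5.4e13) -> float:
    """Loss falls as a power law in model size (N) and training tokens (D)."""
    return (n_c / n_params) ** alpha_n + (d_c / n_tokens) ** alpha_d

# Sample efficiency: at a fixed token budget, the larger model sits lower.
print(approx_loss(1e9, 300e9))    # 1B parameters,   300B tokens
print(approx_loss(175e9, 300e9))  # 175B parameters, 300B tokens -> lower loss
```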

Smaller models, more training tokens

To counter this viewpoint, DeepMind released a paper titled ‘Training Compute-Optimal Large Language Models’ towards the end of March, which argued that scaling should not rely on model size alone: the number of training tokens should grow as well. The paper notes that, in common practice, when the compute budget increases tenfold, the model size is increased by about 5.5 times while the number of training tokens is scaled by only 1.8 times. The study instead suggests that model size and the number of training tokens should be scaled in equal proportion.
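
As a quick back-of-the-envelope check (my arithmetic, not DeepMind’s), the old recipe and the proportional one imply different power-law exponents for how model size and data should grow with the compute budget:

```python
import math

# Old recipe: a 10x compute budget -> ~5.5x more parameters, ~1.8x more tokens.
# Expressed as exponents of the compute budget C:
size_exponent   = math.log10(5.5)   # ~0.74, i.e. parameters ~ C^0.74
tokens_exponent = math.log10(1.8)   # ~0.26, i.e. tokens     ~ C^0.26
print(round(size_exponent, 2), round(tokens_exponent, 2))

# Chinchilla's proportional rule: both scale roughly as C^0.5, so a 10x
# budget means about 3.2x more parameters and 3.2x more tokens.
print(round(10 ** 0.5, 2))
```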

This theory was tested on Chinchilla, a model predicted to be compute-optimal. The study compared the 70-billion parameter Chinchilla to the 280-billion parameter Gopher. Despite being a quarter of the size, Chinchilla was trained on roughly four times more data and outperformed Gopher, reaching a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, more than 7 percentage points higher than Gopher’s score.

Source: DeepMind blog

Large language models have typically been trained on a roughly fixed budget of around 300 billion tokens. Interestingly, while the compute cost of training Gopher and Chinchilla was about the same, Chinchilla was trained on 1.4 trillion tokens.

Source: DeepMind blog
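
A back-of-the-envelope check (my calculation, using the common approximation that training compute is about 6 × parameters × tokens) shows why the two training runs land in the same compute ballpark:

```python
# Rough training-compute estimate: FLOPs ~= 6 * parameters * tokens.
def train_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

gopher     = train_flops(280e9, 300e9)   # 280B parameters, ~300B tokens
chinchilla = train_flops(70e9, 1.4e12)   # 70B parameters,  ~1.4T tokens

print(f"Gopher:     {gopher:.2e} FLOPs")      # ~5.0e23
print(f"Chinchilla: {chinchilla:.2e} FLOPs")  # ~5.9e23 -> same ballpark
```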

Higher budget, different approach

DeepMind’s claim that large language models were being trained with a suboptimal use of compute was also verified independently by Google AI’s research. At the beginning of the month, Google AI’s research team announced PaLM, or the Pathways Language Model, a 540-billion parameter, decoder-only transformer model. Google stated in its findings that PaLM performed strongly on English NLP tasks such as sentence completion, comprehension and natural language inference, as well as multilingual NLP tasks like translation. The blog stated that the vision behind Pathways is for a single AI system to generalise efficiently across thousands of tasks.

Incidentally, PaLM was trained on 780 billion tokens, far fewer than Chinchilla, but with roughly five times Chinchilla’s compute budget. PaLM was trained using a combination of data and model parallelism within each Pod, with data parallelism across two Cloud TPU v4 Pods. This setup achieved a training efficiency of 57.8 per cent hardware FLOPs utilisation, the highest yet achieved for LLMs at this scale.

Source: Google AI blog
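
Using the same rough 6 × parameters × tokens approximation (again my own arithmetic, not a figure from either paper), PaLM’s budget comes out roughly four to five times Chinchilla’s, consistent with the gap described above:

```python
# Rough compute comparison under FLOPs ~= 6 * parameters * tokens
# (ignores attention FLOPs and other overheads).
palm       = 6 * 540e9 * 780e9    # ~2.5e24 FLOPs
chinchilla = 6 * 70e9  * 1.4e12   # ~5.9e23 FLOPs

print(f"ratio: {palm / chinchilla:.1f}x")  # ~4.3x, in the ballpark of five
```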

PaLM was fed English and multilingual datasets, including books, web documents, Wikipedia, casual conversations and GitHub code. 

Conclusion

PaLM was tested on a set of NLP tasks alongside other large models like Chinchilla, GLaM, GPT-3, Megatron-Turing NLG and Gopher. Of the 29 tasks, which spanned sentence completion, question answering, reading comprehension and common-sense reasoning, PaLM outperformed all the other models on 28. PaLM was also compared to other LLMs on a suite of 150 new language modelling tasks known as the Beyond the Imitation Game Benchmark (BIG-bench).

While Chinchilla and PaLM were trained on different corpora, PaLM’s 540-billion parameter model performed well across a range of tasks, including coding, where it was on par with OpenAI’s fine-tuned Codex 12B despite being trained on 50 times less Python code. On reasoning, PaLM solved 58 per cent of the problems in GSM8K, a benchmark of grade-school maths word problems, beating the previous best of 55 per cent set by GPT-3.

PaLM was set against Chinchilla and Gopher across a subset of 58 of these tasks, and again PaLM emerged on top. The study also found that PaLM’s performance as a “function of scale” follows log-linear behaviour similar to that of earlier models, signalling that the gains in performance from scale had not yet reached a plateau.

Source: Google AI blog
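
“Log-linear” here simply means that plotting performance against the logarithm of model size gives an approximately straight line. A tiny sketch with made-up scores (only the three PaLM model sizes are real) illustrates the kind of fit involved:

```python
import numpy as np

# PaLM was released at three sizes: 8B, 62B and 540B parameters.
params = np.array([8e9, 62e9, 540e9])
# Hypothetical benchmark scores, purely to illustrate a log-linear fit.
scores = np.array([41.0, 53.0, 64.0])

slope, intercept = np.polyfit(np.log10(params), scores, 1)
print(f"score ~= {intercept:.1f} + {slope:.1f} * log10(params)")
# A straight-line fit that has not flattened is what "no plateau yet" means.
```
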
DeepMind later acknowledged that, despite PaLM not being compute-optimal, its much larger compute budget would allow it to beat Chinchilla if trained on their data. It also predicted that, for PaLM’s compute budget, a 140-billion parameter model trained on 3 trillion tokens would give optimal performance and be more efficient for inference.
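
That prediction is easy to sanity-check with the same rough FLOPs rule (my arithmetic, not DeepMind’s): a 140-billion parameter model trained on 3 trillion tokens consumes roughly PaLM’s compute budget while carrying about a quarter of its parameters, which is what would make it cheaper to serve at inference time.

```python
# Sanity check under FLOPs ~= 6 * parameters * tokens.
predicted_optimal = 6 * 140e9 * 3e12    # ~2.5e24 FLOPs
palm              = 6 * 540e9 * 780e9   # ~2.5e24 FLOPs

print(f"ratio: {predicted_optimal / palm:.2f}")  # ~1.0 -> same budget,
# but with ~4x fewer parameters, hence cheaper inference per token.
```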

Poulomi Chatterjee

Poulomi is a Technology Journalist with Analytics India Magazine. Her fascination with tech and eagerness to dive into new areas led her to the dynamic world of AI and data analytics.