
Microsoft Releases 1.3 Bn Parameter Language Model, Outperforms LLaMA

One of the paper's co-authors said that with textbook-quality training data, the results were better than the team had anticipated.



Large language models are getting smaller, with LLaMA and Falcon as notable examples. Now, Microsoft Research has upped the game with an even smaller model: phi-1, a transformer-based model with just 1.3 billion parameters.

The research paper, titled Textbooks Are All You Need, describes how the model was trained for just four days on eight A100 GPUs, using a 'textbook quality' dataset of about 6 billion tokens filtered from the web along with roughly 1 billion tokens of synthetic textbooks generated with GPT-3.5.

The paper is available on arXiv, and the model will be available on Hugging Face soon.
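
Once the checkpoint lands on Hugging Face, loading it should work like any other causal language model in the transformers library. The snippet below is only a minimal sketch: the model ID and the prompt are assumptions, since Microsoft had not published a repository name at the time of writing.

# Minimal sketch: loading phi-1 once it is published on Hugging Face.
# The model ID "microsoft/phi-1" is an assumed placeholder, not a confirmed repository name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1"  # hypothetical placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# phi-1 is a code model, so we prompt it with a Python function signature and docstring.
prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))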

Despite its small size, phi-1 attained 50.6% pass@1 accuracy on HumanEval and 55.5% on MBPP. An even smaller model with just 350 million parameters, phi-1-small, trained with the same pipeline, still achieves 45% on HumanEval.
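
For context, HumanEval scores a model by executing its generated code against hidden unit tests. A rough sketch of how such a pass@1 number is produced with OpenAI's open-source human-eval harness (github.com/openai/human-eval) is shown below; generate_completion is a hypothetical stand-in for whatever model is being tested, such as phi-1.

# Sketch of a HumanEval-style evaluation using OpenAI's human-eval harness.
# generate_completion() is a hypothetical stand-in for the model under test.
from human_eval.data import read_problems, write_jsonl

def generate_completion(prompt: str) -> str:
    # Placeholder: call the model here and return only the generated code body.
    raise NotImplementedError

problems = read_problems()  # 164 hand-written programming problems
samples = [
    {"task_id": task_id, "completion": generate_completion(problems[task_id]["prompt"])}
    for task_id in problems
]
write_jsonl("samples.jsonl", samples)

# The completions are then executed against the benchmark's unit tests:
#   $ evaluate_functional_correctness samples.jsonl
# which reports pass@1, the metric on which phi-1 scores 50.6%.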

For comparison, other models that cross the 50% mark on HumanEval are many times larger and are trained on datasets that are orders of magnitude bigger.

Ronen Eldan, one of the co-authors of the paper, said that with textbook-quality training data for coding, the results were better than the team had anticipated.

In a discussion on Hacker News, a user points out that this would not have been possible without the high-quality synthetic dataset produced by GPT-3.5. Training on data generated by GPT models, it seems, can improve the accuracy and efficiency of smaller models.

Rather than scaling up model size, improving the quality of the training data is making models perform far better. This could lead to a paradigm shift in LLM research, with more attention paid to data curation alongside model architecture and training.

Similarly, Orca, a 13-billion-parameter model pitched as an open alternative to OpenAI's chatbots, was also trained on data generated by GPT-4 and performed competitively with OpenAI's offering on several benchmarks.

On the other hand, a recent paper, The Curse of Recursion, argues that training on data generated by other LLMs actually degrades the quality of the resulting model, an effect akin to data poisoning. Critics have also called this the false promise of imitating proprietary LLMs, since such models inherit the flaws of the GPT models they learn from.


Mohit Pandey

Mohit dives deep into the AI world to bring out information in simple, explainable, and sometimes funny words. He also holds a keen interest in photography, filmmaking, and the gaming industry.