
Microsoft’s Phi-3 Outperforms Meta’s Llama 3 and Fits Perfectly on an iPhone

Microsoft shows who is the boss of tiny open source models.


“One of the things that makes Phi-2 better than Meta’s Llama 2 7B and other models is that its 2.7 billion parameter size is very well suited for fitting on a phone,” said Harkirat Behl, one of the creators of the model, who has now built Phi-3, Microsoft’s latest open source model.

Phi-3-Mini is a 3.8 billion parameter language model trained on an extensive dataset of 3.3 trillion tokens. Despite its compact size, Phi-3-Mini reports performance that rivals recent models such as Mixtral 8x7B and GPT-3.5, and even surpasses Meta’s recently launched Llama 3 8B on the MMLU benchmark.

Despite these capabilities, Phi-3-Mini can run locally on a phone. Its small size allows it to be quantised to 4 bits, occupying approximately 1.8GB of memory. Microsoft tested the quantised model on an iPhone 14 with an A16 Bionic chip, running natively on the device and fully offline at more than 12 tokens per second.
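For readers who want a feel for what 4-bit quantisation looks like in practice, here is a minimal sketch using the Hugging Face transformers library with bitsandbytes. This is not Microsoft’s on-device pipeline (which runs natively on the A16 Bionic); it assumes the publicly listed Phi-3-Mini checkpoint ID, and the exact quantisation scheme Microsoft benchmarked on the iPhone may differ.

```python
# Minimal sketch: load Phi-3-Mini with 4-bit weights via bitsandbytes.
# This approximates the memory savings described above; it is NOT the
# native iPhone deployment Microsoft benchmarked.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed public checkpoint ID

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit weights, roughly the ~1.8GB cited
    bnb_4bit_compute_dtype=torch.float16,  # run matmuls in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,  # Phi-3 shipped with custom model code at release
)

prompt = "Explain 4-bit quantisation in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```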

Microsoft has also introduced the Phi-3-Small and Phi-3-Medium models, both significantly more capable than Phi-3-Mini. The 7 billion parameter Phi-3-Small achieves an MMLU score of 75.3, outperforming Meta’s recently launched Llama 3 8B Instruct, which scores 66.

With a Grain of Salt

“To best benefit the open source community, Phi-3-Mini is built upon a similar block structure as Llama-2,” reads the technical report by Microsoft. But the model is currently limited to English, a drawback for developers working in other languages, including Indic AI developers.

The innovation behind Phi-3-Mini lies in its training dataset, an expanded version of the one used for its predecessor, Phi-2. This dataset comprises heavily filtered web and synthetic data. The model has also been optimised for robustness, safety, and chat format.

Given how well small open source models are performing, it wouldn’t be surprising if a model soon outperforms OpenAI’s GPT-4. Interestingly, Meta is also training a model with around 400 billion parameters, which may be able to outperform the closed models once it is launched.

“BUT – as with all (tiny) models, benchmarks tell us less than vibes,” said Matt Shumer on X. In the discussion that followed, users highlighted issues with the model’s benchmarks. “According to what I’ve read, Phi-2 was much worse than its benchmark numbers suggested. This model follows the same training strategy,” read a comment.

Since the model is built by Microsoft and trained on synthetic data, it may well be using GPT-4 output for training. “I don’t think it’s impossible for a small model to be very good. I see their ‘synthetic data’ as essentially a way of distilling GPT-4 into smaller models,” said the same user.

Furthermore, the Phi-3 models are trained on at most 4.8 trillion tokens (3.3 trillion for Phi-3-Mini), significantly fewer than the 15 trillion tokens Llama 3 was trained on. Regardless, Phi-3-Mini can run on a phone, something the Llama series of models is still some way from, given their size.

Moreover, Phi models aren’t specifically tuned for chat or instruction following, which makes them perform slightly worse than Llama models when incorporated into real-world scenarios.

On the other hand, Behl had told AIM that scaling laws do not necessarily hold. “You don’t need a specific size or number of parameters for a model to get good at coding,” said Behl, adding that you do not need large models to instil intelligence. “All you need is a small amount of high quality data, aka textbook quality data.”

Phi-3 continues this approach.

What Dent Will It Make?

Since the model is built for on-device and edge use cases, it is ideal for the ongoing shift towards AI devices. Apple, too, is experimenting with AI on the edge, and Phi-3 might give Microsoft an edge (pun intended) over it.

Moreover, with such small models outperforming larger ones, this might also affect OpenAI’s release of GPT-5, as enterprises are increasingly adopting open source models. Who knows, the company might decide to open source one of its upcoming models, though that seems highly unlikely for now.

Microsoft has also kept in mind the need for LLMs to stay up to date with current information, and has thus made Phi-3 well suited for retrieval-augmented generation (RAG) use cases as well.
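As a rough illustration of the RAG pattern (a hypothetical sketch, not Microsoft’s reference setup): retrieved passages are prepended to the prompt so the model answers from current information rather than its training data. The `retrieve` function below is a toy stand-in for a real search backend.

```python
# Hypothetical minimal RAG loop around a small local model such as Phi-3-Mini.
# retrieve() is a stand-in for any search backend (vector DB, keyword index).

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Toy keyword-overlap retriever; a real system would use embeddings.
    q_words = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: -len(q_words & set(d.lower().split())))
    return scored[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    # Prepend retrieved passages so the model grounds its answer in them.
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

corpus = [
    "Phi-3-Mini is a 3.8 billion parameter model released by Microsoft.",
    "Llama 3 8B was trained on 15 trillion tokens.",
]
query = "How many parameters does Phi-3-Mini have?"
prompt = build_prompt(query, retrieve(query, corpus))
print(prompt)  # feed this prompt to the quantised model from the earlier sketch
```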

Microsoft believes that training models on synthetic data keeps model sizes small while still instilling substantial capability, a departure from how GPT-3 was trained. “Textbooks are written by experts in the field, unlike the internet where anybody can write and post, which is how GPT-3 is trained,” said Behl.

Mohit Pandey