Microsoft’s Strategic Shift: Embracing Smaller Language Models with Phi-2

Smaller models not only offer cost-efficiency but also excel in accuracy

Published on November 16, 2023

by Pritam Bordoloi

At Ignite 2023, Microsoft announced Phi-2, the newest iteration of its Phi series of Small Language Models (SLMs). The announcement comes as many in the industry argue that smaller models will be more useful to enterprises than Large Language Models (LLMs).

“Microsoft loves SLMs,” said Satya Nadella, Chairman and CEO of Microsoft, in the keynote. He added that Phi-2, which Microsoft Research developed on highly specialised datasets, can rival models 150 times bigger.

Phi-2 has 2.7 billion parameters and demonstrates state-of-the-art performance on benchmarks testing common sense, language understanding and logical reasoning. “Phi-2 is open-source and soon will be coming to Microsoft’s catalogue of models-as-a-service,” Nadella said.
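To put the two keynote figures side by side, here is a rough back-of-the-envelope sketch in Python. It takes the 2.7 billion parameter count and the “150 times bigger” claim at face value. The 16-bit (2 bytes per parameter) storage figure is our own assumption for illustration, not something Microsoft stated, and the function names are hypothetical.

```python
# Back-of-the-envelope arithmetic using the figures quoted in the keynote.
PHI2_PARAMS = 2.7e9   # Phi-2 parameter count, as reported
SCALE_FACTOR = 150    # "150 times bigger", as quoted from Nadella


def implied_rival_params(small_params: float, factor: float) -> float:
    """Parameter count of a model `factor` times larger than `small_params`."""
    return small_params * factor


def weights_gigabytes(params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight storage in GB.

    The default of 2 bytes per parameter assumes 16-bit weights; this is an
    illustrative assumption, not a figure from the announcement.
    """
    return params * bytes_per_param / 1e9


rival = implied_rival_params(PHI2_PARAMS, SCALE_FACTOR)
print(f"Implied rival size: {rival / 1e9:.0f} billion parameters")
print(f"Phi-2 weights at 16-bit: ~{weights_gigabytes(PHI2_PARAMS):.1f} GB")
print(f"Rival weights at 16-bit: ~{weights_gigabytes(rival):.0f} GB")
```

Under these assumptions, the claim implies a rival of roughly 405 billion parameters, and the gap in raw weight storage alone (about 5.4 GB versus about 810 GB) hints at why smaller models are attractive for edge deployment.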

Moreover, in a blog post, Microsoft said that with the right fine-tuning and customisation, these SLMs are incredibly powerful tools for applications both on the cloud and on the edge.

(Microsoft Chairman and CEO Satya Nadella at Microsoft Ignite 2023)

Smaller Language Models are on the rise

In the last year or so, LLMs have captivated our attention, from GPT-3.5, GPT-4 and PaLM-2 to open-source models like Falcon and LLaMA. However, SLMs are now receiving growing attention.

When Meta released LLaMA in four sizes (7 bn, 13 bn, 33 bn and 65 bn parameters), it paved the way for SLMs, at least in some sense. It prompted the realisation that models with fewer parameters can perform admirably.

The high cost of training LLMs is one of the primary barriers to adoption. Smaller models offer notable cost savings compared with GPT-3.5 and GPT-4. Generating a paragraph summary with LLaMA 2, which comes in three sizes (7 bn, 13 bn and 70 bn parameters), costs approximately 30 times less than with GPT-4, while preserving an equivalent level of accuracy.

Smaller models not only offer cost-efficiency but also excel in accuracy. Unlike their larger counterparts trained on vast and diverse datasets, smaller models focus on carefully vetted data tailored to specific business use cases, ensuring precision and relevance.

“Most companies will realise that smaller, cheaper, more specialised models make more sense for 99% of AI use-cases,” Clem Delangue, CEO of Hugging Face, predicts.

Sam Altman, OpenAI’s CEO, echoes the sentiment. In a discussion at MIT, Altman envisioned a future where the number of parameters decreases, and a group of smaller models outperforms larger ones.

Microsoft’s efforts in developing smaller models underscore its belief in the significant benefits that SLMs will bring to enterprises in the future.

Microsoft loves SLMs

Earlier this year, besides releasing Phi-1 and Phi-1.5, Microsoft also released Orca, an open-source model with 13 billion parameters based on Vicuna, which can imitate and learn from GPT-4-sized LLMs.

During Ignite 2023, Nadella also unveiled ‘models-as-a-service’ offerings, providing enterprises access to various open-source models on platforms like Hugging Face, including models from Mistral and LLaMA 2.

Moreover, Phi-2, which is available to enterprises in the Azure AI catalogue, could be seen as a contender to the LLaMA series of models. Earlier this year, Microsoft claimed that Phi-1.5, which has 1.3 billion parameters, outperforms LLaMA 2’s 7-billion-parameter model on several benchmarks.

When LLaMA was released to the public, it had neither Reinforcement Learning from Human Feedback (RLHF) nor instruction or conversation tuning.

However, its open-source nature sparked excitement within the community, leading to a cascade of variants featuring instruction tuning, human evaluations, multimodality, RLHF, and more. It made LLaMA one of the most popular models. Now, Microsoft could look to replicate or surpass the success of LLaMA with Phi-2.

According to Sebastien Bubeck, who leads the ML Foundations team at Microsoft Research, Phi-2 is the perfect model to be fine-tuned. Small enterprises or startups looking to leverage generative AI models could find it to be beneficial.

“I’m sure that there are tons of small AI products that have used non-commercial LLMs like Llama. Phi-2 is going to supplant all of those,” Mark Tenenholtz, VP of Data Science at Predelo, said.

Open-source for research purposes only

At the keynote, Nadella said that Phi-2 is open source. However, a quick glance at the licence reveals that, for now, the model is licensed for research purposes only. Many users on X (previously Twitter) pointed out the same.

Ok so looking at the model in azure it has a research license again… pic.twitter.com/u2BGcjWLFH
— anton (@abacaj) November 15, 2023

‘Open-source-for-research-purposes-only’ has a familiar ring, reminiscent of the earlier LLaMA release. In February of this year, Meta shared LLaMA’s model weights with the research community under a noncommercial licence. However, the weights were later leaked on 4chan, inadvertently putting the model within reach of anyone, including those seeking to use it commercially.

If Microsoft is looking to replicate the success of LLaMA with Phi-2, it needs to make the model available for commercial use. Moreover, over time, the idea of ‘open source’ has faced scrutiny. Although models like LLaMA are touted as open source, some argue they don’t truly fit the definition as Meta has not disclosed the datasets used in their training.
