
Microsoft’s Strategic Shift: Embracing Smaller Language Models with Phi-2

Smaller models not only offer cost-efficiency but also excel in accuracy


At Ignite 2023, Microsoft announced Phi-2, the newest iteration of its Phi series of small language models (SLMs). The announcement comes at a time when many in the industry argue that smaller models will prove more useful for enterprises than large language models (LLMs).

“Microsoft loves SLMs,” said Satya Nadella, chairman and CEO of Microsoft, in his keynote, adding that Phi-2, developed by Microsoft Research on highly specialised datasets, can rival models 150 times bigger.

Phi-2 has 2.7 billion parameters and demonstrates state-of-the-art performance on benchmarks testing common sense, language understanding and logical reasoning. “Phi-2 is open source and will soon be coming to Microsoft’s catalogue of models as a service,” Nadella said.

Moreover, in a blog post, Microsoft said that with the right fine-tuning and customisation, these SLMs are incredibly powerful tools for applications both on the cloud and on the edge.  

(Microsoft Chairman and CEO Satya Nadella at Microsoft Ignite 2023)

Smaller Language Models are on the rise

Over the last year or so, LLMs have captivated our attention, from GPT-3.5, GPT-4 and PaLM-2 to open-source models like Falcon and LLaMA. Today, however, SLMs are receiving growing emphasis.

When Meta released LLaMA in four sizes (7 bn, 13 bn, 33 bn and 65 bn parameters), it paved the way for SLMs, at least in some sense. It prompted the realisation that models with fewer parameters can perform admirably.

The enormous cost of training and running LLMs is one of the primary barriers to adoption, and smaller models present notable cost savings in contrast to GPT-3.5 and GPT-4. Generating a paragraph summary with LLaMA 2 (which comes in 7 bn, 13 bn and 70 bn variants) costs approximately 30 times less than with GPT-4, while preserving an equivalent level of accuracy.

Smaller models not only offer cost-efficiency but also excel in accuracy. Unlike their larger counterparts trained on vast and diverse datasets, smaller models focus on carefully vetted data tailored to specific business use cases, ensuring precision and relevance.

“Most companies will realise that smaller, cheaper, more specialised models make more sense for 99% of AI use cases,” predicts Clem Delangue, CEO of Hugging Face.

Sam Altman, OpenAI’s CEO, echoes the sentiment. In a discussion at MIT, Altman envisioned a future where parameter counts decrease and a group of smaller models outperforms larger ones.

Microsoft’s efforts in developing smaller models underscore their belief in the significant benefits that SLMs will bring to enterprises in the future.

Microsoft loves SLMs

Earlier this year, besides releasing Phi-1 and Phi-1.5, Microsoft also released Orca, an open-source 13-billion-parameter model based on Vicuna that learns to imitate the reasoning of larger models like GPT-4.

During Ignite 2023, Nadella also unveiled ‘models-as-a-service’ offerings, providing enterprises access to various open-source models on platforms like Hugging Face, including models from Mistral and LLaMA 2.

Moreover, Phi-2, which is also available to enterprises in the Azure AI model catalogue, could be seen as a contender to the LLaMA series of models. Earlier this year, Microsoft claimed that Phi-1.5, which has 1.3 billion parameters, outperforms LLaMA 2’s 7-billion-parameter model on several benchmarks.

When LLaMA was released to the public, it had neither Reinforcement Learning from Human Feedback (RLHF) nor instruction or conversation tuning.

However, its open-source nature sparked excitement within the community, leading to a cascade of variants featuring instruction tuning, human evaluations, multimodality, RLHF and more, which made LLaMA one of the most popular models. Now, Microsoft could look to replicate or surpass that success with Phi-2.

According to Sébastien Bubeck, who leads the Machine Learning Foundations team at Microsoft Research, Phi-2 is the perfect model for fine-tuning. Small enterprises or startups looking to leverage generative AI models could find it beneficial.

“I’m sure that there are tons of small AI products that have used non-commercial LLMs like Llama. Phi-2 is going to supplant all of those,” said Mark Tenenholtz, VP of data science at Predelo.

Open-source for research purposes only

At the keynote, Nadella described Phi-2 as open source. However, a quick glance at the licence reveals that the model is, for now, released for research purposes only, as many users on X (previously Twitter) pointed out.

‘Open source for research purposes only’ has a familiar ring, reminiscent of the earlier LLaMA release. In February this year, Meta shared LLaMA’s model weights with the research community under a non-commercial licence. However, the weights later leaked on 4chan, inadvertently making the model available for commercial use.

If Microsoft is looking to replicate the success of LLaMA with Phi-2, it will need to make the model available for commercial use. Moreover, the idea of ‘open source’ itself has faced scrutiny over time: although models like LLaMA are touted as open source, some argue they don’t truly fit the definition, since Meta has not disclosed the datasets used in their training.

Pritam Bordoloi

I have a keen interest in creative writing and artificial intelligence. As a journalist, I deep dive into the world of technology and analyse how it’s restructuring business models and reshaping society.