Meet the Creator of Microsoft Phi-2

Harkirat Behl also loves Meta's Llama 2 and is urging Indian companies to build foundational Indic language models.


Illustration by Nikhil Kumar

A few days ago, Microsoft published a blog post on ‘Three Big AI Trends to Watch in 2024’, which highlighted the impact of small language models (SLMs) such as Orca and Phi, along with multimodal AI and AI in science. AIM got in touch with Harkirat Behl, a senior researcher on the Physics of AGI team at Microsoft Research and one of the creators of Phi-1, Phi-1.5, and Phi-2.

Behl said that his team is currently working on the next version of Phi-2 and making it more capable. “Phi-1.5 started showing great coding capabilities, Phi-2 added common sense abilities on top of code, and the next one will be even more capable,” he said.

Behl is also working on video generation, one of the newest trends in the AI industry. He recently published a paper called PEEKABOO, which focuses on interactive video generation, highlighting Microsoft’s interest in multimodal AI.

How is Phi-2 better?

“One of the things that makes Phi-2 better than Meta’s Llama 2 7B and other models is that its 2.7 billion parameter size is very well suited for fitting on a phone,” said Behl. He noted that a lot of Indic models are currently being built on top of Llama 2, and while he acknowledged Meta’s great work on it, he encouraged people to build on top of Phi-2 as well.
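For readers who want to try building on Phi-2, here is a minimal sketch of loading the published microsoft/phi-2 checkpoint with Hugging Face’s transformers library. The prompt and generation settings are illustrative assumptions, not anything prescribed by Behl or the article.

```python
# Minimal sketch: load Phi-2 (2.7B parameters) and run a short completion.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps memory use modest
    device_map="auto",
)

# Illustrative coding prompt; Phi-2 was noted for its coding capabilities.
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

At half precision the weights occupy roughly 5 GB, which is what makes on-device deployment plausible with further quantisation.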

“When GPT-3 came out, everyone, including Google, started making big models. But then the discussion turned to scaling laws and how efficient these bigger models really are, giving rise to smaller language models for specific tasks,” said Behl.

He said that scaling laws are not necessarily true. “You don’t need a specific size or number of parameters for a model to get good at coding,” said Behl, adding that you do not need large models to instil intelligence. “All you need is a small amount of high-quality data, aka textbook-quality data.”
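For context, the scaling laws Behl is pushing back on are empirical fits that predict a model’s loss purely from its parameter count and training-token count. A common form is the Chinchilla fit from Hoffmann et al. (2022), reproduced below as background rather than anything quoted in the interview:

```latex
% Chinchilla-style scaling law (Hoffmann et al., 2022): predicted loss L
% as a function of parameter count N and training tokens D, where
% E, A, B, \alpha, \beta are empirically fitted constants.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

The Phi line’s argument is that data quality is a lever this kind of formula does not capture: careful curation can buy capability that the fit would attribute only to N or D.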

Citing Phi-2, Behl said that training models on carefully curated synthetic data reduces the size of the model while building a lot of capabilities into it, which is different from how GPT-3 was trained. “Textbooks are written by experts in the field, unlike the internet, where anybody can write and post, which is what GPT-3 was trained on,” said Behl.
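To make the idea concrete, here is a minimal sketch of one way “textbook-quality” synthetic data can be produced: prompting a stronger teacher model to write short lessons that then become training documents. This is an assumption-laden illustration, not the Phi team’s actual data pipeline; the teacher model, prompt, and topics are all hypothetical choices.

```python
# Minimal sketch (not the Phi team's actual pipeline): generating
# "textbook-quality" synthetic training documents with a teacher model.
# The model name, prompt template, and topics are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

TOPICS = ["binary search", "list comprehensions", "recursion"]

def textbook_example(topic: str) -> str:
    """Ask a teacher model for a short, textbook-style lesson with code."""
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative choice of teacher model
        messages=[
            {"role": "system",
             "content": "You are an expert textbook author. Write clear, "
                        "self-contained lessons with worked Python examples."},
            {"role": "user", "content": f"Write a short lesson on {topic}."},
        ],
    )
    return response.choices[0].message.content

# Each generated lesson becomes one document in the training corpus.
corpus = [textbook_example(t) for t in TOPICS]
```

The design point is that every document in such a corpus is expert-styled and self-contained, mirroring Behl’s textbook analogy.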

Calls for an Indic open-source moment

With smaller, open language models such as Meta’s Llama 2 and Microsoft’s Phi-2 performing on par with their larger counterparts on specific tasks and ranking at the top of the Open LLM Leaderboard, the conversation has shifted towards building smaller models for specialised use cases and domains, maximising outcomes and efficiency.

“I believe that India should release its own foundational models. If there are models from China on top of the leaderboard, why aren’t there any from India?” said Behl, adding that Indian tech companies should focus on this and also partner with academic institutions to build models. 

Behl emphasised that centralisation of compute would solve a lot of the issues with building models in India. “I think all IITs should come together and pool the resources. This way, we would have enough resources for building a foundational model,” said Behl, pointing to how the UK government did something similar to build its AI capabilities.

“It is necessary to build Indic and local language models as everyone would be able to use them. At the same time, it would also be great to have an Indian model to compete on benchmarks against other countries. India should definitely do that,” emphasised Behl. 

A science lover

Behl started his journey in AI at IIT Kanpur. Even before that, he had worked on robotics projects and built an autonomous underwater vehicle that could recognise objects underwater. “My interest was always in computer vision, and that is when I applied for an internship at Oxford,” said Behl. There, he collaborated with Google DeepMind to learn about safety in AI, and he later went on to intern with Microsoft Research.

For almost two years, Behl focused on automation and how it can be democratised for everyone. “The scope is that everybody should be able to do [automation], not only machine learning experts,” said Behl. He then worked on training models that could find synthetic data automatically, which was also his focus during his PhD.

Apart from working on generative AI at Microsoft, Behl is keen to learn more about science and the many scientific problems AI is helping to solve. He said that Google DeepMind’s AlphaFold, which predicts protein structures, was a great example of what AI can do in the scientific world.


Mohit Pandey

Mohit dives deep into the AI world to bring out information in simple, explainable, and sometimes funny words. He also holds a keen interest in photography, filmmaking, and the gaming industry.