Microsoft Research has done it again. After outperforming Meta’s LLaMA with phi-1 in July, the researchers have now introduced phi-1.5, a 1.3-billion-parameter language model that outperforms Llama 2’s 7-billion-parameter model on several benchmarks. Microsoft has decided to open-source the model.
The phi-1.5 model has been crafted to excel across multiple domains, making it suitable for a wide range of applications. It particularly shines on queries in the question-answering (QA) format, as well as in chat interactions and code-related tasks.
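Since phi-1.5 is a base model rather than an instruction-tuned chat model, QA-style behaviour is typically elicited with a plain prompt template. A minimal sketch of such a template (the exact wording below is an assumption, not something prescribed by Microsoft):

```python
def qa_prompt(question: str) -> str:
    """Format a question in the simple QA style that base models
    such as phi-1.5 tend to complete well (hypothetical template)."""
    return f"Question: {question}\nAnswer:"

# The model is then asked to continue the text after "Answer:".
print(qa_prompt("What is the capital of France?"))
```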
While phi-1 was trained on high-quality textbook data, phi-1.5 is trained on synthetic data only. What sets phi-1.5 apart is its comprehensive training regimen, which draws on diverse data sources: Python code snippets from StackOverflow, code from competitive programming contests, synthetic Python textbooks, and exercises generated by GPT-3.5-turbo-0301.
Read the paper: Textbooks Are All You Need II: phi-1.5 technical report
Key Details of phi-1.5 Model:
- Architecture: Transformer-based model trained with a next-word prediction objective
- Dataset size: a corpus of 30 billion tokens
- Training tokens: trained on 150 billion tokens (multiple passes over the data)
- Precision: fp16
- GPUs: 32× A100-40G
- Training time: 8 days
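The listed numbers can be sanity-checked with the common 6 × parameters × tokens estimate for transformer training compute (an approximation assumed here; it does not appear in the report):

```python
# Back-of-the-envelope check of the training setup above, using the
# common ~6 * params * tokens approximation for transformer training
# FLOPs (an assumption, not a figure from the phi-1.5 report).
params = 1.3e9          # model parameters
tokens = 150e9          # training tokens
gpus = 32               # A100-40G cards
days = 8                # training time

total_flops = 6 * params * tokens            # total training compute
gpu_seconds = gpus * days * 24 * 3600        # aggregate GPU time
per_gpu = total_flops / gpu_seconds          # sustained FLOP/s per GPU

print(f"total compute ≈ {total_flops:.2e} FLOPs")
print(f"sustained ≈ {per_gpu / 1e12:.0f} TFLOP/s per GPU")
```

The result, roughly 53 TFLOP/s per GPU, sits well under the A100’s quoted fp16 tensor-core peak, so the listed hardware and schedule are mutually consistent.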
The Microsoft Research team behind phi-1.5 asserts that the model achieves nearly state-of-the-art performance among models with fewer than 10 billion parameters. Benchmark tests evaluating common sense, language comprehension, and logical reasoning position phi-1.5 as a formidable contender.
Notably, phi-1.5 has outperformed Meta’s Llama 2-7B on the AGIEval score and approaches parity with Llama 2-7B on the GPT4All benchmark suite, as measured by the LM-Eval Harness.