Last week, Meta released early versions of its latest large language model, Llama 3, and the reception has been huge. Clem Delangue, co-founder and CEO of Hugging Face, noted in a post that 1,000 Llama 3 variants have already been shared publicly on Hugging Face, and predicted there will be 10,000 by next weekend.
This new model includes an image generator that can update pictures in real time as users type prompts. Meta has released two versions of Llama 3 – one with 8 billion parameters and another with 70 billion parameters.
Meta claims both sizes of Llama 3 beat similarly sized models, such as Google’s Gemma and Gemini, Mistral 7B, and Anthropic’s Claude 3, on certain benchmarks.
A claim made in a Reddit conversation is quite remarkable: Llama 3’s 8B instruct model reportedly outperforms Llama 2’s 70B instruct model on benchmarks.
The tokenizer vocabulary in Llama 3 has quadrupled, from 32,000 tokens (Llama 2) to 128,000. With a larger vocabulary, Llama 3 can encode sequences more efficiently, using up to 15% fewer tokens, which translates into better downstream performance.
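As a back-of-the-envelope sketch (the 15% figure is the article’s claim, not a measurement, and actual savings vary by text and tokenizer), a larger vocabulary means the same text encodes to fewer tokens, so more text fits in a fixed context window:

```python
# Illustrative arithmetic only: the 15% compression figure comes from the
# article; real token counts depend on the text and the tokenizer.
LLAMA2_VOCAB = 32_000    # Llama 2 tokenizer vocabulary size
LLAMA3_VOCAB = 128_000   # Llama 3 tokenizer vocabulary size
COMPRESSION = 0.15       # claimed reduction in token count

def llama3_tokens(llama2_tokens: int, compression: float = COMPRESSION) -> int:
    """Estimate how many Llama 3 tokens cover text that Llama 2 encodes
    in `llama2_tokens` tokens, given the claimed compression ratio."""
    return round(llama2_tokens * (1 - compression))

print(LLAMA3_VOCAB // LLAMA2_VOCAB)  # vocabulary is 4x larger
print(llama3_tokens(1000))           # ~850 tokens for the same text
```

The practical upshot is that each slot in the context window carries more text, so the effective context grows even before the window size itself is increased.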
Andrej Karpathy, the former director of AI at Tesla, expressed support in his post for releasing base and fine-tuned models at both 8B and 70B sizes. He also highlighted the need for smaller models, particularly for educational purposes, unit testing, and potentially for embedded applications.
Karpathy also spoke about the limitations. While the increase in sequence length is a step in the right direction, he noted that it still falls short of industry-leading standards. “The maximum number of tokens in the context window was bumped up to 8192… quite small w.r.t. modern standards.”
Beyond the limitations, Perplexity AI CEO Aravind Srinivas said, “One thing that impresses me most about Llama 3 is how did they pack so much knowledge and reasoning into a dense 8b and a 70b so well, when everyone else has been scaling sparse MoEs.
This still doesn’t mean having a lot of GPUs is not important. [It’s] probably even more important, considering how many sweeps one has to run to get the right data mixes.”
Pratik Desai, the founder of Kissan AI, released Dhenu Llama 3, fine-tuned from Llama 3 8B. “It is available for anyone to tinker with and provide feedback. Feel free to host and share if you have a spare GPU. We will have an instruction version with a dataset five times larger in the near future,” wrote Desai on X.
Meanwhile, Groq has made Llama 3 available to developers via GroqChat and GroqCloud™, serving Llama 3 8B at 876 tokens per second – reportedly the fastest benchmarked speed for any model.
“It is like a GPT-4-level chatbot, available to use completely free, running at over 800 tokens per second on Groq,” says Rowan Cheung, founder of the AI newsletter The Rundown AI.
Similarly, Brian Roemmele posted that with Groq serving Llama 3 at 800 tokens per second, the speed portends new use cases where a local AI agent can take multiple actions in quick succession.
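To put those throughput figures in perspective, here is a rough sketch (assuming a sustained decode rate and ignoring prompt processing and network latency, which add to real-world response times):

```python
# Rough latency arithmetic for the throughput figures quoted above.
# Assumes a constant decode rate; real latency also includes prompt
# processing and network overhead.
def generation_time_s(num_tokens: int, tokens_per_s: float) -> float:
    """Seconds to generate `num_tokens` at a sustained `tokens_per_s`."""
    return num_tokens / tokens_per_s

# Filling Llama 3's full 8,192-token context at the quoted 800 tok/s:
print(round(generation_time_s(8192, 800), 1))  # ~10.2 seconds

# A typical ~300-token chat reply arrives in well under a second:
print(round(generation_time_s(300, 800), 2))   # ~0.38 seconds
```

At these speeds, an agent could chain several model calls in the time a slower deployment takes to finish one, which is the multi-action use case Roemmele alludes to.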
Going beyond Llama 3
Meta’s chief AI scientist, Yann LeCun, revealed that even more powerful language models are currently under development. LeCun noted that the most advanced Llama model, with over 400 billion parameters, is still in training.
The newly unveiled AI models are set to be integrated into Meta’s virtual assistant, Meta AI, which the company claims is the most advanced among its free-to-use counterparts.
NVIDIA’s Jim Fan said that the upcoming Llama-3 400B+ will mark the watershed moment when the community gains open-weight access to a GPT-4-class model. Further, he said that it will change the calculus for many research efforts and grassroots startups.
“I pulled the numbers on Claude 3 Opus, GPT-4, and Gemini. Llama 3 400B is still training and will hopefully get even better in the next few months,” he added, saying that there is so much research potential that can be unlocked with such a powerful backbone.
Expect a surge in builder energy across the ecosystem!