
Hugging Face Already Has 1000s of Llama 3 Models – and Counting

No sleep for Hugging Face employees. By next weekend, there will be 10,000+ Llama 3 models.


Illustration by Nikhil Kumar


Last week, Meta released early versions of its latest large language model, Llama 3, and the reception has been huge. Clem Delangue, co-founder and CEO of Hugging Face, noted in a post that more than 1,000 Llama 3 variants have already been shared publicly on Hugging Face, and predicted the count will cross 10,000 by next weekend.

Llama 3 model variations, Source: LinkedIn

Alongside the model, Meta's AI assistant now includes an image generator that updates pictures in real time as users type prompts. Meta has released two versions of Llama 3 – one with 8 billion parameters and another with 70 billion.

Meta claims both sizes of Llama 3 beat comparably sized models on certain benchmarks – the 8B model against Google's Gemma 7B and Mistral 7B, and the 70B model against Gemini Pro 1.5 and Anthropic's Claude 3 Sonnet.

More striking is the comparison with Meta's own Llama 2: a claim made in a Reddit discussion holds that Llama 3's 8B instruct model outperforms Llama 2's 70B instruct model on benchmarks.

The tokenizer vocabulary in Llama 3 has quadrupled, from 32,000 tokens in Llama 2 to 128,000. With a larger vocabulary, Llama 3 can encode text more efficiently – producing up to 15% fewer tokens for the same input – and deliver better downstream performance.
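As a back-of-the-envelope sketch of what this buys – assuming the quoted "up to 15%" reduction holds, and using Llama 3's 8,192-token context window – a larger vocabulary effectively stretches how much text fits into the same window:

```python
# Back-of-the-envelope: what a 15% token reduction buys.
LLAMA2_VOCAB = 32_000
LLAMA3_VOCAB = 128_000
REDUCTION = 0.15  # "up to 15% fewer tokens" for the same text

def effective_capacity(context_window: int, reduction: float) -> int:
    """Tokens of *old-tokenizer* text that fit in the new window."""
    return int(context_window / (1 - reduction))

print(LLAMA3_VOCAB // LLAMA2_VOCAB)          # vocabulary grew 4x
print(effective_capacity(8_192, REDUCTION))  # ≈ 9,637 Llama 2 tokens
```

In other words, the same 8,192-token window holds roughly as much text as ~9,600 Llama 2 tokens would – a modest but free capacity gain on top of any context-length increase.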

Andrej Karpathy, the former director of AI at Tesla, expressed support in his post for releasing base and fine-tuned models in both 8B and 70B sizes. He also highlighted the need for smaller models, particularly for educational purposes, unit testing, and potentially embedded applications.

https://twitter.com/karpathy/status/1781028605709234613

Karpathy also spoke about the limitations. While the increase in sequence length is a step in the right direction, he noted that it still falls short of current industry standards: “The maximum number of tokens in the context window was bumped up to 8192… quite small w.r.t. modern standards.”

Beyond the limitations, Perplexity AI CEO Aravind Srinivas said, “One thing that impresses me most about Llama 3 is how did they pack so much knowledge and reasoning into a dense 8b and a 70b so well, when everyone else has been scaling sparse MoEs.

This still doesn’t mean having a lot of GPUs is not important. [It’s] probably even more important, considering how many sweeps one has to run to get the right data mixes.”

Pratik Desai, the founder of Kissan AI, released Dhenu Llama 3, fine-tuned from Llama 3 8B. “It is available for anyone to tinker with and provide feedback. Feel free to host and share if you have a spare GPU. We will have an instruction version with a dataset five times larger in the near future,” wrote Desai on X.

Meanwhile, Llama 3 is now available to developers via GroqChat and GroqCloud™, with Groq serving Llama 3 8B at 876 tokens/s – the fastest speed the company says it has benchmarked for any model.

Rowan Cheung, founder of the AI newsletter The Rundown AI, describes it as a GPT-4-level chatbot, completely free to use, running at over 800 tokens per second on Groq.

Brian Roemmele similarly posted that Groq pushing out 800 tokens per second on Llama 3 portends new use cases where a local AI agent can carry out multiple actions in quick succession.
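These throughput figures translate directly into wall-clock latency, which is what matters for agents chaining many calls. A minimal sketch, using Groq's quoted 876 tokens/s and an assumed ~50 tokens/s baseline for a conventionally hosted model (the baseline figure is illustrative, not from the article):

```python
def generation_time(tokens: int, tokens_per_sec: float) -> float:
    """Seconds to stream `tokens` at a given decode throughput."""
    return tokens / tokens_per_sec

# A 500-token answer at Groq's quoted 876 tokens/s,
# vs an assumed ~50 tokens/s for a typical hosted deployment.
fast = generation_time(500, 876)  # ≈ 0.57 s
slow = generation_time(500, 50)   # 10.0 s

# An agent chaining ten such calls back to back:
print(f"{10 * fast:.1f} s vs {10 * slow:.1f} s")  # 5.7 s vs 100.0 s
```

At these speeds, a ten-step agent loop finishes in seconds rather than minutes – the kind of difference that makes multi-action agents feel interactive.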

Going beyond Llama 3 

Meta’s chief AI scientist, Yann LeCun, revealed that even more powerful language models are under development, noting that the largest Llama model, with over 400 billion parameters, is still in training.

The newly unveiled AI models are set to be integrated into Meta’s virtual assistant, Meta AI, which the company claims is the most advanced among its free-to-use counterparts. 

NVIDIA’s Jim Fan said that the upcoming Llama-3 400B+ will mark the watershed moment when the community gains open-weight access to a GPT-4-class model. Further, he said that it will change the calculus for many research efforts and grassroots startups. 

“I pulled the numbers on Claude 3 Opus, GPT-4, and Gemini. Llama 3 400B is still training and will hopefully get even better in the next few months,” he added, saying that there is so much research potential that can be unlocked with such a powerful backbone. 

Expect a surge in builder energy across the ecosystem!


Gopika Raj

With a Master's degree in Journalism & Mass Communication, Gopika Raj infuses her technical writing with a distinctive flair. Intrigued by advancements in AI technology and its future prospects, her writing offers a fresh perspective in the tech domain, captivating readers along the way.