Published on June 26, 2023

LLM Leaderboard Gone Wrong?

Many researchers questioned LLaMA ranking below Falcon on the Open LLM Leaderboard

By Mohit Pandey

Ever since the UAE’s TII launched Falcon, the Hugging Face Open LLM Leaderboard has been trending for both the right and the wrong reasons. The model emerged as the champion of open source on various evaluation metrics. Interestingly, no paper on the model has been published yet. It is possible that its researchers used some other metric or dataset to evaluate it. Hugging Face’s founders, including Thomas Wolf, who made a lot of noise about Falcon reaching the top of the leaderboard, stumbled upon a problem with the evaluation metrics of recent models. According to the Open LLM Leaderboard, the Massive Multitask Language Understanding (MMLU) benchmark showed that the score of Meta AI’s LLaMA was significantly lower than the score published in the mode


