
Google Fools Everyone with Gemini 

“Gemini doesn't really beat GPT-4"



Google appears desperate. After announcing that it would launch Gemini in the fall of this year, Google was unable to deliver on its promise. Now, the sudden launch of Gemini as the year ends suggests that Google did not want to be left behind. It seems to have acted under pressure while other players like OpenAI and Microsoft were unveiling new products.

Among the three Gemini models released by Google, Gemini Ultra created a buzz as it outperformed OpenAI’s GPT-4 on various benchmarks, including MMLU—a key metric used to evaluate a language model’s capabilities across a spectrum of subjects, ranging from STEM to social sciences and humanities.

Something’s Fishy 

Delving into Gemini’s technical report reveals that on the MMLU benchmark, Gemini Ultra outperformed both GPT-4 and GPT-3.5. However, the twist in the tale is that Google cleverly employed CoT@32 instead of 5-shot learning to enhance the perceived performance of Gemini.

“Digging deeper into the MMLU Gemini Beat – Gemini doesn’t really beat GPT-4. When we evaluate any large language model (LLM) on the MMLU benchmark, we typically employ 5-shot learning,” pointed out Bindu Reddy, the founder of Abacus AI. 

In 5-shot evaluation, the model is shown five worked examples directly in its prompt. This small in-context set is the only guidance the model receives, and it is expected to recognise the pattern and generalise from those five examples alone—no fine-tuning is involved.
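The 5-shot setup described above amounts to prepending five solved examples to the question being evaluated. A minimal sketch, using hypothetical placeholder questions rather than real MMLU items:

```python
# Minimal sketch of building a 5-shot prompt for MMLU-style evaluation.
# The exemplar questions and answers below are hypothetical placeholders.

def build_five_shot_prompt(exemplars, target_question):
    """Concatenate five solved examples followed by the target question.

    exemplars: list of (question, answer) pairs shown in-context.
    target_question: the question the model must answer itself.
    """
    parts = []
    for question, answer in exemplars:
        parts.append(f"Question: {question}\nAnswer: {answer}\n")
    # The prompt ends at "Answer:" so the model completes the final answer.
    parts.append(f"Question: {target_question}\nAnswer:")
    return "\n".join(parts)

# Hypothetical exemplars standing in for real MMLU items.
exemplars = [
    ("What is 2 + 2?", "4"),
    ("What gas do plants absorb?", "CO2"),
    ("Who wrote Hamlet?", "Shakespeare"),
    ("What is the capital of France?", "Paris"),
    ("What is H2O?", "Water"),
]

prompt = build_five_shot_prompt(
    exemplars, "What planet is known as the Red Planet?"
)
print(prompt.count("Question:"))  # 6: five exemplars plus the target
```

The key point is that the five examples live in the prompt, not in any training run, which is why the choice of prompting scheme changes the reported score.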

Chain of Thought (CoT) prompting, on the other hand, involves providing a series of reasoning steps to guide the model in generating intermediate rationales while solving a problem. It aims to enhance the multi-step reasoning abilities of LLMs by encouraging them to produce coherent and logical intermediate steps during problem-solving.
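The difference from a plain few-shot prompt is that each CoT exemplar includes the reasoning, not just the answer. A hedged sketch with a made-up arithmetic exemplar:

```python
# Sketch of a chain-of-thought exemplar: the worked example includes
# intermediate reasoning steps before the final answer, nudging the
# model to reason step by step on the new question. The exemplar text
# is a hypothetical illustration, not from any real benchmark.

COT_EXEMPLAR = (
    "Question: Roger has 5 balls and buys 2 cans of 3 balls each. "
    "How many balls does he have?\n"
    "Reasoning: He starts with 5. Two cans of 3 is 2 * 3 = 6. "
    "Then 5 + 6 = 11.\n"
    "Answer: 11\n"
)

def build_cot_prompt(target_question):
    # Prepend the worked exemplar, then prompt the model for its own
    # reasoning on the target question.
    return COT_EXEMPLAR + f"\nQuestion: {target_question}\nReasoning:"

prompt = build_cot_prompt("What is 3 * 4 + 2?")
```

In CoT@32, many such reasoning chains are sampled per question and their final answers are aggregated, which is why it is not directly comparable to a single 5-shot pass.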

“Google has invented a different methodology around CoT@32 to claim that it’s better than GPT-4. CoT@32 only surpasses when you factor in ‘uncertainty routing.’ I need to dig into this more, but it seems like a method that optimises a consensus cutoff to determine when to use the majority approach versus falling back to the max likelihood greedy strategy,” Reddy said, adding, “GPT-4 is still better than Gemini Ultra.” 
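Based on Reddy's description, the "uncertainty routing" step would take the majority answer across the 32 sampled chains only when consensus clears some threshold, otherwise falling back to the greedy answer. A hypothetical sketch of that logic (the threshold value and function name are assumptions, not from Google's report):

```python
from collections import Counter

def uncertainty_routed_answer(sampled_answers, greedy_answer,
                              consensus_threshold=0.5):
    """Pick the majority answer from N sampled reasoning chains if the
    consensus is strong enough; otherwise fall back to the single
    max-likelihood (greedy) answer.

    sampled_answers: final answers extracted from N sampled chains
                     (e.g. N = 32 for CoT@32).
    greedy_answer:   the greedy-decoded answer.
    consensus_threshold: minimum fraction of chains that must agree
                         (an assumed value, for illustration only).
    """
    answer, votes = Counter(sampled_answers).most_common(1)[0]
    if votes / len(sampled_answers) >= consensus_threshold:
        return answer
    return greedy_answer

# 32 hypothetical sampled answers with a clear majority:
samples = ["B"] * 20 + ["C"] * 8 + ["A"] * 4
print(uncertainty_routed_answer(samples, "C"))  # "B": 20/32 clears 0.5
```

When no answer dominates the samples, the routine returns the greedy answer instead of a weak majority, which matches the "consensus cutoff" behaviour Reddy describes.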

Even if Gemini Ultra beats GPT-4, does it truly make a difference? Every other day, new open-source LLMs emerge, boasting superior performance to GPT-4 or GPT-3.5. For instance, Llama 2 is on par with GPT-3.5, while TII’s Falcon 180B, at least on paper, surpasses GPT-3.5.

Regarding Gemini, AI Advisor Vin Vashishta said, “I understand that Gemini’s benchmarks are better, but Generative AI winners won’t be decided by benchmarks. That’s how models win Kaggle, not how products win over customers.”

He added that model metrics must connect with customer and user outcomes, or they’re merely vanity metrics. “Companies are spending millions to publish benchmarks that customers often ignore,” he added. 

Echoing similar sentiments, Reddy said, “When it comes to ChatGPT-like apps, vibes matter, not benchmarks. If your LLM isn’t interesting or spicy and generates boring corporate speak, it’s not going to make it.”

Google Fooled Everyone 

Google showcased the multi-modal capabilities of Gemini Ultra through a demo video. However, it later emerged that the video was staged.

The six-minute video uploaded by Google guides us through various examples where Gemini engages in fluent conversations, responding to queries and participating in activities such as playing games like rock-paper-scissors with a person. 

In the demo, everything appears to happen in real time, with Gemini responding quickly. However, the YouTube description of the video reads, “For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity.”

In reality, the demonstration didn’t happen in real-time or with voice interaction. When Bloomberg reached out to Google about the video, a spokesperson explained that it was created “using still image frames from the footage, and prompting via text.” Simply put, they first gave pictures to Gemini, and then they wrote text prompts to get the output.

This is not the first time Google has tried to pull off something through marketing alone. In a recent move, it took a dig at AWS by displaying a Google Cloud ad on the Sphere in Las Vegas during AWS re:Invent.

However, Gemini Ultra isn’t out yet. Who knows, it might actually be better than GPT-4 by the time it comes out next year. Google can only hope that OpenAI doesn’t release GPT-5 by then.


Siddharth Jindal

Siddharth is a media graduate who loves to explore tech through journalism and putting forward ideas worth pondering about in the era of artificial intelligence.