
Gemini Pro vs GPT-4V: Has Google Killed it This Time?

While highlighting the impressive capabilities of GPT-4V in benchmark scenarios, it is crucial to recognise the parallel strengths that Gemini Pro shares with it.


Illustration by Raghavendra Rao

Since Google released its competitor, Gemini Pro, there have been claims that it falls short of expectations compared to OpenAI’s GPT-4. The ongoing debate revolves around whether Gemini or GPT-4V is comprehensively superior. While many opinions lean towards GPT-4V, it is important to acknowledge that Google’s Gemini Pro is not far behind.

Recently, a research paper by researchers from Hong Kong and Shanghai, titled Gemini Pro vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases, compared the vision capabilities of both models, and the results are quite interesting.

Gemini demonstrated superior performance in specific reasoning tasks, particularly logical reasoning and factual accuracy. This positions Gemini as a suitable choice for tasks requiring robust comprehension and analytical capabilities. Hence, it is essential to recognise the strengths of both models; favouring GPT-4V outright in such discussions may not be entirely justified.

GPT-4V vs Gemini 

The study shows that GPT-4V exhibited precision and succinctness in its responses, showcasing a notable strength in contextual understanding. On the other hand, Gemini Pro excelled in providing detailed and expansive answers, coupled with relevant imagery and links, highlighting its capacity for rich content generation. In industrial application scenarios, both models demonstrated competency, albeit with nuanced differences.

Gemini is limited to a single image per input, accompanied by textual instructions, whereas GPT-4V can continuously ingest multiple images, enhancing its memory capabilities. While both models exhibit comparable proficiency in basic image recognition tasks, GPT-4V shines in real-world object localisation, particularly with abstract images (tangrams).
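To make that input difference concrete, here is a minimal sketch of how each model might be queried through its Python SDK at the time of writing: the Gemini Pro Vision call pairs one image with a text instruction, while a single GPT-4V chat message can reference several images. The API keys, file paths, URLs and prompts below are placeholders for illustration, not details from the study.

# Gemini Pro Vision: one image per request, paired with a text instruction
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_GOOGLE_API_KEY")  # placeholder key
gemini = genai.GenerativeModel("gemini-pro-vision")
reply = gemini.generate_content(["Describe this chart.", Image.open("chart.png")])
print(reply.text)

# GPT-4V: a single user message can carry multiple images
from openai import OpenAI

client = OpenAI(api_key="YOUR_OPENAI_API_KEY")  # placeholder key
response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Compare these two charts."},
            {"type": "image_url", "image_url": {"url": "https://example.com/chart1.png"}},
            {"type": "image_url", "image_url": {"url": "https://example.com/chart2.png"}},
        ],
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)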

Text extraction from images is a strength for both models, but Gemini surpasses GPT-4V in reading table information. Both models demonstrate common-sense understanding in advanced reasoning tasks, with Gemini slightly trailing in certain intelligence tests. Notably, both models excel in emotional understanding and expression.

The choice between GPT-4V and Gemini hinges on the specific task requirements: GPT-4V is favoured for multimodal and prompted tasks, and Gemini for code-related endeavours or scenarios prioritising computational efficiency.

Did Gemini pass the test?

When Google showcased the multimodal capabilities of Gemini Ultra through a demo video at launch, everyone was awestruck. However, it was later found that the video was staged.

The six-minute video uploaded by Google walks through various examples in which Gemini engages in fluent conversation, responding to queries and taking part in activities such as playing rock-paper-scissors with a person.

In the demo, everything seems to happen in real time, with Gemini responding quickly. However, the YouTube description of the video reads, “For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity.” The same criticism does not apply to Gemini Pro’s performance.

In highlighting the impressive capabilities of GPT-4V in benchmark scenarios, it is crucial to recognise the parallel strengths that Gemini Pro shares with it. Gemini distinguishes itself through its ability to provide concise and direct responses, offering a significant advantage in tasks that demand factual accuracy and prompt information retrieval.

This commonality underscores the nuanced effectiveness of both models in addressing specific challenges and reinforces the notion that advancements in one often resonate with the capabilities of the other. Gemini’s strong reasoning ability, particularly in expert tasks, and its improved identification accuracy, especially in recognising celebrities, showcase its prowess in specialised domains.

Gemini stands out in code-related tasks, demonstrating proficiency in code generation, comprehension, translation, and bug detection, making it a preferred choice for developers. It also boasts general reasoning capabilities and is touted for its scalability and efficiency. 

However, both models share weaknesses, including limitations in spatial awareness, unreliable OCR, inconsistencies in reasoning, and sensitivity to prompts. The lack of detailed quantitative results in the report hinders a more in-depth analysis, emphasising the need for quantitative benchmarks and for staying abreast of ongoing developments, as both models are actively evolving. Although Gemini Ultra will be released next year, if you prioritise practicality, efficiency, and wider accessibility, Gemini Pro is likely the better choice.


Sandhra Jayan

Sandhra Jayan is an enthusiastic tech journalist with a flair for uncovering the latest trends in the AI landscape. Known for her compelling storytelling and insightful analysis, she transforms complex tech narratives into captivating, accessible content. Reach out to her at sandhra.jayan@analyticsindiamag.com