
What Makes Anthropic’s Claude 3 Special

One of the primary reasons developers love Claude 3 is its 200,000-token context window, double the 100,000 tokens of Claude 2.



Anthropic, Amazon’s four-billion-dollar bet, recently released Claude 3, a family of generative AI models (Haiku, Sonnet and Opus) that surpasses GPT-4 on prominent benchmarks while offering near-instant results and strong reasoning capabilities. It has also outperformed Gemini 1.0 Pro and is at par with, or competitive against, Gemini 1.0 Ultra.

Longer Context Length

The Claude 3 model series debuts with a 200,000-token context window, a jump from 100,000 tokens in the second version of Claude. The models can also accept inputs exceeding one million tokens for select customers.

Gemini 1.5, for its part, marks a substantial leap in performance, leveraging advancements in research and engineering across foundational model development and infrastructure. Notably, Gemini 1.5 Pro, the first model released for early testing, is a mid-size multimodal model optimised for diverse tasks. Performing at a level akin to 1.0 Ultra, it pioneers an experimental breakthrough in long-context understanding.

By default, Gemini 1.5 has a 128,000-token context window. Still, like Claude, it allows a select group of developers and enterprise customers to explore an extended context window of up to one million tokens via AI Studio and Vertex AI in private preview.

The weakest in this space is OpenAI’s GPT-4, which caps context length at 32,000 tokens, though GPT-4 Turbo can process up to 128,000 tokens.

Improved Reasoning and Understanding

Another aspect that has caught everyone’s attention is the ‘Needle In A Haystack’ (NIAH) evaluation approach taken by Anthropic, which gauges a model’s accuracy in recalling information from a vast body of text.

Effective processing of lengthy context prompts demands models with strong recall abilities. Claude 3 Opus not only achieved nearly perfect recall, surpassing 99% accuracy, but also demonstrated an awareness of evaluation limitations, identifying instances where the ‘needle’ sentence seemed artificially inserted into the original text by a human.

In an NIAH evaluation, a model’s recall is assessed by embedding a target sentence (the ‘needle’) into a collection of random documents (the ‘haystack’) and then asking the model about it. To make the benchmark more robust, Anthropic used 30 random needle/question pairs per prompt and tested on a diverse corpus of crowdsourced documents. During one such run, Opus exhibited an unexpected behaviour.
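Anthropic has not published its evaluation harness, but the basic setup can be sketched in a few lines of Python. The function name, filler sentences, and pizza-topping needle below are illustrative, not Anthropic's actual data:

```python
import random

def build_niah_prompt(documents, needle, question, seed=0):
    """Assemble a 'needle in a haystack' prompt: hide a target
    sentence among unrelated filler documents, then ask a question
    that can only be answered by recalling the hidden sentence."""
    rng = random.Random(seed)
    docs = list(documents)
    # Insert the needle at a random position in the haystack.
    docs.insert(rng.randrange(len(docs) + 1), needle)
    haystack = "\n\n".join(docs)
    return f"{haystack}\n\nQuestion: {question}\nAnswer:"

# Illustrative filler documents echoing the themes described in the
# article (programming languages, startups, careers).
filler = [
    "Startups often pivot several times before finding product-market fit.",
    "Rust guarantees memory safety without a garbage collector.",
    "Many engineers switch careers into machine learning mid-way.",
]
needle = "The best pizza toppings are figs, prosciutto, and goat cheese."
prompt = build_niah_prompt(filler, needle,
                           "What are the best pizza toppings?")
```

A full harness would repeat this over many needle/question pairs and document lengths, scoring whether the model's answer contains the needle's content.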

Recounting internal testing of Claude 3 Opus, Alex Albert, a prompt engineer at Anthropic, shared that during an NIAH evaluation the model seemed to suspect the team was running an evaluation on it. When presented with a question about pizza toppings, Opus produced an output that included a seemingly unrelated sentence from the documents.

That sentence appeared out of place compared to the overall document content, which focused primarily on programming languages, startups, and career-related topics. Opus suspected that the pizza-topping information might have been inserted as a joke or as a test of its attention, since it did not align with the broader themes and the documents contained no other information about pizza toppings.

So, Opus not only successfully identified the inserted needle but also demonstrated meta-awareness by recognising the needle’s incongruity within the haystack. This prompted reflection on the need for the industry to move beyond artificial tests.

Several users who have tried Claude 3 Opus are so impressed by its reasoning and understanding skills that they feel the model has reached AGI. For example, some appreciate its apparent intrinsic worldview, shaped by the Integral Causality framework and characterised by holism, development, embodiment, contextuality, perspectivism, and practical engagement.

Other community reactions fuelling talk of Claude 3 as a potential AGI cite its ability to reinvent quantum algorithms, its intrinsic worldview, and even its comprehension of a complex quantum physics paper.

Another aspect highlighted by NVIDIA’s Jim Fan is the inclusion of domain expert benchmarks in finance, medicine, and philosophy, which sets Claude apart from models that rely solely on saturated metrics like MMLU and HumanEval. This approach provides a more targeted understanding of performance in specific expert domains, offering valuable insights for downstream applications. 

Anthropic also addresses the issue of overly cautious answers from LLMs with a refusal-rate analysis, emphasising efforts to mitigate overly safe responses to non-controversial questions.

However, it is also important not to overinterpret Claude 3’s perceived “awareness”. Fan believes a simpler explanation is that instances of apparent self-awareness are outcomes of pattern-matching on alignment data crafted by humans. The process is similar to asking GPT-4 about its self-consciousness, where a sophisticated response is likely shaped by human annotators adhering to their preferences.

Even though AGI has been the talk of the town since OpenAI released GPT-4 in March 2023, Anthropic’s Claude 3 still falls short of it. This raises an important question: how close are we to AGI? And, most importantly, who is leading that race?

Shritama Saha

Shritama (she/her) is a technology journalist at AIM who is passionate to explore the influence of AI on different domains including fashion, healthcare and banks.