MITB Banner

Hugging Face Unveils Idefics2, an 8B Vision-Language Model

The new Idefics2 model outperforms larger rivals in the visual tasks.

Share

Listen to this story

Hugging Face has released Idefics2, an open multimodal model that accepts arbitrary sequences of image and text inputs and produces text outputs. The model can answer questions about images, describe visual content, create stories grounded on multiple images, or simply behave as a pure language model without visual inputs. 

Click here to check it out.

Idefics2 surges past its forerunner, Idefics1, boasting only 8 billion parameters and the flexibility granted by its open license (Apache 2.0), alongside significantly augmented Optical Character Recognition (OCR) capabilities.

In a remarkable feat of Idefics1, the new Idefics2 model outperformed larger rivals in the visual tasks, as the model has not only achieved exceptional performance on visual question answering benchmarks, but has also outperformed significantly larger language models like LLava-Next-34B and MM1-30B-chat.

Developed by the Hugging Face M4 team, the model is trained on a wide range of openly available datasets, including web documents, image-caption pairs, and OCR data. Additionally, the model was fine-tuned on a novel dataset called ‘The Cauldron,’ which amalgamated 50 carefully curated datasets for multifaceted conversational training. 

A significant architectural advancement in Idefics2 is the simplification of integrating visual features into the language backbone. The adoption of a Learned Perceiver Pooling and MLP modality projection has enhanced the model’s overall efficacy, marking a shift from its predecessor’s architecture.

Idefics2 exhibits a refined approach to image manipulation, maintaining native resolutions and aspect ratios, deviating from the conventional resizing norms in computer vision. 

Share
Picture of Gopika Raj

Gopika Raj

With a Master's degree in Journalism & Mass Communication, Gopika Raj infuses her technical writing with a distinctive flair. Intrigued by advancements in AI technology and its future prospects, her writing offers a fresh perspective in the tech domain, captivating readers along the way.
Related Posts

CORPORATE TRAINING PROGRAMS ON GENERATIVE AI

Generative AI Skilling for Enterprises

Our customized corporate training program on Generative AI provides a unique opportunity to empower, retain, and advance your talent.

Upcoming Large format Conference

May 30 and 31, 2024 | 📍 Bangalore, India

Download the easiest way to
stay informed

Subscribe to The Belamy: Our Weekly Newsletter

Biggest AI stories, delivered to your inbox every week.

AI Courses & Careers

Become a Certified Generative AI Engineer

AI Forum for India

Our Discord Community for AI Ecosystem, In collaboration with NVIDIA. 

Flagship Events

Rising 2024 | DE&I in Tech Summit

April 4 and 5, 2024 | 📍 Hilton Convention Center, Manyata Tech Park, Bangalore

MachineCon GCC Summit 2024

June 28 2024 | 📍Bangalore, India

MachineCon USA 2024

26 July 2024 | 583 Park Avenue, New York

Cypher India 2024

September 25-27, 2024 | 📍Bangalore, India

Cypher USA 2024

Nov 21-22 2024 | 📍Santa Clara Convention Center, California, USA

Data Engineering Summit 2024

May 30 and 31, 2024 | 📍 Bangalore, India

Subscribe to Our Newsletter

The Belamy, our weekly Newsletter is a rage. Just enter your email below.