Hugging Face Introduces IDEFICS, an Open GPT-4-Style Multimodal Model

It is based on Flamingo, a state-of-the-art visual language model initially developed by DeepMind

Hugging Face introduced IDEFICS (Image-aware Decoder Enhanced à la Flamingo with Interleaved Cross-attentionS), an open-access visual language model which accepts arbitrary sequences of images and texts and produces text.
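
To make "arbitrary sequences of images and texts" concrete, here is a minimal sketch of how such an interleaved prompt could be represented as an ordered list. The `ImageRef` class, the `summarize_prompt` helper, and the example.com URLs are hypothetical names introduced only for illustration; they are not part of IDEFICS or of any Hugging Face library.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class ImageRef:
    """Hypothetical wrapper marking a prompt element as an image (URL or path)."""
    location: str


# An interleaved prompt is an ordered list mixing text segments and images.
PromptElement = Union[str, ImageRef]


def summarize_prompt(prompt: List[PromptElement]) -> dict:
    """Count the text and image elements and record the order they appear in."""
    order = ["image" if isinstance(el, ImageRef) else "text" for el in prompt]
    return {
        "num_images": order.count("image"),
        "num_texts": order.count("text"),
        "order": order,
    }


# Example: two images with text before, between, and after them.
prompt = [
    "User: Compare these two pictures.",
    ImageRef("https://example.com/first.jpg"),   # placeholder URL (assumption)
    "and",
    ImageRef("https://example.com/second.jpg"),  # placeholder URL (assumption)
    "Assistant:",
]

summary = summarize_prompt(prompt)
print(summary["order"])
```

The point of the sketch is only that images and text can appear in any order and any number; the model's output, by contrast, is always text.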

IDEFICS, an 80 billion parameter multimodal model, is designed to process combinations of images and texts and generate coherent textual responses. Its capabilities include image-related inquiries, visual descriptions, and crafting narratives based on multiple images.

It is based on Flamingo, a state-of-the-art visual language model initially developed by DeepMind, which has not been released publicly. 

IDEFICS was trained on a blend of openly accessible datasets, including Wikipedia, Public Multimodal Dataset, and LAION. Additionally, Hugging Face introduced a new dataset named OBELICS, comprising 141 million interleaved image-text documents sourced from the internet and containing 353 million images.

IDEFICS serves as an open-access counterpart to Flamingo, showing performance on par with the proprietary model across diverse image-text comprehension assessments. It comes in two variants, the base version and the instructed version, each available in 9-billion and 80-billion parameter sizes.

Interestingly, OpenAI has not yet made ChatGPT multimodal. As of now, the multimodal features of GPT-4 are also not accessible through the API. OpenAI’s blog post mentions that users can currently make text-only requests to the GPT-4 model, and that the capability to input images is still in a limited alpha stage.

OpenAI introduced Code Interpreter in ChatGPT Plus. Many called it the GPT-4.5 moment, but it relied on old-school OCR from Python libraries and did not use multimodal capabilities for image generation.

Apart from IDEFICS, Bard and Bing also currently accept images as input and generate text. You can try IDEFICS here.

Siddharth Jindal

Siddharth is a media graduate who loves to explore tech through journalism and to put forward ideas worth pondering in the era of artificial intelligence.