LAION Releases Large Scale OpenCLIP Models to Drive Image Classification Forward

The new H/14 model posts state-of-the-art open-source numbers, with applications reaching beyond image generation into high-end classification and dataset creation.
In a blog post last week, LAION (Large-scale Artificial Intelligence Open Network) announced that it has trained three large-scale CLIP models, ViT-L/14, ViT-H/14 and ViT-g/14, with OpenCLIP. The release is believed to set a new benchmark for driving image classification and generation forward.

CLIP models are typically trained in a self-supervised fashion on vast numbers of (image, text) pairs. According to the blog, the team produced the LAION-5B dataset, said to contain 5.8 billion closely related image-text pairs.
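This contrastive setup can be summarised in a few lines: matching image and caption embeddings are pulled together while mismatched ones are pushed apart. Below is a minimal PyTorch sketch of the symmetric contrastive loss; the batch size, embedding dimension and temperature are illustrative placeholders, not LAION's actual training configuration.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_features, text_features, temperature=0.07):
    # Normalise embeddings so the dot product is a cosine similarity.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # Pairwise similarity: entry (i, j) scores image i against caption j.
    logits = image_features @ text_features.T / temperature

    # The matching caption for image i sits at column i.
    targets = torch.arange(logits.shape[0], device=logits.device)

    # Symmetric cross-entropy: pick the right caption for each image
    # and the right image for each caption.
    loss_images = F.cross_entropy(logits, targets)
    loss_texts = F.cross_entropy(logits.T, targets)
    return (loss_images + loss_texts) / 2

# Toy batch: 8 pairs with 512-dimensional embeddings (placeholder values).
loss = clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```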

CLIP (Contrastive Language-Image Pre-training) is a neural network that efficiently learns visual concepts from natural language supervision. It can be applied to any visual classification benchmark simply by providing the names of the categories to be recognised, similar to the "zero-shot" capabilities of GPT-2 and GPT-3.
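In practice, zero-shot classification amounts to encoding the candidate labels as text prompts and picking the one whose embedding is most similar to the image's. Here is a sketch using the open_clip package; the image path and label set are placeholders, and the ViT-H-14 checkpoint tag follows LAION's published naming but should be verified against the model card.

```python
import torch
from PIL import Image
import open_clip

# Load the LAION-trained ViT-H/14 model and its preprocessing transform.
model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-H-14', pretrained='laion2b_s32b_b79k')
tokenizer = open_clip.get_tokenizer('ViT-H-14')
model.eval()

image = preprocess(Image.open('cat.jpg')).unsqueeze(0)  # placeholder image
labels = ['a photo of a cat', 'a photo of a dog', 'a photo of a car']
text = tokenizer(labels)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    # Softmax over label similarities gives per-label probabilities.
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(dict(zip(labels, probs[0].tolist())))
```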

OpenAI's original ViT-B/32 CLIP model was used to filter the dataset out of Common Crawl. The team believes that the best open-source CLIP model trained on the LAION-5B dataset completes the open-source replication of the CLIP paper released by OpenAI in 2021.
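Such filtering scores each crawled (image, text) pair with CLIP and keeps only pairs whose cosine similarity clears a threshold. The sketch below illustrates the idea on a single pair; the 0.28 threshold is an assumption borrowed from LAION's reported English-subset cutoff, and the real pipeline is batched and far more elaborate.

```python
import torch
from PIL import Image
import open_clip

# OpenAI's original ViT-B/32 weights, as used for the filtering step.
model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-B-32', pretrained='openai')
tokenizer = open_clip.get_tokenizer('ViT-B-32')
model.eval()

def keep_pair(image_path, caption, threshold=0.28):
    """Return True if CLIP cosine similarity clears the (assumed) threshold."""
    image = preprocess(Image.open(image_path)).unsqueeze(0)
    text = tokenizer([caption])
    with torch.no_grad():
        img = model.encode_image(image)
        txt = model.encode_text(text)
        img = img / img.norm(dim=-1, keepdim=True)
        txt = txt / txt.norm(dim=-1, keepdim=True)
        similarity = (img @ txt.T).item()
    return similarity >= threshold
```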

The new H/14 model achieves 78.0% zero-shot top-1 accuracy on ImageNet and 73.4% zero-shot image retrieval at Recall@5 on MS COCO, making it the best open-source CLIP model as of September 2022.
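Recall@5 here means the fraction of text queries for which the correct image appears among the five nearest images by CLIP similarity. A toy computation on paired embeddings shows the metric; a real MS COCO evaluation would also handle the dataset's multiple captions per image.

```python
import torch
import torch.nn.functional as F

def recall_at_k(text_features, image_features, k=5):
    # Row i of each tensor is assumed to embed the i-th matching pair.
    text_features = F.normalize(text_features, dim=-1)
    image_features = F.normalize(image_features, dim=-1)
    sims = text_features @ image_features.T       # text-to-image similarity
    topk = sims.topk(k, dim=-1).indices           # k best images per query
    targets = torch.arange(sims.shape[0]).unsqueeze(1)
    # A hit is when the true image index appears in the top-k list.
    return (topk == targets).any(dim=-1).float().mean().item()

# Toy example: 100 paired 512-dimensional embeddings (random placeholders).
print(recall_at_k(torch.randn(100, 512), torch.randn(100, 512)))
```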

The models are expected to serve many applications, such as CLIP guidance and conditioning, and are claimed to deliver better results with models like Stable Diffusion. They can further be used to swap the text encoder for a multilingual one, to expand to other modalities, and to distil knowledge from smaller CLIP models into a bigger one to help bootstrap the learning process.
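Conditioning a generator on CLIP typically means freezing the text encoder and feeding its embeddings to the generative model, which is broadly how diffusion models consume text prompts. The sketch below only extracts such conditioning vectors; the downstream generator is out of scope, and the prompt is a placeholder.

```python
import torch
import open_clip

# Frozen LAION ViT-H/14 text encoder as a conditioning source.
model, _, _ = open_clip.create_model_and_transforms(
    'ViT-H-14', pretrained='laion2b_s32b_b79k')
tokenizer = open_clip.get_tokenizer('ViT-H-14')
model.eval()

prompts = ['a watercolour painting of a lighthouse at dusk']  # placeholder
with torch.no_grad():
    # One embedding per prompt; a generative model would take this
    # vector (or per-token states) as its conditioning signal.
    cond = model.encode_text(tokenizer(prompts))
print(cond.shape)
```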

Bhuvana Kamath
I am fascinated by technology and AI’s implementation in today’s dynamic world. Being a technophile, I am keen on exploring the ever-evolving trends around applied science and innovation.
