Top Used Datasets for Text to Image Synthesis Models

Abundant image datasets are among the most crucial ingredients for training and testing computer-vision-based image synthesis models.

Text-to-image models build on computer vision algorithms that analyse, label, and interpret images. Computer vision has already enabled breakthroughs such as facial recognition and autonomous vehicles, and image generation looks set to be one of its next major applications.

When it comes to training and testing these models, datasets play a huge role in the comprehensiveness, accuracy, and variety of the generated images. Here is a list of the datasets most used by image synthesis models, which you can also use to build your own models, just like the pros!

MS-COCO

Used by DALL-E for testing, MS-COCO is a large-scale object detection, captioning, and segmentation dataset with around 120,000 images across 91 categories. Each image has five captions, which makes it well suited to testing image synthesis models.

Click here to go to the GitHub repository.
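To make the image-caption pairing concrete, here is a minimal sketch of grouping captions by image. It assumes the usual layout of a COCO captions annotation file (an "images" list and an "annotations" list linked by image_id); the sample records, file names, and the helper captions_by_file are made up for illustration and are not taken from the dataset.

```python
import json
from collections import defaultdict

# Tiny made-up sample in the layout of a COCO captions annotation file.
# Real files hold many more images, each with about five captions.
SAMPLE = """
{
  "images": [
    {"id": 1, "file_name": "example_0001.jpg"},
    {"id": 2, "file_name": "example_0002.jpg"}
  ],
  "annotations": [
    {"id": 10, "image_id": 1, "caption": "A dog runs on a beach."},
    {"id": 11, "image_id": 1, "caption": "A brown dog near the sea."},
    {"id": 12, "image_id": 2, "caption": "Two bikes lean on a wall."}
  ]
}
"""


def captions_by_file(coco):
    """Map each image file name to the list of its captions (hypothetical helper)."""
    names = {img["id"]: img["file_name"] for img in coco["images"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        grouped[names[ann["image_id"]]].append(ann["caption"])
    return dict(grouped)


pairs = captions_by_file(json.loads(SAMPLE))
for name, caps in pairs.items():
    print(name, len(caps))
```

For a real evaluation run, you would replace SAMPLE with the contents of a downloaded annotation file and feed each caption to the model as a prompt.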

LAION-5B

An AI training dataset that contains more than five billion image-text pairs, LAION-5B is 14 times larger than its predecessor, LAION-400M. Built by the Large-scale Artificial Intelligence Open Network (LAION), it is one of the largest image-text datasets freely available to everyone.

Click here for the dataset.

Conceptual 12M (CC12M)

CC12M is a dataset of 12 million text-image pairs and is one of the datasets used to train OpenAI’s DALL-E2. It builds on an earlier dataset of 3 million text-image pairs called CC3M, which was used for various kinds of pre-training and end-to-end training on images.

Click here to check out the 2.5GB dataset.

Filtered YFCC100m

One of the biggest datasets for multimedia research, YFCC100M consists of 100 million media objects: 99.2 million photos and 0.8 million videos. All of them carry a Creative Commons licence, and each comes with identifying metadata such as the Flickr identifier and owner name. The media were uploaded to Flickr between its inception in 2004 and 2014.

Click here for more information.

ImageNet

Google’s Language-Image Mixture of Experts (LIMoE), a model with 5.6 billion parameters, was evaluated on zero-shot classification on ImageNet, a database organised according to the WordNet hierarchy. ImageNet currently covers only nouns, and each node of the hierarchy is depicted by hundreds to thousands of images.

Click here and visit the website.

Multi-Modal-CelebA-HQ

A large-scale face image dataset, Multi-Modal-CelebA-HQ supports text-guided image manipulation, face generation and editing, and visual question answering (VQA). It has 30,000 images in total, 24,000 for training and 6,000 for testing, with ten captions per image, making it a broad dataset.

Click here for the image dataset.

CelebA-Dialog

Another large-scale visual-language face dataset, CelebA-Dialog has rich fine-grained labels that classify a single attribute into multiple degrees according to its semantic meaning. It contains about 200,000 images of roughly 10,000 identities, with five fine-grained attributes annotated for each image.

Click here to download the dataset.

DeepFashion-MultiModal

Used for training and testing many image synthesis models, DeepFashion-MultiModal offers rich multi-modal annotations, including fine-grained labels and textual descriptions. DeepFashion consists of 800,000 diverse fashion images covering a wide variety of props and poses.

Click here to visit their website.

MNIST Database

Maintained by Yann LeCun, MNIST is a database of handwritten digits with a training set of 60,000 examples and a test set of 10,000 images. It is mostly used for trying out learning techniques and pattern recognition methods on real-world data. The digits are size-normalised and centred in a fixed-size image.

Visit the website to know more.
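The original MNIST files are distributed in the simple IDX binary format, so the fixed image size can be read straight from the file header. The sketch below assumes the image-file layout of four big-endian 32-bit integers (magic number 2051, image count, rows, columns) followed by one byte per pixel. The helper parse_idx_images is hypothetical, and the demo bytes are synthetic 2×3 "images" rather than real 28×28 digits.

```python
import struct

IDX_IMAGE_MAGIC = 2051  # magic number of IDX image files (0x00000803)


def parse_idx_images(data: bytes):
    """Decode IDX image bytes into a list of images, each a list of pixel rows."""
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != IDX_IMAGE_MAGIC:
        raise ValueError(f"unexpected magic number: {magic}")
    size = rows * cols
    if len(data) != 16 + count * size:
        raise ValueError("file length does not match the header")
    images = []
    for i in range(count):
        start = 16 + i * size
        flat = data[start:start + size]
        # Pixels are stored row by row, one unsigned byte each.
        images.append([list(flat[r * cols:(r + 1) * cols]) for r in range(rows)])
    return images


# Synthetic stand-in for a real file: two 2x3 images with pixel values 0..11.
fake_file = struct.pack(">IIII", IDX_IMAGE_MAGIC, 2, 2, 3) + bytes(range(12))
images = parse_idx_images(fake_file)
print(len(images), images[1])
```

With a real, decompressed training image file, the same function should report 60,000 images. For large files a NumPy-based reader would be faster; this version favours readability.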

CompCars

This dataset covers 163 car makes and around 1,716 car models, each annotated with five attributes such as maximum speed, number of seats, and displacement.

Click here to access the database.

CIFAR-10

A dataset of 60,000 colour images at 32×32 resolution, divided into ten separate classes. The dataset is split into five training batches and one test batch, each containing 10,000 images.

Click here to see the dataset.
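As a rough illustration of how a CIFAR-10 image is stored, the sketch below assumes the layout of the Python version of the dataset, where each image is a flat row of 3,072 values: 1,024 red, then 1,024 green, then 1,024 blue, each plane in row-major order. The helper row_to_image and the synthetic row are illustrative only.

```python
SIZE = 32                 # CIFAR-10 images are 32x32 pixels
PLANE = SIZE * SIZE       # 1,024 values per colour channel


def row_to_image(row, size=SIZE):
    """Turn a flat [R..., G..., B...] row into image[y][x] = (r, g, b)."""
    plane = size * size
    if len(row) != 3 * plane:
        raise ValueError("expected 3 * size * size values")
    red, green, blue = row[:plane], row[plane:2 * plane], row[2 * plane:]
    return [
        [(red[y * size + x], green[y * size + x], blue[y * size + x])
         for x in range(size)]
        for y in range(size)
    ]


# Synthetic row: every pixel has red=0, green=128, blue=255.
row = [0] * PLANE + [128] * PLANE + [255] * PLANE
image = row_to_image(row)
print(len(image), len(image[0]), image[0][0])

# Batch arithmetic: five training batches plus one test batch of 10,000 each.
TRAIN_BATCHES, TEST_BATCHES, PER_BATCH = 5, 1, 10_000
total_images = (TRAIN_BATCHES + TEST_BATCHES) * PER_BATCH
print(total_images)
```

The last two lines simply confirm that six batches of 10,000 images account for the full 60,000.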

Google’s Open Images

Featuring 9 million image URLs, Open Images is one of the largest annotated image datasets. Its labels span 6,000 categories, making it a widely used dataset for many prominent image generation models.

Click here to check out the description.

YouTube-8M

One of the larger video-based datasets, YouTube-8M contains millions of labelled YouTube video IDs annotated with 3,800 visual entities. Movies and TV series are excluded for copyright protection.

Check out the research here.

Mohit Pandey

Mohit dives deep into the AI world to bring out information in simple, explainable, and sometimes funny words. He also holds a keen interest in photography, filmmaking, and the gaming industry.