10 Face Datasets To Start Facial Recognition Projects

Facial recognition is a major research area that governments and organisations have been adopting for several years. Leading phone makers such as Apple and Samsung have been integrating the technology into their smartphones to improve device security. According to market research, the facial recognition market is expected to reach $9.6 billion by 2020.

In this article, we list 10 face datasets that can be used to start facial recognition projects.

(The datasets are listed according to the latest year of publication)

1| Flickr-Faces-HQ Dataset (FFHQ)

Flickr-Faces-HQ Dataset (FFHQ) is a dataset of human faces that includes more variation than the CelebA-HQ dataset in terms of age, ethnicity and image background, and has much better coverage of accessories such as eyeglasses, sunglasses and hats. The images were crawled from Flickr and then automatically aligned and cropped.

Size: The dataset consists of 70,000 high-quality PNG images at 1024×1024 resolution.

Projects: This dataset was originally created as a benchmark for generative adversarial networks (GANs).

Publication Year: 2019

Download here.
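
The image count and resolution above allow a rough estimate of how much memory the FFHQ images occupy once decoded. The sketch below assumes 8-bit RGB (3 bytes per pixel, no alpha channel), which the article does not state; the function name is purely illustrative.

```python
# Rough decoded-size estimate for FFHQ.
# Assumption: 8-bit RGB, i.e. 3 bytes per pixel and no alpha channel.
NUM_IMAGES = 70_000
WIDTH = HEIGHT = 1024
BYTES_PER_PIXEL = 3


def decoded_size_bytes(num_images: int = NUM_IMAGES) -> int:
    """Bytes needed to hold `num_images` fully decoded images in memory."""
    return num_images * WIDTH * HEIGHT * BYTES_PER_PIXEL


total = decoded_size_bytes()
print(f"{total / 1e9:.1f} GB decoded")  # 220.2 GB decoded
```

Under that assumption the full set is far larger than typical RAM, so a project would usually load images in batches or work with downscaled copies.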

2| Tufts-Face-Database

Tufts Face Database is a comprehensive, large-scale face dataset that contains seven image modalities: visible, near-infrared, thermal, computerised sketch, LYTRO, recorded video and 3D images.

Size: The dataset contains over 10,000 images of 74 females and 38 males from more than 15 countries, ranging in age from 4 to 70 years.

Projects: The database is available to researchers worldwide for benchmarking facial recognition algorithms for sketch, thermal, NIR and 3D face recognition, as well as heterogeneous face recognition.

Publication Year: 2019

Download here.

3| Real and Fake Face Detection

This dataset contains expert-generated, high-quality photoshopped face images. Each image is a composite of different faces, with the swapped region being the eyes, nose, mouth or whole face.

Size: The size of the dataset is 215MB 

Projects: This dataset can be used to discriminate real and fake images.

Publication Year: 2019

Download here.

4| Google Facial Expression Comparison Dataset

This dataset by Google is a large-scale facial expression dataset that consists of face image triplets along with human annotations that specify which two faces in each triplet form the most similar pair in terms of facial expression.

Size: The size of the dataset is 200MB, which includes 500K triplets and 156K face images.

Projects: The dataset is intended to aid researchers working on topics related to facial expression analysis such as expression-based image retrieval, expression-based photo album summarisation, emotion classification, expression synthesis, etc.

Publication Year: 2018

Download here.
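
To make the triplet annotation in the Google Facial Expression Comparison Dataset concrete, here is a minimal sketch of how several annotators' votes for one triplet might be combined. The `Triplet` class, its field names and the vote encoding are hypothetical and do not reflect the dataset's actual file format.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Triplet:
    # Hypothetical record layout, not the dataset's real schema.
    faces: Tuple[str, str, str]    # three face image identifiers
    votes: List[Tuple[int, int]]   # each vote names the most similar pair by index


def majority_pair(triplet: Triplet) -> Optional[Tuple[int, int]]:
    """Return the pair most annotators chose, or None if there is no clear winner."""
    counts = Counter(tuple(sorted(vote)) for vote in triplet.votes)
    ranked = counts.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the top two pairs
    return ranked[0][0]


example = Triplet(
    faces=("a.jpg", "b.jpg", "c.jpg"),
    votes=[(0, 1), (1, 0), (0, 2)],
)
print(majority_pair(example))  # (0, 1)
```

Sorting each vote treats (0, 1) and (1, 0) as the same pair, so the order in which an annotator names the two faces does not matter.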

5| Face Images With Marked Landmark Points

Face Images with Marked Landmark Points is a Kaggle dataset to predict keypoint positions on face images.

Size: The size of the dataset is 497MB and it contains 7,049 facial images with up to 15 key points marked on each.

Projects: This dataset can be used as a building block in several applications, such as tracking faces in images and video, analysing facial expressions, detecting dysmorphic facial signs for medical diagnosis and biometrics or facial recognition.

Publication Year: 2018

Download here.
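
Because the landmark dataset marks "up to" 15 key points per face, some points may be missing for a given image. One possible way to score a keypoint predictor in that situation is to skip the missing points, as in the sketch below. Representing a missing point as `None` and the function name `masked_rmse` are assumptions made for illustration, not part of the dataset.

```python
import math
from typing import List, Optional, Tuple

# None marks a key point that was not annotated (an assumed convention).
Point = Optional[Tuple[float, float]]


def masked_rmse(pred: List[Tuple[float, float]], truth: List[Point]) -> float:
    """RMSE over x and y coordinates, skipping key points with no annotation."""
    squared = [
        (p[i] - t[i]) ** 2
        for p, t in zip(pred, truth)
        if t is not None
        for i in (0, 1)
    ]
    if not squared:
        raise ValueError("no annotated key points to score")
    return math.sqrt(sum(squared) / len(squared))


truth = [(30.0, 40.0), None, (60.0, 40.0)]
pred = [(33.0, 44.0), (10.0, 10.0), (60.0, 40.0)]
print(masked_rmse(pred, truth))  # 2.5
```

The second prediction is ignored entirely because its ground truth is missing, so only four coordinate errors contribute to the score.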

6| Labeled Faces in the Wild (LFW) Dataset

Labeled Faces in the Wild (LFW) is a database of face photographs designed for studying the problem of unconstrained face recognition. It serves as a public benchmark for face verification, also known as pair matching.

Size: The size of the dataset is 173MB and it consists of over 13,000 images of faces collected from the web.

Projects: The dataset can be used for face verification and other forms of face recognition.

Publication Year: 2018

Download here.
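
Face verification on LFW, or pair matching, means deciding whether two photographs show the same person. A common approach, sketched below under stated assumptions, compares face embeddings with cosine similarity and applies a threshold. The embeddings are toy 2-D vectors, and the 0.5 threshold and function names are illustrative; a real project would obtain embeddings from a trained model and tune the threshold on held-out pairs.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_person(a: Sequence[float], b: Sequence[float], threshold: float = 0.5) -> bool:
    """Predict a match when the similarity reaches the (assumed) threshold."""
    return cosine_similarity(a, b) >= threshold


Pair = Tuple[Sequence[float], Sequence[float], bool]


def verification_accuracy(pairs: List[Pair], threshold: float = 0.5) -> float:
    """Fraction of labelled pairs for which the prediction matches the label."""
    correct = sum(same_person(a, b, threshold) == label for a, b, label in pairs)
    return correct / len(pairs)


# Toy pairs: (embedding_a, embedding_b, is_same_person)
toy_pairs: List[Pair] = [
    ([1.0, 0.0], [0.9, 0.1], True),
    ([1.0, 0.0], [0.0, 1.0], False),
    ([1.0, 0.0], [0.6, 0.8], True),
    ([1.0, 0.0], [0.8, 0.6], False),  # similar vectors, different people: a false match
]
print(verification_accuracy(toy_pairs))  # 0.75
```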

7| UTKFace Large Scale Face Dataset

UTKFace is a large-scale face dataset with a long age span, ranging from 0 to 116 years old. The images cover large variations in pose, facial expression, illumination, occlusion and resolution.

Size: The dataset consists of over 20K images with annotations of age, gender and ethnicity.

Projects: The dataset can be used for a variety of tasks such as facial detection, age estimation, age progression, age regression and landmark localisation.

Publication Year: 2017

Download here.
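
The age, gender and ethnicity annotations in UTKFace are commonly read from the image file names. The sketch below assumes a naming convention of the form `[age]_[gender]_[race]_[date&time].jpg`, which is not stated in this article and should be checked against the dataset's own documentation. The integer codes are kept as-is rather than mapped to labels, and the class and function names are hypothetical.

```python
from typing import NamedTuple


class FaceLabel(NamedTuple):
    age: int
    gender: int      # integer code, meaning defined by the dataset documentation
    ethnicity: int   # integer code, meaning defined by the dataset documentation


def parse_utkface_name(filename: str) -> FaceLabel:
    """Parse labels from an assumed `age_gender_race_timestamp.jpg` file name."""
    parts = filename.split("_")
    if len(parts) < 4:
        raise ValueError(f"unexpected file name layout: {filename!r}")
    age, gender, ethnicity = (int(p) for p in parts[:3])
    return FaceLabel(age, gender, ethnicity)


print(parse_utkface_name("25_1_2_20170116174525125.jpg"))
# FaceLabel(age=25, gender=1, ethnicity=2)
```

Raising an error on names with fewer than four fields makes malformed files visible instead of silently producing wrong labels.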

8| YouTube Faces Dataset with Facial Keypoints

This dataset is a processed version of the YouTube Faces Dataset, which contains short, publicly available videos of celebrities downloaded from YouTube. There are multiple videos of each celebrity (up to 6 videos per celebrity).

Size: The size of the dataset is 10GB, and it includes approximately 1,293 videos, with up to 240 consecutive frames taken from each original video. In total, there are 155,560 single-image frames.

Projects: This dataset can be used for recognising faces in unconstrained videos.

Publication Year: 2017

Download here.

9| Large-scale CelebFaces Attributes (CelebA) Dataset

CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. The images in this dataset cover large pose variations and background clutter. 

Size: The dataset includes 10,177 identities and 202,599 face images, with 5 landmark locations and 40 binary attribute annotations per image.

Projects: The dataset can be employed as training and testing sets for the following computer vision tasks: face attribute recognition, face detection, landmark (or facial part) localisation, and face editing & synthesis. 

Publication Year: 2015

Download here.
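
The 40 binary attributes in CelebA make it easy to select subsets of images, for example all smiling faces. The sketch below parses a small inline sample laid out the way the attribute file is commonly described: a count line, a line of attribute names, then one row per image with 1 or -1 per attribute. That layout is an assumption here, and the sample rows and function names are invented for illustration.

```python
from typing import Dict, List

# Tiny invented sample in the assumed layout (3 attributes instead of 40).
SAMPLE = """\
3
Eyeglasses Smiling Wearing_Hat
000001.jpg -1  1 -1
000002.jpg  1  1 -1
000003.jpg -1 -1  1
"""


def parse_attributes(text: str) -> Dict[str, Dict[str, bool]]:
    """Map each file name to a dict of attribute name -> True/False."""
    lines = text.strip().splitlines()
    names = lines[1].split()
    table: Dict[str, Dict[str, bool]] = {}
    for row in lines[2:]:
        filename, *values = row.split()
        if len(values) != len(names):
            raise ValueError(f"wrong number of attributes for {filename}")
        table[filename] = {name: value == "1" for name, value in zip(names, values)}
    return table


def images_with(table: Dict[str, Dict[str, bool]], attribute: str) -> List[str]:
    """File names of all images for which `attribute` is set."""
    return [name for name, attrs in table.items() if attrs[attribute]]


table = parse_attributes(SAMPLE)
print(images_with(table, "Smiling"))  # ['000001.jpg', '000002.jpg']
```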

10| Yale Face Database

The Yale Face Database contains 165 grayscale images in GIF format of 15 individuals. There are 11 images per subject, one per different facial expression or configuration: centre-light, w/glasses, happy, left-light, w/no glasses, normal, right-light, sad, sleepy, surprised, and wink.

Size: The size of the dataset is 6.4MB. A related collection, the Yale Face Database B, contains 5760 single light source images of 10 subjects, each seen under 576 viewing conditions.

Projects: The dataset can be used for facial recognition, doppelganger list comparison, etc.

Publication Year: 2001

Download here.

Ambika Choudhury

A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.