Facebook’s New Dataset To Make Facial Recognition Models More Inclusive

Facebook AI introduced a new dataset, Casual Conversations, to measure the robustness of AI models.

Recently, researchers at Facebook AI introduced a new dataset, Casual Conversations, to measure the robustness of AI models across four main dimensions: age, gender, apparent skin type and lighting.

Fairness in AI models is a hot topic in computer vision, with researchers around the world invested in developing fair and inclusive AI models.

“Top performing AI models trained on datasets that are created without considering fair distribution across subgroups and thus quite unbalanced, do not necessarily reflect the outcome in the real world. On the contrary, they may perform poorly and may be biased towards certain groups of people,” the researchers said.

The attributes in existing facial datasets are annotated either by third parties or by hand. Though the creators of those datasets claim the annotations are uniformly distributed over attributes such as age and gender, the accuracy of the annotations is still suspect.

Deepfakes are synthetic videos in which a machine-learning algorithm creates a computer-generated version of the subject’s face. Deepfake detectors differentiate between real and fake videos by accumulating classifier responses on individual frames. Although aggregating per-frame predictions removes most outliers for a robust and accurate classification, bias in face detectors may change the final result dramatically and cause deepfake detectors to fail. Are these detectors capable of detecting faces from various age groups, genders or skin tones? If not, how often do deepfake detectors fail for this reason? Is there a way we can measure the vulnerabilities of deepfake detectors?

The researchers at Facebook AI proposed the dataset to address these questions.
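
The per-frame aggregation described above can be illustrated with a minimal sketch. It is not the researchers’ implementation: the function name, the use of a simple mean and the 0.5 threshold are all assumptions made for illustration. Each frame carries a "fake" score, or None when a hypothetical face detector found no face in that frame.

```python
from statistics import mean
from typing import Optional, Sequence


def aggregate_video_score(
    frame_scores: Sequence[Optional[float]],
    threshold: float = 0.5,  # assumed decision threshold, not from the article
) -> Optional[bool]:
    """Return True if the video is judged fake, False if real,
    and None if no frame produced a usable score."""
    # Frames where the (hypothetical) face detector missed the face are None
    # and cannot contribute to the video-level decision.
    usable = [score for score in frame_scores if score is not None]
    if not usable:
        return None
    # Averaging is one simple way to accumulate per-frame responses.
    return mean(usable) >= threshold


# Toy scores, invented for illustration only.
all_faces_found = [0.9, 0.8, 0.1, 0.85]
faces_mostly_missed = [None, None, 0.1, None]

verdict_full = aggregate_video_score(all_faces_found)
verdict_partial = aggregate_video_score(faces_mostly_missed)
```

With every face detected, the single low score is averaged out and the video is flagged as fake. When the face detector misses most frames, the one remaining outlier decides the verdict, which is how bias in face detection can change the final result.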

Behind the dataset

Casual Conversations comprises video recordings of 3,011 individuals across genders, apparent skin types and ages, with approximately fifteen one-minute recordings per subject. The dataset also includes a unique identifier for each subject, along with age, gender and apparent skin type annotations.

The Casual Conversations dataset is unique because other publicly available datasets provide hand-labelled or machine-labelled annotations, which introduce a bias towards a person’s appearance rather than their actual age and gender. In this dataset, the age and gender annotations are provided by the subjects themselves. The researchers called this one of the dataset’s distinguishing features.

“We prefer this human-centred approach and believe it allows our data to have a relatively unbiased view of age and gender,” the researchers said.

In addition to the self-identified age and gender labels, the researchers annotated the apparent skin tone of each subject using the Fitzpatrick scale as a third dimension of the dataset. They also labelled the videos recorded in low ambient lighting. This set of attributes allowed them to measure the robustness of a model on four dimensions: age, gender, apparent skin tone and ambient lighting.
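
Measuring robustness along one of these dimensions amounts to comparing a model’s accuracy across subgroups. The sketch below shows one way to do that under stated assumptions: the field names (`gender`, `skin_type`, `low_light`, `correct`) and the toy records are hypothetical and are not the dataset’s actual schema or results.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Mapping


def accuracy_by_group(
    records: Iterable[Mapping[str, object]],
    attribute: str,
) -> Dict[Hashable, float]:
    """Compute the fraction of correct predictions for each value of `attribute`."""
    totals: Dict[Hashable, int] = defaultdict(int)
    hits: Dict[Hashable, int] = defaultdict(int)
    for record in records:
        group = record[attribute]
        totals[group] += 1
        # "correct" marks whether the model's prediction matched the label.
        hits[group] += int(bool(record["correct"]))
    return {group: hits[group] / totals[group] for group in totals}


# Invented evaluation results; skin_type uses Fitzpatrick types 1 to 6.
records = [
    {"gender": "female", "skin_type": 2, "low_light": False, "correct": True},
    {"gender": "female", "skin_type": 6, "low_light": True, "correct": False},
    {"gender": "male", "skin_type": 6, "low_light": False, "correct": True},
    {"gender": "male", "skin_type": 2, "low_light": False, "correct": True},
]

by_skin_type = accuracy_by_group(records, "skin_type")
by_lighting = accuracy_by_group(records, "low_light")
```

A large gap between subgroups, for example between lighting conditions, would indicate that the model is less robust along that dimension.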


The Casual Conversations dataset has uniform distributions across categories. It could be used to evaluate various AI methods, such as face detection and apparent age and gender classification, or to assess robustness against various ambient lighting conditions.

According to the researchers, though Casual Conversations is intended to evaluate the robustness of AI models across facial attributes, the dataset could come in handy for various open challenges.

The potential application areas include image inpainting, developing temporally consistent models, audio understanding, responsible AI on facial attribute classification and handling low-light scenarios in the problems mentioned above.

Wrapping up

“Our dataset will be publicly available for general use and we encourage users to extend annotations of our dataset for various computer vision applications, in line with our data use agreement,” the researchers said.
