
Facebook’s New Dataset To Make Facial Recognition Models More Inclusive

Facebook AI introduced a new dataset, Casual Conversations, to measure the robustness of AI models.


Recently, researchers at Facebook AI introduced a new dataset, Casual Conversations, to measure the robustness of AI models across four main dimensions: age, gender, apparent skin type and lighting.

Fairness in AI models is a hot topic in computer vision, with researchers around the world invested in developing fair and inclusive AI models.

“Top performing AI models trained on datasets that are created without considering fair distribution across subgroups and thus quite unbalanced, do not necessarily reflect the outcome in the real world. On the contrary, they may perform poorly and may be biased towards certain groups of people,” the researchers said.

The attributes in existing facial datasets are either hand-labelled or annotated by third parties. Though the authors of those datasets claim that the annotations are uniformly distributed over attributes such as age and gender, the accuracy of the annotations is still suspect.

Deepfakes are synthetic videos in which a machine-learning algorithm creates a computer-generated version of the subject’s face. Deepfake detectors differentiate between real and fake videos by accumulating classifier responses on individual frames. Although aggregating per-frame predictions removes most outliers and makes the classification robust and accurate, bias in the underlying face detectors may change the final result dramatically and cause deepfake detectors to fail. Are these face detectors capable of detecting faces from various age groups, genders or skin tones? If not, how often do deepfake detectors fail for this reason? Is there a way to measure the vulnerabilities of deepfake detectors?

The researchers at Facebook AI proposed the dataset to address these questions.
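The per-frame aggregation described above can be illustrated with a minimal Python sketch. It is not the researchers’ implementation: the function name `classify_video`, the use of a simple mean, the 0.5 threshold and the toy scores are all assumptions made for illustration. A `None` entry stands for a frame in which the face detector found no face.

```python
from statistics import mean

def classify_video(frame_scores, threshold=0.5):
    """Label a video from per-frame fake probabilities (hypothetical helper).

    frame_scores: one entry per frame; a float in [0, 1] is the classifier's
    fake probability, None means the face detector found no face.
    Returns "fake", "real", or None when no frame could be scored.
    """
    scored = [s for s in frame_scores if s is not None]
    if not scored:
        return None  # no face detected anywhere, so no decision is possible
    # Averaging dampens the effect of a single outlier frame.
    return "fake" if mean(scored) > threshold else "real"

# Toy scores (made up): one outlier frame does not flip the decision.
all_frames = [0.9, 0.85, 0.1, 0.95, 0.9]
print(classify_video(all_frames))

# Same video, but the face detector missed four of five frames:
# the decision now rests entirely on the outlier.
missed_frames = [None, None, 0.1, None, None]
print(classify_video(missed_frames))
```

In this toy case the first call labels the video as fake and the second as real, which is the failure mode the questions above point at: the classifier is unchanged, but a face detector that misses frames for some group of people changes the aggregated result.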

Behind the dataset

Casual Conversations comprises video recordings of 3,011 individuals across genders, apparent skin types and ages. It has approximately fifteen one-minute video recordings for each of the 3,011 subjects. The dataset also includes a unique identifier for each subject, along with age, gender and apparent skin type annotations.

The Casual Conversations dataset is unique because the other publicly available datasets provide hand-labelled or machine-labelled annotations, which reflect a person’s appearance rather than their actual age and gender and therefore introduce bias. In this dataset, the age and gender annotations are provided by the subjects themselves. The researchers called this one of the distinguishing features of the dataset.

“We prefer this human-centred approach and believe it allows our data to have a relatively unbiased view of age and gender,” the researchers said.

In addition to the self-identified age and gender labels, the researchers annotated the apparent skin tone of each subject using the Fitzpatrick scale as a third dimension of the dataset. They also labelled the videos recorded in low ambient lighting. This set of attributes allowed them to measure the robustness of a model on four dimensions: age, gender, apparent skin tone and ambient lighting.
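Measuring robustness along one of these dimensions amounts to scoring a model separately for each subgroup and comparing the results. The Python sketch below shows the idea under stated assumptions: the record fields (`skin_type`, `label`, `prediction`), the helper `accuracy_by_group` and the four toy records are hypothetical and do not reflect the dataset’s actual file format or any reported result.

```python
from collections import defaultdict

def accuracy_by_group(records, attribute):
    """Return {group value: accuracy} for the given attribute (hypothetical helper)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        group = record[attribute]
        total[group] += 1
        correct[group] += int(record["prediction"] == record["label"])
    return {group: correct[group] / total[group] for group in total}

# Toy records (made up); skin_type uses Fitzpatrick types I to VI.
records = [
    {"skin_type": "II", "label": "real", "prediction": "real"},
    {"skin_type": "II", "label": "fake", "prediction": "fake"},
    {"skin_type": "VI", "label": "real", "prediction": "fake"},
    {"skin_type": "VI", "label": "fake", "prediction": "fake"},
]

per_group = accuracy_by_group(records, "skin_type")
# The gap between the best and worst subgroup is one simple robustness signal.
gap = max(per_group.values()) - min(per_group.values())
print(per_group, gap)
```

The same helper can be called with any other annotated attribute, for example an age bucket or a low-light flag, provided the records carry that field.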

Advantages

The Casual Conversations dataset has uniform distributions across categories. It can be used to evaluate various AI methods, such as face detection and apparent age and gender classification, or to assess robustness under different ambient lighting conditions.

According to the researchers, though Casual Conversations is intended to evaluate the robustness of AI models across facial attributes, the dataset could also prove useful for various other open challenges.

The potential application areas include image inpainting, developing temporally consistent models, audio understanding, responsible AI on facial attribute classification and handling low-light scenarios in the problems mentioned above.

Wrapping up

“Our dataset will be publicly available for general use and we encourage users to extend annotations of our dataset for various computer vision applications, in line with our data use agreement,” the researchers said.


Ambika Choudhury

A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.