Fairness in AI models is a hot topic in computer vision, with researchers around the world invested in developing fair and inclusive AI models.
“Top performing AI models trained on datasets that are created without considering fair distribution across subgroups and thus quite unbalanced, do not necessarily reflect the outcome in the real world. On the contrary, they may perform poorly and may be biased towards certain groups of people,” the researchers said.
The attributes of already existing facial datasets are either annotated by third parties or hand-labelled. Though the researchers have claimed the annotations are uniformly distributed over different attributes, such as age and gender, the accuracy of annotations is still suspect.
Deepfakes are synthetic media in which a machine-learning algorithm generates a computer-rendered version of a subject’s face. Deepfake detectors differentiate between real and fake videos by accumulating classifier responses on individual frames. Although aggregating per-frame predictions removes most outliers and yields a robust, accurate classification, bias in face detectors can change the final result dramatically and cause deepfake detectors to fail. Are these detectors capable of detecting faces across age groups, genders and skin tones? How often do deepfake detectors fail for this reason? And is there a way to measure the vulnerabilities of deepfake detectors?
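The aggregation step described above can be sketched in a few lines. This is a minimal illustration, not the method of any particular detector: the frame-level classifier is assumed to exist elsewhere, and the median aggregation and 0.5 threshold are illustrative choices.

```python
# Hypothetical sketch: turning per-frame 'fake' probabilities into one
# video-level label. `frame_scores` would come from a frame-level
# classifier (not shown here).

from statistics import median

def classify_video(frame_scores, threshold=0.5):
    """Aggregate per-frame 'fake' probabilities into a video-level label.

    Using the median rather than the mean suppresses outlier frames,
    e.g. a handful of frames where face detection went wrong.
    """
    video_score = median(frame_scores)
    return "fake" if video_score >= threshold else "real"

# A few noisy outlier frames do not flip the overall decision:
scores = [0.1, 0.15, 0.9, 0.12, 0.08, 0.2, 0.95]  # mostly low 'fake' scores
print(classify_video(scores))  # -> real
```

Note, however, the article's point: if the upstream face detector is biased and misses faces for a whole subgroup, most frames produce bad scores and no aggregation scheme can recover the correct label.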
The researchers at Facebook AI proposed the dataset to address these questions.
Behind the dataset
Casual Conversations comprises video recordings of 3,011 individuals across genders, apparent skin types and ages, with approximately fifteen one-minute video recordings per subject. The dataset also includes a unique identifier for each subject, along with age, gender and apparent skin type annotations.
The Casual Conversations dataset is unique in that other publicly available datasets provide hand- or machine-labelled annotations and therefore introduce bias based on a person’s appearance rather than their actual age and gender. In this dataset, the age and gender annotations are provided by the subjects themselves, which the researchers call one of its distinguishing features.
“We prefer this human-centred approach and believe it allows our data to have a relatively unbiased view of age and gender,” the researchers said.
In addition to the self-identified age and gender labels, as a third dimension of the dataset, the researchers annotated the apparent skin tone of each subject using the Fitzpatrick scale. They also labelled videos recorded in low ambient lighting. Together, these attributes allowed them to measure the robustness of a model along four dimensions: age, gender, apparent skin tone and ambient lighting.
The Casual Conversations dataset has uniform distributions across these categories. It can be used to evaluate various AI methods, such as face detection and apparent age and gender classification, or to assess robustness against varied ambient lighting conditions.
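A per-subgroup evaluation along these dimensions might look like the sketch below. The record layout and field names ("skin_tone", "label", "prediction") are assumptions for illustration, not the dataset's actual schema.

```python
# Hypothetical sketch: measuring a model's error rate per subgroup along
# one annotation dimension (age, gender, skin tone or lighting).

from collections import defaultdict

def error_rate_by_group(records, dimension):
    """Return the error rate for each subgroup of `dimension`."""
    errors, totals = defaultdict(int), defaultdict(int)
    for r in records:
        group = r[dimension]
        totals[group] += 1
        if r["prediction"] != r["label"]:
            errors[group] += 1
    return {g: errors[g] / totals[g] for g in totals}

# Illustrative records: predictions vs. ground-truth labels,
# annotated with Fitzpatrick skin-tone groupings.
records = [
    {"skin_tone": "I-II", "label": "real", "prediction": "real"},
    {"skin_tone": "I-II", "label": "fake", "prediction": "fake"},
    {"skin_tone": "V-VI", "label": "real", "prediction": "fake"},
    {"skin_tone": "V-VI", "label": "fake", "prediction": "fake"},
]
print(error_rate_by_group(records, "skin_tone"))
# -> {'I-II': 0.0, 'V-VI': 0.5}
```

Because the dataset's subgroups are uniformly distributed, a gap between these per-group error rates points to model bias rather than to under-representation in the evaluation data.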
According to the researchers, though Casual Conversations is intended to evaluate the robustness of AI models across facial attributes, the dataset could come in handy for various open challenges.
The potential application areas include image inpainting, developing temporally consistent models, audio understanding, responsible AI on facial attribute classification and handling low-light scenarios in the problems mentioned above.
“Our dataset will be publicly available for general use and we encourage users to extend annotations of our dataset for various computer vision applications, in line with our data use agreement,” the researchers said.