Meta Open Sources Casual Conversations v2, An Inclusive Dataset for Computer Vision

The categories include diverse ages, genders, language/dialects, geographies, disabilities, physical adornments, physical attributes, voice timbres, skin tones, activities, and recording setups.

Amid rising concerns about the privacy and ethics of datasets used in AI models, Meta has open-sourced ‘Casual Conversations v2’, a consent-driven dataset of recorded monologues that the company describes as inclusive. This second version of the dataset is meant to help researchers evaluate the accuracy of their computer vision, speech, and audio models across a wide range of use cases.

This new dataset features a more comprehensive list of 11 categories, some self-provided by the participants and others annotated, for better measurement of algorithmic fairness and robustness in AI systems. The categories cover diverse ages, genders, languages/dialects, geographies, disabilities, physical adornments, physical attributes, voice timbres, skin tones, activities, and recording setups.
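To illustrate how category labels like these can support a fairness measurement, here is a minimal sketch that breaks a model's accuracy down by group and reports the largest gap between groups. The record fields (`skin_tone`, `label`, `prediction`), the group names, and the toy values are assumptions made for illustration; they are not the dataset's actual schema or results.

```python
from collections import defaultdict


def accuracy_by_group(records, group_key):
    """Return {group: accuracy} for records carrying 'label' and 'prediction'."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        total[group] += 1
        if rec["prediction"] == rec["label"]:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


def largest_gap(per_group):
    """Difference between the best and worst per-group accuracy."""
    values = list(per_group.values())
    return max(values) - min(values)


# Hypothetical toy records, not taken from Casual Conversations v2.
records = [
    {"skin_tone": "I-II", "label": 1, "prediction": 1},
    {"skin_tone": "I-II", "label": 0, "prediction": 0},
    {"skin_tone": "V-VI", "label": 1, "prediction": 0},
    {"skin_tone": "V-VI", "label": 0, "prediction": 0},
]

per_group = accuracy_by_group(records, "skin_tone")
gap = largest_gap(per_group)
print(per_group, gap)
```

A large gap suggests the model performs unevenly across groups. The same function can be reused with any other category as `group_key`, such as age or recording setup.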

The dataset comprises 26,467 videos of 5,567 participants and is intended for assessing the performance of pre-trained models in computer vision and audio applications, according to the company’s data license agreement. The videos are available in MP4 format, with an average length of one minute.

The participants were labelled for apparent skin tone on both the Fitzpatrick scale and the Monk scale, along with annotations for voice timbre, activity, and recording setup. The monologues are either scripted, with participants reading a sample paragraph from Fyodor Dostoevsky’s ‘The Idiot’, or non-scripted, with participants answering one of five predetermined questions.

Meta said that the creation of Casual Conversations v2 was part of a continuous effort to promote inclusive AI and products across different industries. While addressing issues of AI fairness and robustness requires collaboration and multiple solutions, the team is committed to partnering with field experts to explore these areas further and inspire more research.

Mohit Pandey

Mohit dives deep into the AI world to bring out information in simple, explainable, and sometimes funny words. He also holds a keen interest in photography, filmmaking, and the gaming industry.