Facebook Releases Massive Annotated Dataset Of First-Person Perception

Ego4D is an egocentric video dataset and benchmark suite captured across 74 locations in nine countries.

We humans can easily recognise and identify objects: we can tell a cat from a dog, or a house from a tree. Machines, however, have to be trained to perform even a basic task like that.

Large datasets such as ImageNet, MNIST and CIFAR are used for image classification across research, hackathons and industry. Yet, despite their volume, machines still cannot perceive the world the way humans do, capturing the details of everyday life. Enabling machines to experience life as humans do would be a huge milestone for computer vision, and Facebook AI is leading the way to take that leap.

Last week, Facebook AI introduced Ego4D, an egocentric video dataset and benchmark suite. Captured by 855 camera wearers across 74 locations in nine countries, Ego4D offers 3,025 hours of daily-life video spanning household, outdoor, leisure, and workplace settings. Facebook AI Research worked with 13 universities, bringing together 88 researchers across the globe to assemble what it claims is the largest-ever dataset of first-person video, with more than 20x the hours of footage of any other such dataset.


The potential 

The main aim behind Ego4D is to provide the research community with a large, publicly available volume of diverse egocentric video footage. Parts of the videos are accompanied by audio, eye gaze, stereo, 3D meshes of the environment, and synchronised video from multiple egocentric cameras recorded simultaneously.

The dataset can be used to train deep learning recognition models, that is, to build artificial intelligence (AI) that understands activities as humans perceive them. Facebook AI wants Ego4D to serve first-person video the way ImageNet serves photos. Kristen Grauman, lead research scientist at Facebook, has said that next-generation AI systems will be able to learn from videos that show the world from the centre of the action rather than from the sidelines.


The five benchmarks laid down by Facebook AI for Ego4D are:

  • Episodic memory: what happened when (e.g., helping find one’s belongings, such as keys)
  • Forecasting: what to do next (e.g., AI guiding a person through a recipe)
  • Hand and object manipulation: what is being done (e.g., AI teaching a person to play the drums)
  • Audio-visual diarisation: who said what, and when (e.g., AI summarising a talk or session)
  • Social interaction: who is interacting with whom
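To make the benchmark structure concrete, here is a minimal sketch of how annotations in a suite like this might be grouped by task. This is purely illustrative: the field names (`video_uid`, `benchmark`, `query`) and record layout are assumptions for the example, not the actual Ego4D annotation schema or API.

```python
from collections import defaultdict

# Toy annotation records in the spirit of a benchmark suite: each record
# ties a video clip to one of the five benchmark tasks. Field names are
# hypothetical, not the real Ego4D schema.
annotations = [
    {"video_uid": "v001", "benchmark": "episodic_memory", "query": "Where did I leave my keys?"},
    {"video_uid": "v002", "benchmark": "forecasting", "query": "What recipe step comes next?"},
    {"video_uid": "v001", "benchmark": "hand_object", "query": "Which hand strikes the drum?"},
]

def group_by_benchmark(records):
    """Bucket annotation records by benchmark task name."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec["benchmark"]].append(rec["video_uid"])
    return dict(buckets)

print(group_by_benchmark(annotations))
# {'episodic_memory': ['v001'], 'forecasting': ['v002'], 'hand_object': ['v001']}
```

Note that the same clip can appear under several tasks, which is how a single pool of footage supports multiple benchmarks at once.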

Models trained on the Ego4D dataset should better interpret images from smart glasses and help control robots that interact with humans. They could also power smarter AI assistants that interact in the metaverse.

Ego4D will make it possible for AI to better understand the world around it and to be personalised at the individual level. Facebook AI is actively working on assistant-inspired research prototypes to enable such personalisation.

The flip side

While Facebook claims that its collection approach for these videos “uphold[s] privacy and ethics standards with consenting participants and robust de-identification procedures where relevant,” the accusations against the tech giant are well known: Facebook has repeatedly been accused of prioritising profit over people’s well-being. The potential misuses of Ego4D are many.

As a social media giant, Facebook collects as much user data as possible to sell to advertisers. With Ego4D enabling AI to stay in constant contact with humans, even when they are offline, the company could gather humongous amounts of data about each user: what is around the house, the hobbies and activities one enjoys, and so on. Grauman herself acknowledges that more work remains on the privacy front. Will the benefits outweigh the privacy concerns? Only time will tell.

Ego4D will be made publicly available to researchers in November this year.


Debolina Biswas
After diving deep into the Indian startup ecosystem, Debolina is now a Technology Journalist. When not writing, she is found reading or playing with paint brushes and palette knives. She can be reached at debolina.biswas@analyticsindiamag.com

