Facebook AI has announced Ego4D, an ambitious long-term project to tackle challenges in egocentric perception – the ability of AI to understand and interact with the world the way humans do, from a first-person perspective. Today's computer vision systems are typically trained on millions of photos and videos captured from a third-person point of view; next-generation AI systems will instead need data that shows the world from the first person.
Further, the Facebook AI team has collaborated with 13 universities and labs across nine countries, including India. The International Institute of Information Technology (IIIT), Hyderabad, is the only Indian institution to join the Ego4D project. Founded in 1998, IIIT-H has built strong research programmes over the years in several areas, with an emphasis on science, technology and applied research for both industry and society.
This consortium of universities and labs collected more than 2,200 hours of first-person video in the wild, with over 700 participants going about their daily lives. The collaboration dramatically scales up the amount of egocentric data publicly available to the research community: more than 20 times greater, in hours of footage, than any other dataset. Facebook supported and funded the project through academic gifts to each of the participating labs and universities.
Image: Facebook AI
The Project
Facebook AI also created five benchmark challenges based on first-person visual experience, which will help future AI assistants progress toward real-world applications. These include the following (a brief illustrative sketch follows the list):
- Episodic memory: What happened when? (For example, “Where did I leave my purse?”)
- Forecasting: What am I likely to do next? (e.g., “You have to add two spoons of sugar now.”)
- Hand and object manipulation: What am I doing? (e.g., “Show me how to play the guitar.”)
- Audio-visual diarization: Who said what when? (e.g., “What time did we say we would reach the cinema?”)
- Social interaction: Who is interacting with whom? (e.g., “Help me better hear the person talking to me at this noisy restaurant.”)
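To make this benchmark split more concrete, here is a minimal sketch of how a researcher might group annotated clips by task once the data is released. The file name `ego4d_annotations.json`, the field names and the task keys are hypothetical placeholders for illustration, not the dataset's actual schema.

```python
import json
from collections import defaultdict

# Hypothetical task keys mirroring the five benchmark challenges above.
BENCHMARKS = {
    "episodic_memory",     # What happened when?
    "forecasting",         # What am I likely to do next?
    "hand_object",         # What am I doing?
    "av_diarization",      # Who said what when?
    "social_interaction",  # Who is interacting with whom?
}

def group_clips_by_benchmark(path):
    """Group annotated clips under the benchmark task they target."""
    with open(path) as f:
        # Assumed format: a list of {"clip_id": ..., "benchmark": ...} records.
        clips = json.load(f)

    grouped = defaultdict(list)
    for clip in clips:
        task = clip.get("benchmark")
        if task in BENCHMARKS:
            grouped[task].append(clip["clip_id"])
    return grouped

if __name__ == "__main__":
    groups = group_clips_by_benchmark("ego4d_annotations.json")
    for task, clip_ids in sorted(groups.items()):
        print(f"{task}: {len(clip_ids)} clips")
```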
Ego4D is aimed at addressing problems in embodied AI, a discipline that seeks to develop AI systems with a physical or virtual embodiment, such as robots. Embodied AI draws on the theory of embodied cognition, which holds that many aspects of psychology, human or otherwise, are shaped by the organism's entire body. By applying this logic to AI, researchers intend to improve the performance of systems such as chatbots, autonomous vehicles, robots and even smart glasses, all of which must interact continuously with their environments, with humans and with other AI.
Facebook distributed head-mounted cameras and wearable sensors to the participants so that they could capture first-person, unscripted video of their daily lives. The participants recorded day-to-day routines such as cooking, grocery shopping, talking while playing games, and engaging in activities with family and friends. Everything was therefore captured from the centre of the action rather than by someone shooting video or photos from the sidelines.
Moreover, the Facebook AI team said it will make this data publicly available in November 2021.
In addition, researchers from Facebook Reality Labs used Vuzix Blade Smart Glasses to produce a further 400 hours of first-person video data in staged scenarios in their research labs. This data will also be made public.
Wrapping up
It’s critical to understand that poor representation in computer vision datasets can be harmful, especially as the AI industry lacks an unambiguous definition of bias. ImageNet and OpenImages, two large, publicly available datasets, have previously been found to be US- and Euro-centric, embodying human-like biases around race, gender, colour, ethnicity, weight and more. The datasets from the Ego4D project, drawn from participants across nine countries, can mitigate these concerns to a good extent.
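One way such representation concerns are typically checked is by auditing where a dataset's footage actually comes from. The following is a minimal sketch of such an audit, assuming per-video metadata in a CSV with `country` and `duration_hours` columns; the file name and schema are hypothetical, not the actual Ego4D metadata format.

```python
import csv
from collections import Counter

def hours_by_country(metadata_csv):
    """Tally recorded hours per country to check geographic balance.

    Assumes a CSV with 'country' and 'duration_hours' columns; this is
    an illustrative schema, not the real Ego4D metadata format.
    """
    totals = Counter()
    with open(metadata_csv, newline="") as f:
        for row in csv.DictReader(f):
            totals[row["country"]] += float(row["duration_hours"])
    return totals

if __name__ == "__main__":
    totals = hours_by_country("ego4d_video_metadata.csv")
    grand_total = sum(totals.values())
    for country, hours in totals.most_common():
        print(f"{country}: {hours:.1f} h ({100 * hours / grand_total:.1f}%)")
```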
We can also hope that AI-driven capabilities enabled by Ego4D's benchmarks, and trained on its dataset, will allow assistants to deliver value in new and meaningful ways.