Stanford Brings Out BEHAVIOR Benchmark For 100 Everyday Household Tasks

BEHAVIOR is a benchmark for embodied AI with 100 everyday activities

A multidisciplinary team of researchers at Stanford University has released BEHAVIOR (Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments), a benchmark for embodied AI comprising 100 everyday activities, such as washing dishes, picking up toys, and cleaning floors, carried out in simulation. The current version of BEHAVIOR is publicly available at behavior.stanford.edu.

In creating this benchmark, the team, led by computer scientist and Stanford Institute for Human-Centered AI co-director Fei-Fei Li and including experts from computer science, psychology, and neuroscience, has established a “North Star”: a visual reference point by which to gauge the success of future AI solutions. The benchmark can also be used to develop and train robotic assistants in virtual environments before they are moved to operate in real ones, a paradigm known as “sim to real.”

What is Embodied AI?

Scientists have long wanted to reach a stage in technological advancement where robots help humans with daily (yet complex) tasks. The researchers say that to do these tasks, a robot must be able to perceive, reason, and operate with full awareness of its own physical dimensions and capabilities, as well as of the objects surrounding it. This combination of physical and situational awareness is called embodied AI.

According to the research paper, titled “BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments”, progress has already been made on embodied AI solutions in areas such as visual navigation, interactive Q&A, and instruction following. But to develop artificial agents that can eventually perform and assist in daily tasks with human-level flexibility, a comprehensive benchmark is needed with more realistic, diverse, and complex activities.

Complex for Robots

On the surface, training robots to do basic tasks that humans perform easily might not seem complicated. In reality, it is a complex problem.

The researchers give the example of cleaning a countertop. The robot has to perceive and understand:

  • What a countertop is
  • Where to find it
  • That it needs cleaning, and what the counter’s physical dimensions are
  • What products are best used to clean the countertop
  • How to coordinate its motions to get the countertop clean

The robot then has to determine the best course of action to clean the counter. While this might be a minor procedure for humans, it is complex for robots: the robot has to understand which materials are soakable and then detect and declare whether the countertop is actually clean.

Although much progress has been made, the research says that three major issues have prevented existing benchmarks from meeting the criteria of realism, diversity, and complexity. These are:

  • Identifying and defining meaningful activities for benchmarking
  • Developing simulated environments that support such activities
  • Defining success and objective metrics to evaluate performance

How is BEHAVIOR different?

The research says that BEHAVIOR addresses the three issues by:

  • Introducing the BEHAVIOR Domain Definition Language (BDDL), a representation adapted from predicate logic that maps simulated states to semantic symbols. It allows the team to define the 100 activities as initial and goal conditions, and it enables the generation of potentially infinite initial states and solutions for achieving the goal states.
  • Supporting its realization in simulation by listing environment-agnostic functional requirements for realistic simulation.
  • Providing a comprehensive set of metrics to evaluate agent performance in terms of success and efficiency. To make evaluation comparable across diverse activities, scenes, and instances, the team proposes a set of metrics relative to demonstrated human performance on each activity and provides a large-scale dataset of 500 human demonstrations (758.5 min) in virtual reality.
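
To make the idea of defining an activity as initial and goal conditions more concrete, here is a minimal Python sketch. It is not BDDL syntax and not the team's implementation: the `Activity` class, the predicate names (`dusty`, `inside`, `ontop`), and the helper functions are hypothetical, chosen only to illustrate how a symbolic state could be checked against goal conditions and how an agent's cost could be expressed relative to a human demonstration.

```python
from dataclasses import dataclass

# A symbolic state is a set of ground predicates, written here as tuples,
# e.g. ("dusty", "countertop") or ("inside", "rag", "cabinet").
# All predicate and object names below are hypothetical illustrations.


@dataclass(frozen=True)
class Activity:
    name: str
    initial: frozenset      # predicates that hold when the episode starts
    goal_true: frozenset    # predicates that must hold at the end
    goal_false: frozenset   # predicates that must NOT hold at the end


def goal_fraction(state: frozenset, activity: Activity) -> float:
    """Return the fraction of goal conditions satisfied by a symbolic state."""
    met = sum(p in state for p in activity.goal_true)
    met += sum(p not in state for p in activity.goal_false)
    total = len(activity.goal_true) + len(activity.goal_false)
    return met / total if total else 1.0


def is_success(state: frozenset, activity: Activity) -> bool:
    """An episode succeeds only when every goal condition is satisfied."""
    return goal_fraction(state, activity) == 1.0


def relative_to_human(agent_value: float, human_value: float) -> float:
    """Express an agent's cost (e.g. seconds taken) as a ratio of a human demo's cost."""
    if human_value <= 0:
        raise ValueError("human reference value must be positive")
    return agent_value / human_value


clean_counter = Activity(
    name="clean_countertop",
    initial=frozenset({("dusty", "countertop"), ("inside", "rag", "cabinet")}),
    goal_true=frozenset({("inside", "rag", "cabinet")}),
    goal_false=frozenset({("dusty", "countertop")}),
)

# Counter wiped, but the rag was left on it instead of being put away.
halfway = frozenset({("ontop", "rag", "countertop")})
# Counter wiped and the rag returned to the cabinet.
done = frozenset({("inside", "rag", "cabinet")})

print(goal_fraction(halfway, clean_counter), is_success(done, clean_counter))
```

Because the goal is stated as conditions on the final state rather than as a fixed sequence of steps, any sequence of actions that ends in a state like `done` counts as a solution, which is the property that lets one definition cover many initial states and many ways of reaching the goal.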

Future Moves

The research team aims to provide initial solutions to the benchmark, with plans to extend it to tasks that are not currently benchmarked. It says that this will require contributions from diverse domains: robotics, computer vision, computer graphics, and cognitive science.

Sreejani Bhattacharyya
I am a technology journalist at AIM. What gets me excited is deep-diving into new-age technologies and analysing how they impact us for the greater good. Reach me at sreejani.bhattacharyya@analyticsindiamag.com
