
IBM, MIT & Harvard Release Dataset & ML Models For Common Sense


IBM, MIT and Harvard have released the DARPA “Common Sense AI” dataset at the ongoing 38th International Conference on Machine Learning (ICML).

The researchers have released AGENT (Action, Goal, Efficiency, coNstraint, uTility), a benchmark for core psychological reasoning consisting of a large dataset (8,400 3D animations) and two machine learning models – BIPaCK and ToMnet-G. The research is aimed at accelerating the development of AI that exhibits common sense.



Commonsense reasoning, the ability to make acceptable and logical assumptions in our daily life, has long been a bottleneck in artificial intelligence and natural language processing.

“Today’s machine learning models can have superhuman performance. It is still unclear if they understand basic principles that drive human reasoning. For machines to successfully be able to have social interaction like humans do among themselves, they need to develop the ability to understand hidden mental states of humans,” said Abhishek Bhandwaldar, Research Engineer, MIT-IBM AI Lab.

“Our work is directed to bridge this gap by proposing a dataset that probes core psychological reasoning concepts. Our dataset is a collection of videos that are similar to the developmental studies but generated at a much larger scale with visual differences. We have also proposed two different machine learning approaches to solve the dataset,” he added.


The research aims to build a machine learning model with the same level of common sense as a young child.

Intuitive psychology is the ability of people to understand and reason about other people’s state of mind. This ability helps us have meaningful social interactions. Current ML algorithms lack this ability and need huge amounts of training data.

The researchers presented a benchmark consisting of a large dataset of procedurally generated 3D animations, AGENT (Action, Goal, Efficiency, coNstraint, uTility), structured around four scenarios to probe key concepts of core intuitive psychology: 

  • Goal preferences
  • Action efficiency
  • Unobserved constraints
  • Cost-reward trade-offs

The figure below summarises the design of trials in AGENT, which groups trials into four scenarios. All trials have two phases:

  • A familiarisation phase showing one or multiple videos of the typical behaviors of a particular agent, and
  • A test phase showing a single video of the same agent either in a new physical situation (the Goal Preference, Action Efficiency and Cost-Reward Trade-offs scenarios) or the same video as familiarisation but revealing a portion of the scene previously occluded (Unobserved Constraints).

AGENT contains 8,400 videos. Each video lasts from 5.6 s to 25.2 s, with a frame rate of 35 fps. “With these videos, we constructed 3360 trials in total, divided into 1920 training trials, 480 validation trials, and 960 testing trials (or 480 pairs of expected and surprising testing trials, where each pair shares the same familiarization video(s)). All training and validation trials only contain expected test videos,” the researchers said.
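The figures quoted above can be checked with a few lines of arithmetic. The Python sketch below is only a sanity check on the numbers reported in the article; the variable names are illustrative and are not part of the AGENT release.

```python
# Figures quoted in the article; names below are illustrative only.
TOTAL_TRIALS = 3360
SPLITS = {"training": 1920, "validation": 480, "testing": 960}
FPS = 35
MIN_SECONDS, MAX_SECONDS = 5.6, 25.2

# The three splits should account for every trial.
assert sum(SPLITS.values()) == TOTAL_TRIALS

# Each test pair holds one expected and one surprising trial.
test_pairs = SPLITS["testing"] // 2  # 480

# Share of all trials in each split.
shares = {name: count / TOTAL_TRIALS for name, count in SPLITS.items()}

# Frame counts implied by the shortest and longest video durations.
min_frames = round(MIN_SECONDS * FPS)  # 196
max_frames = round(MAX_SECONDS * FPS)  # 882

print(test_pairs, min_frames, max_frames)
print({name: round(share, 3) for name, share in shares.items()})
```

The splits stand in a 4:1:2 ratio (training : validation : testing), and the shortest and longest videos work out to 196 and 882 frames at 35 fps.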

The two machine learning approaches introduced at ICML aim to advance the training of AI models using methods drawn from traditional human psychology. The researchers compared two strong baselines, one built on Bayesian inverse planning and the other on a Theory of Mind neural network.

For the proposed tasks in the benchmark, researchers built two baseline models – BIPaCK and ToMnet-G – based on existing approaches, and compared their performance on AGENT to human performance. “Overall, we find that BIPaCK achieves a better performance than ToMnet-G, especially in tests of strong generalization,” reads the paper.
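To give a feel for the idea behind Bayesian inverse planning in general, here is a toy Python sketch. It is not BIPaCK and does not come from the paper: the one-dimensional world, the function names, the `beta` rationality parameter and the uniform prior are all assumptions made purely for illustration. The agent is assumed to prefer steps that bring it closer to its goal, and Bayes' rule is then used to infer which goal best explains the observed steps.

```python
import math

def action_likelihood(action, position, goal, beta=1.0):
    """P(action | goal): softmax over unit steps, favouring the step
    that leaves the agent closer to the goal (hypothetical toy model)."""
    actions = (-1, +1)
    scores = {a: math.exp(-beta * abs((position + a) - goal)) for a in actions}
    return scores[action] / sum(scores.values())

def goal_posterior(observed_steps, start, goals, beta=1.0):
    """Posterior over candidate goals given observed steps, uniform prior."""
    weights = {}
    for name, goal_position in goals.items():
        position, likelihood = start, 1.0
        for step in observed_steps:
            likelihood *= action_likelihood(step, position, goal_position, beta)
            position += step
        weights[name] = likelihood
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

# Two objects on a line; the agent starts midway and takes three steps left.
goals = {"A": 0, "B": 10}
posterior = goal_posterior([-1, -1, -1], start=5, goals=goals)
print(posterior)  # goal "A" receives almost all of the probability mass
```

In this toy setting, an agent that later headed towards "B" under otherwise similar conditions would be poorly explained by the inferred preference, which is the kind of mismatch a surprise rating is meant to capture.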

This work was supported by the DARPA Machine Common Sense program, MIT-IBM AI LAB, and NSF STC award CCF-1231216.

Wrapping up

In a separate paper titled ‘CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning’, researchers presented COMMONGEN, a constrained text generation task with an associated benchmark dataset, to explicitly test machines for the ability of generative commonsense reasoning.

“Our extensive experiments systematically examine recent pre-trained language generation models (e.g., UniLM, BART, T5) on the task, and find that their performance is still far from humans, generating grammatically sound yet realistically implausible sentences,” concluded the research.


Kumar Gandharv
Kumar Gandharv, PGD in English Journalism (IIMC, Delhi), is setting out on a journey as a tech Journalist at AIM. A keen observer of National and IR-related news.
