IBM, MIT & Harvard Release Dataset & ML Models For Common Sense

IBM, MIT and Harvard have released the DARPA “Common Sense AI” dataset at the ongoing 38th International Conference on Machine Learning (ICML).

The researchers have released AGENT (Action, Goal, Efficiency, coNstraint, uTility), a benchmark for core psychological reasoning consisting of a large dataset (8,400 3D animations) and two machine learning models, BIPaCK and ToMnet-G. The research is aimed at accelerating the development of AI that exhibits common sense.

Commonsense reasoning, the ability to make acceptable and logical assumptions in daily life, has long been a bottleneck in artificial intelligence and natural language processing.

“Today’s machine learning models can have superhuman performance. It is still unclear if they understand basic principles that drive human reasoning. For machines to successfully be able to have social interaction like humans do among themselves, they need to develop the ability to understand hidden mental states of humans,” said Abhishek Bhandwaldar, Research Engineer, MIT-IBM AI Lab.

“Our work is directed to bridge this gap by proposing a dataset that probes core psychological reasoning concepts. Our dataset is a collection of videos that are similar to the developmental studies but generated at a much larger scale with visual differences. We have also proposed two different machine learning approaches to solve the dataset,” he added.

Research

The research aims to build a machine learning model with the same level of common sense as a young child.

Intuitive psychology is the ability of people to understand and reason about other people’s state of mind. This ability helps us have meaningful social interactions. Machine learning models lack this kind of perception and typically need huge amounts of training data.

The researchers presented a benchmark consisting of a large dataset of procedurally generated 3D animations, AGENT (Action, Goal, Efficiency, coNstraint, uTility), structured around four scenarios to probe key concepts of core intuitive psychology: 

  • Goal preferences
  • Action efficiency
  • Unobserved constraints
  • Cost-reward trade-offs

The figure below summarises the design of trials in AGENT, which groups trials into four scenarios. All trials have two phases:

  • A familiarisation phase showing one or multiple videos of the typical behaviors of a particular agent, and
  • A test phase showing a single video of the same agent either in a new physical situation (the Goal Preference, Action Efficiency and Cost-Reward Trade-offs scenarios) or the same video as familiarisation but revealing a portion of the scene previously occluded (Unobserved Constraints).
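The two-phase trial layout described above can be pictured as a small data structure. The following is only an illustrative sketch: the class names, field names and file names are hypothetical and are not taken from the released dataset or its code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Scenario(Enum):
    """The four AGENT scenarios named in the article."""
    GOAL_PREFERENCES = "goal_preferences"
    ACTION_EFFICIENCY = "action_efficiency"
    UNOBSERVED_CONSTRAINTS = "unobserved_constraints"
    COST_REWARD_TRADEOFFS = "cost_reward_tradeoffs"


@dataclass
class Trial:
    """Hypothetical container for one trial (not the dataset's real schema)."""
    scenario: Scenario
    familiarisation_videos: List[str]  # one or multiple videos of the agent
    test_video: str                    # a single test video of the same agent

    def __post_init__(self) -> None:
        # Every trial has a familiarisation phase with at least one video.
        if not self.familiarisation_videos:
            raise ValueError("a trial needs at least one familiarisation video")


# Example with made-up file names.
trial = Trial(
    scenario=Scenario.GOAL_PREFERENCES,
    familiarisation_videos=["fam_0.mp4", "fam_1.mp4"],
    test_video="test_0.mp4",
)
print(trial.scenario.value, len(trial.familiarisation_videos))
```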

AGENT contains 8,400 videos. Each video lasts from 5.6 s to 25.2 s, with a frame rate of 35 fps. “With these videos, we constructed 3360 trials in total, divided into 1920 training trials, 480 validation trials, and 960 testing trials (or 480 pairs of expected and surprising testing trials, where each pair shares the same familiarization video(s)). All training and validation trials only contain expected test videos,” the researchers said.
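As a quick sanity check, the figures quoted above are consistent with each other. The short sketch below only redoes the arithmetic from the article; the variable names are ours.

```python
from fractions import Fraction

# Trial counts as quoted by the researchers.
splits = {"train": 1920, "validation": 480, "test": 960}

total_trials = sum(splits.values())
print(total_trials)  # 3360

# Test trials come in expected/surprising pairs.
test_pairs = splits["test"] // 2
print(test_pairs)  # 480

# Share of each split, as exact fractions of the total.
shares = {name: Fraction(count, total_trials) for name, count in splits.items()}
print(shares["train"], shares["validation"], shares["test"])  # 4/7 1/7 2/7

# Video length in frames at 35 fps (rounded to avoid float noise).
fps = 35
min_frames = round(5.6 * fps)
max_frames = round(25.2 * fps)
print(min_frames, max_frames)  # 196 882
```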

The two machine learning approaches introduced at ICML apply methods from traditional human psychology to the training and evaluation of AI models. The researchers compared two strong baselines, built on Bayesian inverse planning and a Theory of Mind neural network.

For the proposed tasks in the benchmark, researchers built two baseline models – BIPaCK and ToMnet-G – based on existing approaches, and compared their performance on AGENT to human performance. “Overall, we find that BIPaCK achieves a better performance than ToMnet-G, especially in tests of strong generalization,” reads the paper.
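The article does not spell out how models are scored against the paired test trials. One plausible scheme, shown here purely as an assumption, is to have a model assign a surprise rating to each test video and count a pair as correct when the surprising video is rated higher than the expected one. The function name and the ratings below are made up for illustration.

```python
from typing import List, Tuple


def pairwise_accuracy(pairs: List[Tuple[float, float]]) -> float:
    """Fraction of pairs where the surprising video gets the higher rating.

    Each pair is (rating_for_expected_video, rating_for_surprising_video).
    This scoring rule is an assumption, not taken from the article.
    """
    if not pairs:
        raise ValueError("need at least one pair of ratings")
    correct = sum(1 for expected, surprising in pairs if surprising > expected)
    return correct / len(pairs)


# Made-up ratings for four pairs; ties count as incorrect.
ratings = [(0.1, 0.9), (0.4, 0.3), (0.2, 0.7), (0.5, 0.5)]
print(pairwise_accuracy(ratings))  # 0.5
```

Under this scheme a model that rates videos at random would land near 0.5, which gives a natural reference point when comparing models with human ratings.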

This work was supported by the DARPA Machine Common Sense program, MIT-IBM AI Lab, and NSF STC award CCF-1231216.

Wrapping up

In a separate paper titled ‘CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning’, researchers presented COMMONGEN, a constrained text generation task with an associated benchmark dataset, to explicitly test machines’ ability for generative commonsense reasoning.

“Our extensive experiments systematically examine recent pre-trained language generation models (e.g., UniLM, BART, T5) on the task, and find that their performance is still far from humans, generating grammatically sound yet realistically implausible sentences,” concluded the research.

Kumar Gandharv

Kumar Gandharv, PGD in English Journalism (IIMC, Delhi), is setting out on a journey as a tech journalist at AIM. He is a keen observer of national and IR-related news.