“A grand challenge in AI is to build machines that can take actions in the world. I think anticipation will be a crucial ingredient for machine action. To plan, robots will need to understand what outcome their actions will have, and anticipate the future state of the world,” said Carl Vondrick, assistant professor of computer science at Columbia University, in an earlier interview (he was then a researcher at MIT). In the same interview, Vondrick said that computer vision would become the main interface through which artificial agents understand the world.
Cut to 2021: Vondrick has directed a study, presented at the recently concluded CVPR 2021, that teaches AI to predict human behaviour from videos.
Under Vondrick’s guidance, a research team of two PhD students, Dídac Surís and Ruoshi Liu, used computer vision to give machines a more intuitive sense of the behaviour exhibited by humans and animals by leveraging high-level relationships between actions.
The algorithm was trained on hundreds of videos from cinema and television series, including the famous sitcom “The Office”. The authors report that it could predict everyday human interactions such as fist-bumping. When the system cannot predict a specific action, it falls back on a higher-level concept (for example, using the word “greeting” for a fist bump).
Predicting human behaviour
Previous work in predictive machine learning has mainly focused on predicting one action at a time; such algorithms could only decide whether an upcoming action would be a hug, a high five, or a handshake. This approach falls short because most machine learning models are unable to find the commonalities between the possible outcomes.
Surís and Liu decided to look at the longer-range prediction problem from a different perspective. Surís said their model mimics the human habit of reasoning at a higher level of abstraction when foreseeing what will happen next.
While training the algorithm, the research duo revisited mathematical questions that trace back to ancient Greece. They observed that although machine learning models typically obey the familiar rules of Euclidean geometry, bizarre and counter-intuitive geometries also exist, and the authors used those unusual geometries to build AI models that predict future human behaviour using high-level concepts.
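The “unusual geometries” alluded to here are hyperbolic spaces, where distances grow rapidly toward the boundary, which makes them a natural fit for tree-like hierarchies of concepts (abstract concepts near the origin, specific actions near the edge). As a hedged illustration only, the sketch below computes distance in the Poincaré ball, one standard model of hyperbolic space; the article does not name the authors’ exact formulation, so treat the choice of model as an assumption:

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit
    Poincare ball, a standard model of hyperbolic space."""
    sq_norm = lambda x: sum(c * c for c in x)
    diff = sq_norm([a - b for a, b in zip(u, v)])
    denom = (1 - sq_norm(u)) * (1 - sq_norm(v))
    return math.acosh(1 + 2 * diff / denom)

# Points near the boundary end up much farther apart than their
# Euclidean gap suggests -- room for fine-grained distinctions.
print(poincare_distance([0.9, 0.0], [-0.9, 0.0]))
print(poincare_distance([0.1, 0.0], [-0.1, 0.0]))
```

Near the origin the distance behaves almost like ordinary Euclidean distance, while near the boundary it blows up, which is what lets a hierarchy be embedded with little distortion.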
The researchers used computer vision together with machine learning algorithms to learn and recognise behaviour patterns. For example, the algorithm analyses body and facial movements to infer an individual’s emotional state.
The mathematical framework the researchers developed enables machines to organise future events by how predictable they are: a machine can commit to a specific action when it is certain and fall back on a more generic prediction when it is not. The results open up possibilities for human-robot collaboration, autonomous vehicles, and assistive technology.
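One way to picture acting specifically under certainty and generically under uncertainty is a confidence-thresholded fallback over a concept hierarchy. The hierarchy, threshold, and function below are hypothetical illustrations, not the authors’ actual model:

```python
# Hypothetical action hierarchy: each specific action maps to a
# higher-level parent concept (illustrative, not from the paper).
ACTION_PARENT = {
    "handshake": "greeting",
    "high_five": "greeting",
    "fist_bump": "greeting",
    "hug": "greeting",
}

def predict(action_probs, threshold=0.5):
    """Return the most likely specific action when confident,
    otherwise fall back to its higher-level parent concept."""
    action, p = max(action_probs.items(), key=lambda kv: kv[1])
    if p >= threshold:
        return action
    return ACTION_PARENT.get(action, "action")

# Uncertain between several greetings -> predict the abstract concept.
print(predict({"handshake": 0.4, "fist_bump": 0.35, "hug": 0.25}))
# -> greeting
```

With a confident distribution, e.g. `{"handshake": 0.9, "hug": 0.1}`, the same function returns the specific action `"handshake"`.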
This study aims to enable computers to make independent, nuanced decisions instead of relying on pre-programmed actions. Furthermore, if machines can understand and anticipate human behaviour, they could readily assist people in daily activities.