It isn’t enough for virtual assistants to offer a rote response to your voice or text. For AI-based systems to become truly useful in our daily lives, they’ll need to achieve what’s currently impossible — the complete comprehension of human language.
With this goal in mind, researchers Douwe Kiela, Jason Weston, Harm de Vries, Kurt Shuster, Dhruv Batra and Devi Parikh at Facebook’s AI research lab (FAIR) are teaching artificial intelligence systems to understand language by getting them to ‘guide’ virtual tourists around New York City. They have developed a new research task, called Talk The Walk, which explores this embodied AI approach while introducing a degree of realism not previously found in this area.
According to a paper published by FAIR, the process involves placing a “tourist bot” on a random street corner in New York and getting a “guide bot” to direct it to a spot on a 2D map.
The process goes something like this:
- A pair of AI agents has to communicate with each other to accomplish the shared goal of navigating to a specific location.
- The goal of the task is for the tourist bot to navigate its way through 360-degree images of actual New York City neighbourhoods.
- This is done with the help of the guide bot, which sees nothing but a map of the neighbourhood.
- Using a novel attention mechanism called MASC (Masked Attention for Spatial Convolution), the researchers helped the guide bot focus on the right place on the map.
- This, in turn, produced results that were more than twice as accurate on the test set.
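To make the setup above a little more concrete, here is a toy Python sketch of a guide narrowing down a tourist's position from its messages. It is not FAIR's code: the 3x3 grid, the landmark names and every function name are hypothetical, and the mask is hard-coded from the reported move, whereas the published mechanism is learned. It only illustrates the general idea of using a message to mask a small spatial kernel so that a belief over the map shifts in the right direction.

```python
# Toy illustration only; all names and the map are assumptions, not from the paper.
DIRECTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

# Hypothetical map: the only thing the guide can see.
LANDMARKS = [
    ["bar", "shop", "bar"],
    ["cafe", "bank", "shop"],
    ["bar", "cafe", "theater"],
]

def action_mask(action):
    """Build a 3x3 one-hot mask whose set cell encodes the reported move."""
    dr, dc = DIRECTIONS[action]
    mask = [[0.0] * 3 for _ in range(3)]
    mask[1 + dr][1 + dc] = 1.0
    return mask

def shift_belief(belief, mask):
    """Masked 3x3 convolution: move belief mass in the direction the mask selects.
    Mass that would leave the grid is dropped."""
    rows, cols = len(belief), len(belief[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for i in range(3):
                for j in range(3):
                    nr, nc = r + i - 1, c + j - 1
                    if 0 <= nr < rows and 0 <= nc < cols:
                        out[nr][nc] += mask[i][j] * belief[r][c]
    return out

def observe(belief, landmarks, seen):
    """Keep belief only where the map matches the reported landmark, then renormalise."""
    kept = [[b if lm == seen else 0.0 for b, lm in zip(brow, lrow)]
            for brow, lrow in zip(belief, landmarks)]
    total = sum(sum(row) for row in kept)
    if total == 0:
        return belief  # the message contradicts the map; leave the belief unchanged
    return [[b / total for b in row] for row in kept]

def localize(landmarks, messages):
    """Update a uniform belief with each (action, seen_landmark) message in turn."""
    rows, cols = len(landmarks), len(landmarks[0])
    belief = [[1.0 / (rows * cols)] * cols for _ in range(rows)]
    for action, seen in messages:
        if action is not None:
            belief = shift_belief(belief, action_mask(action))
        belief = observe(belief, landmarks, seen)
    return belief

def best_cell(belief):
    """Return the (row, col) with the highest belief."""
    cells = [(r, c) for r in range(len(belief)) for c in range(len(belief[0]))]
    return max(cells, key=lambda rc: belief[rc[0]][rc[1]])

# The tourist first sees a bar, then moves right and sees a shop.
belief = localize(LANDMARKS, [(None, "bar"), ("right", "shop")])
print(best_cell(belief))
```

In this toy map three corners hold a bar, so the first message alone leaves the guide unsure. Only one of those corners has a shop immediately to its right, so the second message pins the tourist to a single cell.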
However, Kiela and Weston added that Talk The Walk isn’t meant to be a competition between natural language and synthetic interactions. Rather, it is an attempt to offer clarity and quantifiable results related to the ultimate goal of creating machines that can effectively “talk” to humans as well as to each other.