Google’s DeepMind Sets a New Benchmark in Point Tracking

The benchmark comprises real-world videos with human annotations of point tracks, along with perfect ground-truth point tracks on synthetic videos.

It is important for AI to perceive how objects move and to gain a physical understanding of rotation and shape change, particularly in contexts such as surveillance and self-driving vehicles. To this end, researchers from Google-owned DeepMind have introduced a new benchmark, ‘TAP-Vid’, for tracking points on physical surfaces in videos, presented in their paper ‘TAP-Vid: A Benchmark for Tracking Any Point in a Video’.

The benchmark comprises both real-world videos with human annotations of point tracks and synthetic videos with perfect ground-truth point tracks. The team has also proposed a simple end-to-end point-tracking model, TAP-Net, which, trained on synthetic data, outperforms all prior methods on the benchmark.





The researchers have introduced the problem of Tracking Any Point (TAP) in a given video, along with the TAP-Vid dataset, to spur progress in this under-studied domain.
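At its core, the TAP task asks a model to take a query point in one frame and predict that point’s position, and whether it is visible, in every other frame; evaluation then scores predictions against ground truth, skipping occluded frames. The sketch below illustrates one such position-accuracy score. It is a simplified illustration, not the paper’s exact evaluation protocol, and the function name and threshold are assumptions for this example.

```python
import math

def position_accuracy(pred_xy, gt_xy, gt_visible, threshold=4.0):
    """Fraction of visible frames where the predicted point lies within
    `threshold` pixels of the ground-truth point."""
    correct = 0
    visible = 0
    for (px, py), (gx, gy), vis in zip(pred_xy, gt_xy, gt_visible):
        if not vis:
            continue  # occluded frames are excluded from position scoring
        visible += 1
        if math.hypot(px - gx, py - gy) <= threshold:
            correct += 1
    return correct / max(visible, 1)

# Example: a 4-frame track where frame 2 is occluded and the frame-1
# prediction drifts too far; 2 of 3 visible frames are within threshold.
gt = [(10, 10), (12, 11), (14, 12), (16, 13)]
pred = [(10, 10), (15, 11), (30, 30), (16, 14)]
visible = [True, True, False, True]
print(position_accuracy(pred, gt, visible, threshold=2.0))  # → 0.666...
```

Scoring only the visible frames matters because, as the benchmark acknowledges, occluded points have no meaningful ground-truth position to compare against.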


However, TAP still has limitations. The paper reads, “We cannot handle liquids or transparent objects, and for real data, annotators cannot be perfect, as they are limited to textured points and even then may make occasional errors due to carelessness.”

The team believes the ethical concerns around the dataset are minimal. However, since the real data comes from existing public sources, biases must be treated with care to ensure fairness of the final algorithm. Advances in TAP could help solve many interesting challenges, such as better handling of dynamic or deformable objects in structure-from-motion (SfM) [60] and allowing semantic keypoint-based methods to be applied to generic objects.

Other methods

Another interesting benchmark for tracking arbitrary objects is TAO, a large-scale benchmark for tracking any object, developed by researchers from Carnegie Mellon University, Inria and Argo AI. They introduced a diverse dataset, similar to COCO, consisting of 2,907 high-resolution videos captured in diverse environments, 30 seconds long on average. Besides TAO and COCO, other benchmarks include DAVIS, GOT-10k, YouTube-BB, ScanNet, and others.

However, in the case of DeepMind’s latest benchmark, the researchers have introduced the problem of tracking any point (TAP), alongside the TAP-Vid dataset, which has set a new standard in this under-studied domain. “By training on synthetic data, TAP-Net performs better on our benchmark than prior methods,” said the researchers. 

Bhuvana Kamath