Last updated December 7, 2021
In AI Origins & Evolution

Google Releases New Dataset For Advanced 3D Object Understanding

Published on November 14, 2020
by Shraddha Goled

Machine learning models for computer vision tasks have been largely trained on photos. However, there lies a large possibility for scaling up to a wider range of applications such as augmented reality, autonomy, robotics, and image retrieval tasks if we train these models on 3D objects. To achieve this has been an uphill task since there is a dearth of large real-world datasets of objects in 3D, as compared to 2D datasets such as ImageNet, COCO, and Open Images.

Now, Google has released the Objectron dataset, which is a collection of short, object-centric video clips that capture a large set of common objects from various angles. Along with the dataset, the research also details a new 3D object detection solution.

Objectron Dataset

The Objectron dataset contains video clips and images in the following categories: bikes, books, cameras, bottles, chairs, cups, shoes, and laptops. It consists of 15,000 annotated video clips and 4 million images collected from a geo-diverse sample covering ten countries in five continents. The data contains:

Manually annotated bounding boxes, which describe the position, orientation, and dimension, of each object
The video clips contain augmented reality metadata such as camera poses and sparse point clouds.
A shuffled version of annotated frames called the processed dataset for images and SequenceExample format for videos
Scripts to run evaluation
Supporting scripts to load the data into deep learning libraries such as Tensorflow, PyTorch, and Jax to visualise the dataset.

3D Object Detection Solution

Google’s team also released a 3D object detection solution for four categories of objects — shoes, chairs, mugs, and cameras. These models are trained using the Objectron dataset. The models are released in MediaPipe. MediaPipe is an open-source cross-platform that offers customisable machine learning solutions for live and streaming media; it finds applications in human pose detection and tracking, hand tracking, iris tracking, 3D object detection, and face detection.

In the single-stage model that was proposed earlier this year, the pose and physical size of an object were determined using a single RGB image. Some of the features of this model include:

An encoder-decoder architecture built on MobileNetv2
Prediction of object shape using detection and regression
Using a well-established pose estimation algorithm for obtaining the final 3D coordinates for the bounding box
The model is lightweight and can run real-time on mobile devices

The new 3D object detection model, however, utilises a two-stage architecture, a marked improvement from its predecessor, mentioned above, that used a single-stage model. The first stage in this model uses the TensorFlow Object Detection model to find the 2D crop of the object. The 2D crop is used to determine the 3D bounding box in the second stage. To avoid operating the object detection for every frame, the model also simultaneously determines the 2D crop for the next frame.

3D Object Detection Solution Architecture

To evaluate the performance of the 3D detection models, a 3D intersection over union (IoU) similarity statistics was used. Also called the Jaccard index, IoU is used for gauging the similarity and diversity of the sample sets; it is commonly used for computer vision tasks that measure how close bounding boxes are to the ground truth.

Google has proposed an algorithm for estimating accurate 3D IoU values for the bounding boxes. Using the Sutherland-Hodgman Polygon clipping algorithm, the team first computed the intersection points between the faces of the two boxes and further, the volume of this intersection is calculated. Finally, this calculated volume of the intersection and the union of two boxes is used for computing the IoU.

Computing the IoU

With open-sourcing the dataset and introducing the two-stage object detection model, Google hopes to enable wider research in the fields of view synthesis, unsupervised learning, and improved 3D representation.

Read the full blog here. Find the link to the open-sourced dataset here.

Access all our open Survey & Awards Nomination forms in one place >>

Shraddha Goled

I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.

Watch More

Google Releases New Dataset For Advanced 3D Object Understanding

Objectron Dataset

3D Object Detection Solution

Shraddha Goled

Download our Mobile App

CORPORATE TRAINING PROGRAMS ON GENERATIVE AI

Generative AI Skilling for Enterprises

Our customized corporate training program on Generative AI provides a unique opportunity to empower, retain, and advance your talent.

3 Ways to Join our Community

Telegram group

Discord Server

Subscribe to our Daily newsletter

Get our daily awesome stories & videos in your inbox

Recent Stories

World's Biggest Media & Analyst firm specializing in AI

Advertise with us

AIM publishes every day, and we believe in quality over quantity, honesty over spin. We offer a wide variety of branding and targeting options to make it easy for you to propagate your brand.

Branded Content

AIM Brand Solutions, a marketing division within AIM, specializes in creating diverse content such as documentaries, public artworks, podcasts, videos, articles, and more to effectively tell compelling stories.

Corporate Upskilling

ADaSci Corporate training program on Generative AI provides a unique opportunity to empower, retain and advance your talent

Hackathons

With MachineHack you can not only find qualified developers with hiring challenges but can also engage the developer community and your internal workforce by hosting hackathons.

Talent Assessment

Conduct Customized Online Assessments on our Powerful Cloud-based Platform, Secured with Best-in-class Proctoring

Research & Advisory

AIM Research produces a series of annual reports on AI & Data Science covering every aspect of the industry. Request Customised Reports & AIM Surveys for a study on topics of your interest.

Conferences & Events

Immerse yourself in AI and business conferences tailored to your role, designed to elevate your performance and empower you to accomplish your organization’s vital objectives.