TensorFlow Releases New 3D Pose Detection Model

TensorFlow has used a statistical 3D human body model called GHUM, which was developed using a large corpus of human shapes and motions.

TensorFlow recently launched its first 3D model in the TensorFlow.js pose detection API. The new model opens the door to new design opportunities for applications such as fitness, medical motion capture, and entertainment.

Here is an example of 3D motion capture, which drives an animated character in the browser. Click here to try it out.


3D motion capture with BlazePose GHUM by Richard Yee (Source: Kalidoface 3D/TensorFlow)

Powered by MediaPipe and TensorFlow.js, this community demo uses multiple models, including FaceMesh, BlazePose, and HandPose. Try out the live demo here.

Pose detection is one of the most critical steps in understanding the human body in videos and images. Previously, TensorFlow supported only 2D pose estimation. The source code for pose detection is available on GitHub. The API currently offers three models: MoveNet, MediaPipe BlazePose, and PoseNet.

MoveNet is an ultra-fast and accurate model that detects 17 keypoints of a body and can run on laptops and phones at 50+ fps. MediaPipe BlazePose detects 33 keypoints: in addition to the 17 COCO keypoints, it provides additional keypoints for the face, hands, and feet. PoseNet can detect multiple poses, each containing 17 keypoints.
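The relationship between the 17 COCO keypoints and the 33 BlazePose keypoints can be checked with a short Python sketch. The keypoint names below are an assumption based on the naming commonly used for these topologies; verify the exact strings against the pose detection library before relying on them.

```python
# Keypoint names are assumed, not taken from the article; check them
# against the pose detection library before depending on exact strings.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

BLAZEPOSE_KEYPOINTS = [
    "nose", "left_eye_inner", "left_eye", "left_eye_outer",
    "right_eye_inner", "right_eye", "right_eye_outer",
    "left_ear", "right_ear", "mouth_left", "mouth_right",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_pinky", "right_pinky",
    "left_index", "right_index", "left_thumb", "right_thumb",
    "left_hip", "right_hip", "left_knee", "right_knee",
    "left_ankle", "right_ankle", "left_heel", "right_heel",
    "left_foot_index", "right_foot_index",
]

# Keypoints that BlazePose adds on top of the COCO set (face, hands, feet).
extra_keypoints = [k for k in BLAZEPOSE_KEYPOINTS if k not in COCO_KEYPOINTS]

print(len(COCO_KEYPOINTS), len(BLAZEPOSE_KEYPOINTS), len(extra_keypoints))
print(extra_keypoints)
```

Under these assumed names, every COCO keypoint also appears in the BlazePose list, and the remaining entries are the extra face, hand, and foot points.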

A Deep Dive into the 3D Pose Detection Model 

One of the critical challenges the researchers encountered while building the 3D part of their pose model was obtaining realistic, in-the-wild 3D data. In contrast, 2D pose data can be obtained through human annotation.

However, obtaining accurate manual 3D annotation requires either a lab setup or specialised hardware with depth sensors for 3D scans, which introduces additional challenges to preserve a good level of human and ecological diversity in the dataset. 

An alternative many researchers use is to build a completely synthetic dataset. That, in turn, introduces the further challenge of domain adaptation to real-world pictures.

TensorFlow has used a statistical 3D human body model called GHUM, developed using a large corpus of human shapes and motions. “To obtain 3D human body pose ground truth, we fitted the GHUM model to our existing 2D pose dataset and extended it with real-world 3D keypoint coordinates in metric space,” said the TensorFlow team.

During the fitting process, the team said, the shape and pose variables of GHUM were optimised so that the reconstructed model aligns with the image evidence. The optimisation included 2D keypoint alignment, silhouette semantic segmentation alignment, and shape and pose regularisation terms.
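Schematically, a fitting objective built from the terms named above could be written as follows. This is an illustrative sketch, not the team's published formulation: the symbols are assumptions, with θ and β standing for the GHUM pose and shape variables, L_kp and L_seg for the 2D keypoint and silhouette segmentation alignment terms, R_pose and R_shape for the regularisers, and the λ values for hypothetical weights.

```latex
\min_{\theta,\,\beta}\;
    \lambda_{\mathrm{kp}}\,  L_{\mathrm{kp}}(\theta, \beta)
  + \lambda_{\mathrm{seg}}\, L_{\mathrm{seg}}(\theta, \beta)
  + \lambda_{\theta}\,       R_{\mathrm{pose}}(\theta)
  + \lambda_{\beta}\,        R_{\mathrm{shape}}(\beta)
```

The two alignment terms pull the reconstruction towards the image evidence, while the regularisers keep the pose and shape within a plausible range.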

Sample GHUM fitting for input image
Left to right: Original image, 3D GHUM reconstruction, and blended result projected on top of the original image. (Source: TensorFlow)

Here’s how it’s done 

Due to the nature of 3D to 2D projection, multiple points in 3D can have the same projection in 2D. Therefore, the fitting can result in several realistic 3D body poses for the given 2D annotation. To reduce this ambiguity, the annotators were asked to provide depth order between pose skeleton edges where they are certain, as shown in the image below. 
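The ambiguity described above can be illustrated with a minimal Python sketch. It assumes a simple pinhole camera with a hypothetical focal length of 1.0; the function name and sample coordinates are made up for illustration and are not part of the TensorFlow pipeline.

```python
def project(point, focal_length=1.0):
    """Pinhole projection of a 3D camera-space point (x, y, z) onto the image plane.

    The camera model and focal length are illustrative assumptions.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


# Two hypothetical 3D wrist positions on the same camera ray:
# the second is exactly twice as far from the camera as the first.
near_wrist = (0.5, -0.25, 2.0)
far_wrist = (1.0, -0.5, 4.0)

print(project(near_wrist))
print(project(far_wrist))
```

Both points land on the same 2D location, so a 2D annotation alone cannot tell which of the two depths is correct.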

Depth order annotation: the wider edge corner denotes the corner closer to the camera
(e.g. the person’s right shoulder is closer to camera than left shoulder on both examples)
(Source: TensorFlow)

Compared to annotating real depth, this task proved easy for the annotators. It showed high consistency between annotators and helped reduce the depth ordering errors for the fitted GHUM reconstructions from 25 per cent to 3 per cent.
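One way to measure a depth ordering error of this kind is sketched below in Python. The function, the data layout, and the convention that a smaller z means closer to the camera are all assumptions made for illustration; the article does not describe how the team computed its figures.

```python
def depth_order_error_rate(keypoints_z, ordered_pairs):
    """Fraction of annotated pairs whose depth order the reconstruction violates.

    keypoints_z:   dict mapping keypoint name -> reconstructed depth z
                   (assumed convention: smaller z is closer to the camera).
    ordered_pairs: list of (closer_name, farther_name) annotations.
    """
    if not ordered_pairs:
        return 0.0
    violations = sum(
        1
        for closer, farther in ordered_pairs
        if keypoints_z[closer] >= keypoints_z[farther]
    )
    return violations / len(ordered_pairs)


# Hypothetical reconstructed depths (in metres) and annotator judgements.
reconstructed_z = {
    "right_shoulder": 1.9,
    "left_shoulder": 2.1,
    "right_hip": 2.0,
    "left_hip": 1.95,
}
annotations = [
    ("right_shoulder", "left_shoulder"),  # reconstruction agrees
    ("right_hip", "left_hip"),            # reconstruction disagrees
]

print(depth_order_error_rate(reconstructed_z, annotations))
```

In this toy case one of the two annotated pairs is violated, so the error rate is one half.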

BlazePose GHUM utilises a two-step detector-tracker approach where the tracker operates on a cropped human image. Hence, the model is trained to predict the 3D body pose in relative coordinates of a metric space with its origin at the subject’s hips.
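The idea of hip-relative coordinates can be sketched in a few lines of Python. The sketch assumes the origin is the midpoint between the left and right hip keypoints and that coordinates are in metres; the function name and sample values are hypothetical.

```python
def to_hip_relative(keypoints):
    """Re-express 3D keypoints relative to the midpoint between the hips.

    keypoints: dict mapping keypoint name -> (x, y, z) in metres.
    Using the hip midpoint as the origin is an assumption of this sketch.
    """
    lx, ly, lz = keypoints["left_hip"]
    rx, ry, rz = keypoints["right_hip"]
    ox, oy, oz = (lx + rx) / 2, (ly + ry) / 2, (lz + rz) / 2
    return {
        name: (x - ox, y - oy, z - oz)
        for name, (x, y, z) in keypoints.items()
    }


# Hypothetical camera-space keypoints for a person standing 2 m away.
camera_space = {
    "left_hip": (0.25, 1.0, 2.0),
    "right_hip": (-0.25, 1.0, 2.0),
    "nose": (0.0, 0.25, 2.0),
}

relative = to_hip_relative(camera_space)
print(relative["nose"])
print(relative["left_hip"], relative["right_hip"])
```

Because every coordinate is expressed relative to the subject, the result does not depend on where the person stands in the frame, which suits a tracker that only sees a cropped image.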

MediaPipe vs TensorFlow.js 

When choosing between the MediaPipe and TensorFlow.js runtimes, there are pros and cons to each. According to the team, the MediaPipe runtime offers faster inference on desktops, laptops, and Android phones, while the TensorFlow.js runtime is faster on iPhones and iPads. The TensorFlow.js runtime is also about 1 MB smaller than the MediaPipe runtime.

(Source: TensorFlow)

The table above shows the performance of the MediaPipe and TensorFlow.js runtimes across different devices. In each cell, the first number is for the lite model, the second for the full model, and the third for the heavy model.

Amit Raja Naik
Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.
