Guide to PyMAF: Pyramidal Mesh Alignment Feedback

PyMAF is a regression-based approach for recovering 3D human pose and shape (a body mesh) from monocular images. It introduces a mesh alignment feedback loop that leverages spatial information at different scales, obtained from a feature pyramid.

Generating 3D human meshes from monocular images is a challenging computer vision problem, and solving it could automate a tedious, time-consuming part of visual effects (VFX) work. Modelling objects with long and complex kinematic chains, such as the human body, is labour-intensive because the VFX artist has to go frame by frame, rotoscoping each section of the kinematic chain.

Existing approaches to automating this task fall under two broad paradigms: optimization-based and regression-based. Optimization-based approaches fit the model directly to 2D evidence (for example, detected keypoints); they produce accurate mesh-image alignment but are slow and sensitive to initialization. Regression-based approaches use neural networks to map raw pixels directly to model parameters in a single feed-forward pass.
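To see why fitting to 2D evidence depends on the starting point, here is a toy example (a single unit-length limb under orthographic projection, not any real fitting pipeline). Only the limb's projected x-coordinate is observed, so the rotations +θ and -θ explain the image equally well, and plain gradient descent converges to whichever one lies on the side of its initial guess. The function name, learning rate and step count are illustrative choices.

```python
import math

def fit_angle(x_observed, theta_init, lr=0.5, steps=200):
    """Gradient descent on E(theta) = (cos(theta) - x_observed)**2.

    Only the 2D (x) coordinate of a unit-length limb is observed, so depth
    is lost and both +theta and -theta explain the data equally well.
    """
    theta = theta_init
    for _ in range(steps):
        residual = math.cos(theta) - x_observed
        grad = 2.0 * residual * -math.sin(theta)  # dE/dtheta
        theta -= lr * grad
    return theta

x_obs = math.cos(0.8)  # observation generated by theta* = 0.8
print(round(fit_angle(x_obs, 0.3), 4))   # 0.8
print(round(fit_angle(x_obs, -0.3), 4))  # -0.8
```

Starting exactly at 0.0, the gradient is zero and the estimate never moves, another symptom of initialization sensitivity. Regression-based methods avoid this per-image search by predicting the parameters in one forward pass.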

Regression-based models, however, are sensitive to small parameter errors: minor deviations often lead to misalignment between the generated mesh and the image evidence. In their paper, “3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop”, Hongwen Zhang, Yating Tian, et al. propose a feedback loop that uses a feature pyramid to explicitly rectify the parameters based on mesh-image alignment.
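The sensitivity comes from how pose is parameterized: each joint's rotation is expressed relative to its parent, so an error near the root of the chain is carried to every joint below it. A toy planar chain (hypothetical link lengths and angles, not a real body model) makes this concrete:

```python
import math

def forward_kinematics(relative_angles, link_lengths):
    """Return 2D joint positions for a planar chain.

    Each angle (radians) is relative to the parent link, mirroring how
    body models express pose as relative rotations along a kinematic chain.
    """
    x, y, heading = 0.0, 0.0, 0.0
    joints = [(x, y)]
    for angle, length in zip(relative_angles, link_lengths):
        heading += angle  # rotations accumulate down the chain
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        joints.append((x, y))
    return joints

def end_effector_error(angles, perturbed, lengths):
    """Euclidean distance between the final joints of two chain poses."""
    x0, y0 = forward_kinematics(angles, lengths)[-1]
    x1, y1 = forward_kinematics(perturbed, lengths)[-1]
    return math.hypot(x1 - x0, y1 - y0)

# A hypothetical 4-link "arm" (shoulder -> elbow -> wrist -> fingertip).
lengths = [0.30, 0.25, 0.08, 0.05]
angles = [0.2, 0.4, 0.1, 0.0]
delta = math.radians(5)

# The same 5-degree error applied at the root vs. at the last joint.
root_err = end_effector_error(angles, [angles[0] + delta] + angles[1:], lengths)
leaf_err = end_effector_error(angles, angles[:-1] + [angles[-1] + delta], lengths)
print(f"error at root: {root_err:.3f}, error at last joint: {leaf_err:.3f}")
```

With these values the 5° error at the root moves the end of the chain more than ten times further than the same error at the last joint, which is why small errors in regressed parameters show up as visible mesh-image misalignment.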



Architecture & Approach

PyMAF architecture

Feature Pyramid for Human Mesh Regression 

The PyMAF image encoder produces a pyramid of spatial feature maps that describe the human pose in the image at different scales, which lets the subsequent deep regressor leverage multi-scale alignment contexts. Point-wise features sampled from these maps pass through a multi-layer perceptron for dimensionality reduction and are then concatenated into a single feature vector. Because pose parameters are represented as relative rotations along kinematic chains, even minor parameter errors can cause noticeable misalignment. To counter this, the parameter regressor is trained with 2D supervision on the keypoints projected from the estimated mesh, plus additional 3D supervision on 3D joints and model parameters when ground-truth 3D labels are available.
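The sampling step can be sketched in plain Python: features are read from a spatial feature map at the 2D locations of projected mesh points using bilinear interpolation, reduced by a small learned layer, and concatenated. This is a minimal sketch under stated assumptions: the nested-list [H][W][C] feature map, the single linear layer standing in for the MLP and all function names are hypothetical; the real implementation works on PyTorch tensors.

```python
def bilinear_sample(feature_map, u, v):
    """Sample a C-dim feature at continuous pixel coords (u=column, v=row).

    feature_map is a nested list of shape [H][W][C]; coordinates are clamped
    to the map so points slightly outside the image still get a value.
    """
    h, w = len(feature_map), len(feature_map[0])
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    u0, v0 = int(u), int(v)
    u1, v1 = min(u0 + 1, w - 1), min(v0 + 1, h - 1)
    du, dv = u - u0, v - v0
    out = []
    for c in range(len(feature_map[0][0])):
        top = (1 - du) * feature_map[v0][u0][c] + du * feature_map[v0][u1][c]
        bottom = (1 - du) * feature_map[v1][u0][c] + du * feature_map[v1][u1][c]
        out.append((1 - dv) * top + dv * bottom)
    return out

def reduce_dim(feature, weights):
    """One linear layer standing in for the dimensionality-reducing MLP."""
    return [sum(w * f for w, f in zip(row, feature)) for row in weights]

def point_features(feature_map, points_2d, weights):
    """Sample features at projected points, reduce each, then concatenate."""
    vector = []
    for u, v in points_2d:
        vector.extend(reduce_dim(bilinear_sample(feature_map, u, v), weights))
    return vector

# Hypothetical 4x4 map whose 3 channels encode (row, column, 1.0).
fmap = [[[float(r), float(c), 1.0] for c in range(4)] for r in range(4)]
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # keeps 2 of the 3 channels
vec = point_features(fmap, [(1.5, 2.0), (0.0, 0.0)], weights)
print(vec)  # [2.0, 1.5, 0.0, 0.0]
```

Because the sampled locations come from the projected mesh, moving the mesh changes the features the regressor sees. In PyTorch code this kind of sampling is commonly done with torch.nn.functional.grid_sample.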

Mesh Alignment Feedback Loop

Regressing mesh parameters in a single pass is challenging, so existing approaches employ an Iterative Error Feedback (IEF) loop to update the parameters iteratively. Although this reduces parameter errors, IEF uses the same global features at every update step. These global features lack fine-grained information and do not change as the prediction is refined. PyMAF instead introduces a Mesh Alignment Feedback (MAF) loop built on mesh-aligned features. Unlike uniformly sampled grid features or global features, mesh-aligned features reflect how well the current estimate aligns with the image, which makes them more useful for correcting the parameters.
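The feedback loop can be sketched generically: at each iteration the current estimate is projected, features are sampled from the next (finer) pyramid level at the projected points, and a regressor predicts a residual update. The toy below replaces the learned pieces with stand-ins (assumptions, not PyMAF's networks): the single parameter is the x-position of one point, each pyramid level returns the offset to a hypothetical target quantized to that level's resolution, and the regressor is the identity.

```python
def maf_loop(theta, pyramid, project, extract, regressor):
    """Coarse-to-fine mesh alignment feedback (toy sketch).

    theta:     current parameter estimate (list of floats)
    pyramid:   spatial feature maps ordered from coarse to fine
    project:   maps parameters to 2D points on the image
    extract:   samples features from a map at those points
    regressor: maps (features, theta) to a parameter update
    """
    history = [list(theta)]
    for feature_map in pyramid:
        points = project(theta)               # where the current mesh lands
        feats = extract(feature_map, points)  # mesh-aligned features
        delta = regressor(feats, theta)       # residual correction
        theta = [t + d for t, d in zip(theta, delta)]
        history.append(list(theta))
    return theta, history

def make_level(target, step):
    """Toy pyramid level: at location x, the offset to `target`,
    quantized to this level's resolution (coarser levels = larger step)."""
    return lambda x: round((target - x) / step) * step

target = 5.3  # hypothetical true position
pyramid = [make_level(target, s) for s in (2.0, 0.5, 0.1)]  # coarse -> fine
theta, history = maf_loop(
    theta=[0.0],
    pyramid=pyramid,
    project=lambda th: [th[0]],                      # one "vertex" at x = theta
    extract=lambda level, pts: [level(p) for p in pts],
    regressor=lambda feats, th: feats,               # identity stand-in
)
print([round(h[0], 3) for h in history])  # [0.0, 6.0, 5.5, 5.3]
```

The estimate jumps to 6.0 using the coarse level, then refines to 5.5 and 5.3 as finer levels add detail. In an IEF-style loop, by contrast, `extract` would return the same global feature vector at every step regardless of `theta`, so the image evidence the regressor receives would not depend on where the current mesh lands.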

Auxiliary Pixel-wise Supervision

Spatial features are easily affected by noise in images, as the image above shows. To make them robust to noise caused by occlusion and illumination differences, PyMAF applies an auxiliary pixel-wise loss to the spatial features at the last pyramid level. This auxiliary supervision gives the image encoder mesh-image association cues, so it preserves the most relevant information in the spatial feature maps.
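The auxiliary loss can be sketched as a per-pixel objective. In the PyMAF paper the pixel-wise target is a DensePose-style dense-correspondence (IUV) map; the version below assumes a simplified form with cross-entropy over body-part labels on every pixel plus an L1 loss on UV surface coordinates counted only on foreground pixels. The label convention (0 = background), the equal default weighting and the function names are assumptions.

```python
import math

def softmax_cross_entropy(logits, label):
    """Numerically stable cross-entropy for one pixel's part-label logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def pixelwise_aux_loss(part_logits, uv_pred, part_gt, uv_gt, uv_weight=1.0):
    """Average per-pixel auxiliary loss over an H x W map.

    part_logits: [H][W][K] scores; label 0 is background (assumed convention)
    uv_pred, uv_gt: [H][W][2] surface coordinates in [0, 1]
    part_gt: [H][W] integer labels
    """
    seg_loss, uv_loss, n_pixels, n_fg = 0.0, 0.0, 0, 0
    for i, row in enumerate(part_logits):
        for j, logits in enumerate(row):
            label = part_gt[i][j]
            seg_loss += softmax_cross_entropy(logits, label)
            n_pixels += 1
            if label != 0:  # UV is only defined on the body surface
                uv_loss += sum(abs(p - g) for p, g in zip(uv_pred[i][j], uv_gt[i][j]))
                n_fg += 1
    seg_loss /= n_pixels
    uv_loss = uv_loss / n_fg if n_fg else 0.0
    return seg_loss + uv_weight * uv_loss

logits = [[[0.0, 0.0], [0.0, 0.0]]]  # 1 x 2 map, 2 classes, uninformative scores
loss = pixelwise_aux_loss(logits, [[[0.0, 0.0], [0.5, 0.5]]],
                          [[0, 1]], [[[0.0, 0.0], [0.25, 0.75]]])
print(round(loss, 4))  # 1.1931
```

The second pixel lies on the body, so it contributes both a classification and a UV term; the first is background and contributes only the classification term. This dense signal is what pushes the encoder to keep body-relevant information in its spatial features.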

Creating Human Pose Meshes From Monocular Images Using PyMAF

The following code has been taken from the official demo Colab notebook available here.

  1. Clone the PyMAF GitHub repository and navigate into it. In Colab, use the %cd magic rather than !cd, because !cd runs in a temporary subshell and does not change the notebook's working directory.
 !git clone
 %cd PyMAF
  2. Install PyTorch, Torchvision and the other requirements.
 !pip3 install -U
 !pip3 install -U
 !pip install -r requirements.txt
  3. Run the script to generate the 3D mesh for your video, replacing ./sample_video.mp4 with the path to your video file.
 !CUDA_VISIBLE_DEVICES=0 python3 --checkpoint=data/pretrained_model/ --vid_file ./sample_video.mp4
3D human pose mesh created by PyMAF

Last Epoch

PyMAF has improved mesh-image alignment

This article went through PyMAF, a regression-based approach for 3D human mesh recovery. PyMAF introduces a mesh alignment feedback loop that leverages spatial information at different scales, obtained from a feature pyramid. The feedback loop corrects the model parameters based on how well the currently estimated mesh aligns with the image. In addition, an auxiliary pixel-wise supervision task is imposed on the spatial feature maps during training. This supervision makes the regressor less susceptible to image noise and improves the reliability of the mesh-aligned features. PyMAF was evaluated on both indoor and in-the-wild datasets, and it consistently improved mesh-image alignment over previous regression-based methods.


All images, except the output, have been taken from the PyMAF paper.

Aditya Singh
A machine learning enthusiast with a knack for finding patterns. In my free time, I like to delve into the world of non-fiction books and video essays.
