
Hands-on Guide to Impersonator++: Motion Imitation Library


Impersonator++ is a human motion imitation library that delivers state-of-the-art image synthesis within a unified framework: once the model is trained, it can handle motion imitation, appearance transfer, and novel view synthesis. Previous methods use 2D human pose keypoints to estimate body structure, but Impersonator++ uses a 3D body mesh recovery module to extract the shape and pose of humans, which can further model joint locations and rotations and characterize the personalized body shape.
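Under the hood, mesh recovery regresses the parameters of the SMPL body model: 10 shape coefficients and 72 pose values (24 joints x 3 axis-angle rotations), plus a weak-perspective camera. Here is a minimal sketch that only illustrates these parameter shapes; the function is a hypothetical stand-in, not the library's API.

import numpy as np

# A stand-in for a 3D body mesh recovery module (e.g., an HMR-style
# regressor). A real module runs a CNN; this only shows the outputs:
#   beta:  10 shape coefficients (personalized body shape)
#   theta: 72 pose values (24 joints x 3 axis-angle rotations)
#   cam:   weak-perspective camera (scale, x/y translation)
def recover_smpl_params(image):
    beta = np.zeros(10)
    theta = np.zeros(72)
    cam = np.array([1.0, 0.0, 0.0])
    return beta, theta, cam

beta, theta, cam = recover_smpl_params(image=None)
print(beta.shape, theta.shape, cam.shape)  # (10,) (72,) (3,)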

The Impersonator++ research paper, Liquid Warping GAN: A Unified Framework for Human Motion Imitation, Appearance Transfer, and Novel View Synthesis, was published by researchers at ShanghaiTech University (Wen Liu, Zhixin Piao, Jie Min, Wenhan Luo, Lin Ma, and Shenghua Gao) in October 2019. To preserve texture, style, color, and face identity, they propose a Liquid Warping GAN with a Liquid Warping Block (LWB) that propagates this information in both image and feature spaces, synthesizing an output image with respect to its reference. The researchers also built a new dataset, the Impersonator (iPER) dataset, for accurately evaluating human motion imitation and image synthesis.


Before moving on to the implementation, let's look at the image synthesis techniques for different applications, each of which combines a source image with a reference image.

a.) Human Motion Imitation


Motion imitation generates an image with the texture of the source human and the pose of the reference human. Simply put, it imitates the pose from the reference image and synthesizes the result.

b.) Novel View Synthesis


Human novel view synthesis is all about synthesizing new images of the human body as if captured from different viewpoints.

c.) Human Appearance Transfer


Appearance transfer generates a human image by combining identity from one person with clothes from another, so different parts of the output image may come from different people.

Liquid Warping Block (LWB)

With recent advances in GAN technology, and given the flaws of earlier methods such as concatenation and texture warping, Impersonator++ proposes the Liquid Warping Block (LWB) to preserve source information such as clothes and face identity. It brings three main improvements:

  1. A denoising convolutional auto-encoder extracts useful features that preserve the source information.
  2. The LWB takes the features of each local part and blends them into a global feature stream to preserve the source details.
  3. The LWB supports multiple-source warping: in appearance transfer, for example, it warps the features of the head from one source and the body from another, then aggregates them into a global feature stream (see the sketch after this list).
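The core idea can be sketched in a few lines of PyTorch: warp each source's feature map into the target view with its transformation flow via bilinear sampling, then blend the warped features into the target stream. This is a simplified illustration of the LWB concept, not the exact iPERCore implementation.

import torch
import torch.nn.functional as F

def liquid_warping_block(source_feats, flows, target_feat):
    # source_feats: list of (N, C, H, W) feature maps, one per source
    # flows:        list of (N, H, W, 2) sampling grids in [-1, 1]
    # target_feat:  (N, C, H, W) feature map of the target stream
    blended = target_feat
    for feat, grid in zip(source_feats, flows):
        warped = F.grid_sample(feat, grid, align_corners=True)  # bilinear warp
        blended = blended + warped  # aggregate into the global stream
    return blended

# toy usage: two sources, e.g. the head from one person, the body from another
n, c, h, w = 1, 8, 16, 16
feats = [torch.randn(n, c, h, w) for _ in range(2)]
grids = [torch.rand(n, h, w, 2) * 2 - 1 for _ in range(2)]
out = liquid_warping_block(feats, grids, torch.randn(n, c, h, w))
print(out.shape)  # torch.Size([1, 8, 16, 16])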

Liquid Warping GAN


Liquid Warping GAN contains three stages:

  1. Body Mesh recovery module
  2. Flow composition module
  3. GAN module with Liquid Warping Block(LWB)

These stages synthesize high-fidelity human images under the desired conditions. More specifically, the framework performs three tasks (sketched in code below):

1) It synthesizes the background image of the subject;

2) It predicts the colors of invisible parts based on the visible parts of the image;

3) It generates the pixels of faces, clothes, hair, and so on from the reconstruction of SMPL (a parametric statistical human body model).
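To make the flow concrete, here is a schematic, runnable sketch of the three stages with stub functions; the names, shapes, and return values are illustrative only and do not reflect iPERCore's internal API.

import numpy as np

def body_mesh_recovery(img):
    # stub: would regress SMPL shape (10) and pose (72) parameters
    return {"beta": np.zeros(10), "theta": np.zeros(72)}

def flow_composition(src_mesh, ref_mesh, size=(256, 256)):
    # stub: would render both meshes and derive the source-to-reference
    # transformation flow plus a foreground mask
    h, w = size
    return np.zeros((h, w, 2)), np.ones((h, w))

def gan_with_lwb(src_img, flow, src_mask):
    # stub: would synthesize the background, predict occluded colors, and
    # generate the final image while preserving source identity via the LWB
    return src_img

src_img = np.zeros((256, 256, 3))
ref_img = np.zeros((256, 256, 3))
flow, mask = flow_composition(body_mesh_recovery(src_img),
                              body_mesh_recovery(ref_img))
result = gan_with_lwb(src_img, flow, mask)
print(result.shape)  # (256, 256, 3)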

Impersonator (iPER) dataset

Finally, to allow the proposed methods to be reproduced and evaluated, the researchers introduce the Impersonator (iPER) dataset, which contains 30 humans of different shapes, sizes, sexes, and heights. Each person wears different clothes and performs a pose video of random actions such as exercising, jumping, squatting, leg raises, and Tai Chi.

Some other features of the iPER dataset:

  • There are 103 outfits in total, as some actors wear multiple clothes.
  • It contains 206 video files with 241K frames.
  • The data is split into train/test sets at a ratio of 8:2 (illustrated below).
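As a quick illustration of what an 8:2 split over the 206 videos looks like (the video IDs here are made up; the real split ships with the dataset):

import random

video_ids = [f"video_{i:03d}" for i in range(206)]
random.seed(0)
random.shuffle(video_ids)

split = int(len(video_ids) * 0.8)
train_ids, test_ids = video_ids[:split], video_ids[split:]
print(len(train_ids), len(test_ids))  # 164 42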

The iPER dataset statistics reveal several insights:

(a) the classes of actions and their number of occurrences; for example, there are 41 videos of people jumping;

(b) the different styles of clothes people wear;

(c) the weight distribution of all 30 actors;

(d) the height distribution of the 30 actors, with most between 165 and 175 cm tall.

This diversity is important to keep the data from becoming biased. Download links for the dataset are available in the iPERCore GitHub repository.

Installation

Before installing, let's look at the system dependencies on which Impersonator++ has been tested:

  • Supported operating systems: tested on Ubuntu 16.04/18.04 and Windows 10.
  • CUDA 10.1, 10.2, or 11.0 with an Nvidia GPU.
  • gcc (C++14) on Linux, or MSVC++ with Visual Studio 2019 (C++14) on Windows.
  • ffmpeg (ffprobe), tested on 4.3.1+.

If you would rather not install, train, and test Impersonator++ on your local machine, you can simply use Google Colab. We are going to use Google Colab for this tutorial, so let's jump straight into the implementation; for installation on your local machine, follow the installation guide in the iPERCore GitHub repository.

Implementation

Before the implementation, let's first install all the dependencies needed to run Impersonator++ in your Google Colab environment.

Note: set your Runtime to GPU in Colab.

Install ffmpeg (ffprobe) and set CUDA_HOME in the system environment:

import os

# install ffmpeg non-interactively and point CUDA_HOME at Colab's CUDA 10.1 toolkit
!apt-get install -y ffmpeg
os.environ["CUDA_HOME"] = "/usr/local/cuda-10.1"
!echo $CUDA_HOME
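Optionally, sanity-check the environment before proceeding. This assumes Colab's preinstalled PyTorch and Nvidia driver utilities:

import torch

!ffmpeg -version | head -n 1      # confirm ffmpeg is installed
!nvidia-smi -L                    # list the attached GPU
print("CUDA available:", torch.cuda.is_available())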

Clone the iPERCore GitHub Repository and Set It Up

!git clone https://github.com/iPERDance/iPERCore.git
%cd /content/iPERCore/
!python setup.py develop

Downloading all pretrained model checkpoints

Download the pretrained checkpoints (users running on a local machine need these as well).

!wget -O assets/checkpoints.zip "https://download.impersonator.org/iper_plus_plus_latest_checkpoints.zip"
!unzip -o assets/checkpoints.zip -d assets/
 
!rm assets/checkpoints.zip
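As a quick check that the weights were extracted, list the checkpoint directory; the assets/checkpoints path is an assumption based on the zip layout above, so adjust it if the archive unpacks differently:

import os

ckpt_dir = "assets/checkpoints"
print(os.listdir(ckpt_dir) if os.path.exists(ckpt_dir)
      else f"{ckpt_dir} not found -- re-check the download step")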

Downloading Samples

!wget -O assets/samples.zip "https://download.impersonator.org/iper_plus_plus_latest_samples.zip"
!unzip -o assets/samples.zip -d assets
!rm assets/samples.zip
%cd /content/iPERCore/

Import modules

import os
import os.path as osp
import platform
import argparse
import time
import sys
import subprocess
from IPython.display import HTML
from base64 import b64encode

Run Scripts

# the gpu ids
gpu_ids = "0"
 
# the image size
image_size = 512
 
# the default number of source images; it will be updated if the actual number of sources <= num_source
num_source = 2
 
# the assets directory; it must contain the checkpoints and samples downloaded above
assets_dir = "/content/iPERCore/assets"
 
# the output directory
output_dir = "./results"
 
# the model id of this case. This is a random model name.
# model_id = "model_" + str(time.time())
 
# # This is a specific model name, and it will be used if you do not change it.
# model_id = "axing_1"
 
# symlink from the actual assets directory to this current directory
work_assets_dir = os.path.join("./assets")
if not os.path.exists(work_assets_dir):
    os.symlink(osp.abspath(assets_dir), osp.abspath(work_assets_dir),
               target_is_directory=(platform.system() == "Windows"))
 
cfg_path = osp.join(work_assets_dir, "configs", "deploy.toml")

Let’s Run the Trump Case

In this case, there is only a frontal body image as the source input. "donald_trump_2" is a specific model name, and it will be used if you do not change it.

model_id = "donald_trump_2"
# the source input information; here \" is an escape character for the double quote "
src_path = "\"path?=/content/iPERCore/assets/samples/sources/donald_trump_2/00000.PNG,name?=donald_trump_2\""
## the reference input information. There are two reference videos in this case.
# here \" is an escape character for the double quote "
# ref_path = "\"path?=/content/iPERCore/assets/samples/references/akun_1.mp4," \
#              "name?=akun_2," \
#              "pose_fc?=300\""
 
# ref_path = "\"path?=/content/iPERCore/assets/samples/references/mabaoguo_short.mp4," \
#              "name?=mabaoguo_short," \
#              "pose_fc?=400\""
 
ref_path = "\"path?=/content/iPERCore/assets/samples/references/akun_1.mp4," \
             "name?=akun_2," \
             "pose_fc?=300|" \
             "path?=/content/iPERCore/assets/samples/references/mabaoguo_short.mp4," \
             "name?=mabaoguo_short," \
             "pose_fc?=400\""
 
print(ref_path)
 
!python -m iPERCore.services.run_imitator  \
  --gpu_ids     $gpu_ids       \
  --num_source  $num_source    \
  --image_size  $image_size    \
  --output_dir  $output_dir    \
  --model_id    $model_id      \
  --cfg_path    $cfg_path      \
  --src_path    $src_path      \
  --ref_path    $ref_path

The result will be saved in ./results/primitives/donald_trump_2/synthesis/imitations/donald_trump_2-mabaoguo_short.mp4

mp4 = open("./results/primitives/donald_trump_2/synthesis/imitations/donald_trump_2-mabaoguo_short.mp4", "rb").read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML(f"""
<video width="100%" height="100%" controls>
      <source src="{data_url}" type="video/mp4">
</video>""")
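Before moving on to custom inputs, note the syntax of src_path and ref_path above: each input is a comma-separated list of key?=value fields (path, name, and optionally pose_fc), and multiple references are joined with |. A small hypothetical helper (not part of iPERCore) can build these strings:

def make_input_path(*entries):
    # each entry is a dict of fields; fields become "key?=value" pairs,
    # entries are joined with "|", and the whole string is quoted
    parts = [",".join(f"{k}?={v}" for k, v in entry.items()) for entry in entries]
    return "\"" + "|".join(parts) + "\""

ref_path = make_input_path(
    {"path": "/content/iPERCore/assets/samples/references/akun_1.mp4",
     "name": "akun_2", "pose_fc": 300},
    {"path": "/content/iPERCore/assets/samples/references/mabaoguo_short.mp4",
     "name": "mabaoguo_short", "pose_fc": 400})
print(ref_path)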

Run on Custom Inputs

Now let's download an image of a single person, place it inside the working directory, and synthesize it with a sample video. Remember to change the parameters as shown in the code below.

# a sample person image; you can use one of your own
model_id = "yourmodel_name_any_name"
# the source input information
src_path = "\"path?=/content/your_one_person_image.jpg,name?=person1\""
# src_path = "\"YOU NEED TO REPLACE THIS. FOLLOW THE ABOVE EXAMPLE.\""
 
## the reference input information
ref_path = "\"path?=/content/iPERCore/assets/samples/references/akun_1.mp4," \
             "name?=akun_2," \
             "pose_fc?=300\""
# ref_path = "\"YOU NEED TO REPLACE THIS. FOLLOW THE ABOVE EXAMPLE.\""
!python -m iPERCore.services.run_imitator  \
  --gpu_ids     $gpu_ids       \
  --num_source  $num_source    \
  --image_size  $image_size    \
  --output_dir  $output_dir    \
  --model_id    $model_id      \
  --cfg_path    $cfg_path      \
  --src_path    $src_path      \
  --ref_path    $ref_path

Let's view our model's output:

mp4 = open("./results/primitives/person1/synthesis/imitations/person1-akun_2.mp4", "rb").read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML(f"""
<video width="100%" height="100%" controls>
      <source src="{data_url}" type="video/mp4">
</video>""")

Conclusion

Impersonator++ is clearly a very easy-to-use framework. It is an extension of the team's previous ICCV project, impersonator (https://github.com/svip-lab/impersonator), and with its new GAN-based approach and the new iPER dataset it is gaining popularity. The impersonator community continues to work on new tools such as iPER-Dance, a video-editing tool for human motion imitation, appearance transfer, and novel view synthesis. To learn more, check out the research paper and the iPERCore GitHub repository.
