
Guide to OpenPose for Real-time Human Pose Estimation


OpenPose is a real-time multi-person detection library, and it was the first library to demonstrate jointly detecting human body, face, and foot keypoints. The project was created by Gines Hidalgo, Zhe Cao, Tomas Simon, Shih-En Wei, Hanbyul Joo, and Yaser Sheikh, and it relies heavily on the CMU Panoptic Studio dataset.

OpenPose is written in C++ and Caffe. Today we are going to look at this very popular library (almost 19.8k stars and 6k forks on GitHub) through a small implementation in Python. The authors have created builds for different operating systems and languages, and you can try it on your local machine with or without a GPU.

The OpenPose library has many features; let's look at some of the most remarkable ones:

  • Real-time 2D multi-person keypoint detection.
  • Real-time 3D single-person keypoint detection.
  • A calibration toolbox for estimating distortion and intrinsic and extrinsic camera parameters.
  • Single-person tracking to speed up detection and smooth the visualization.

OpenPose Pipeline

Figure 1. Full pipeline steps involved in OpenPose

Before getting into the implementation, let's look at the pipeline OpenPose follows.

  1. First, an input RGB image is fed into a "two-branch multi-stage" convolutional neural network (CNN), i.e. the CNN produces two different outputs.
  2. The top branch, shown in beige in the figure above, predicts the confidence maps (Figure 1b) of different body parts, such as the right eye, left eye, and right elbow.
  3. The bottom branch predicts the Part Affinity Fields (Figure 1c), which represent the degree of association between different body parts in the input image.
  4. Next, the confidence maps and affinity fields are processed by greedy inference (Figure 1d).
  5. Finally, the 2D keypoints for all people in the image are produced, as shown in Figure 1e.
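The last steps can be sketched in a few lines of NumPy. This is only a toy illustration of the idea, not OpenPose's actual C++ implementation: the array shapes, part names, and the single-person `argmax` shortcut are all simplifications.

```python
import numpy as np

# Toy confidence maps: one (H, W) heatmap per body part (illustrative parts only).
H, W = 32, 32
parts = ["right_eye", "left_eye", "right_elbow"]
conf_maps = np.zeros((len(parts), H, W))
conf_maps[0, 5, 10] = 1.0   # pretend the network fired at these pixels
conf_maps[1, 5, 20] = 1.0
conf_maps[2, 18, 8] = 1.0

def extract_keypoints(conf_maps, threshold=0.5):
    """Pick the strongest response per part map (single-person simplification)."""
    keypoints = {}
    for part, cmap in zip(parts, conf_maps):
        y, x = np.unravel_index(np.argmax(cmap), cmap.shape)
        if cmap[y, x] >= threshold:
            keypoints[part] = (int(x), int(y), float(cmap[y, x]))
    return keypoints

print(extract_keypoints(conf_maps))
```

In the real multi-person setting, every local maximum above the threshold is kept as a candidate, and the affinity fields decide which candidates belong to the same person.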

In order to capture finer outputs, a multi-stage approach is used to increase the depth of the neural network: at every stage, another copy of the network is stacked on top of the previous one.

Figure 2. OpenPose Architecture of the two-branch multi-stage CNN.

In Figure 2, the top branch of the OpenPose network produces a set of detection confidence maps S, defined in the paper as

S = (S_1, S_2, …, S_J), with S_j ∈ ℝ^{w×h} for j ∈ {1, …, J},

where J is the number of body parts and each S_j is a 2D confidence map for one part.
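During training, the ground-truth confidence map for a part is a Gaussian peak centered on the annotated keypoint. A minimal sketch of building one such map (the map size and sigma below are illustrative values, not the ones OpenPose trains with):

```python
import numpy as np

def ideal_confidence_map(h, w, keypoint, sigma=2.0):
    """Ground-truth confidence map for one body part: a Gaussian bump
    whose peak sits at the annotated keypoint location."""
    ys, xs = np.mgrid[0:h, 0:w]
    kx, ky = keypoint
    dist_sq = (xs - kx) ** 2 + (ys - ky) ** 2
    return np.exp(-dist_sq / sigma ** 2)

S_j = ideal_confidence_map(46, 46, keypoint=(12, 30))
peak = np.unravel_index(np.argmax(S_j), S_j.shape)
print(peak, S_j.max())  # peak at (row 30, col 12), value 1.0
```

When several people are present, the paper takes the pixel-wise maximum over the per-person maps so that each peak stays sharp.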

Outputs of the Multi-Stage OpenPose Network

Let's see how the pipeline outputs are refined stage by stage until, at the end, we get the pose estimation on the input image.

Figure 3. The outcome of a multi-stage network.

In Figure 3 above, the TOP row (blue overlay) shows the network's predicted confidence maps for the right wrist across stages.

The BOTTOM row shows the network's predicted Part Affinity Fields for the right forearm (right shoulder to right wrist) across the same stages.
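How does a Part Affinity Field turn into an association between two keypoints? OpenPose approximates a line integral: it samples points along the candidate limb and averages the dot product between the field's vector and the limb's direction. A toy sketch of that scoring step, assuming a hypothetical `(H, W, 2)` PAF array (the sampling count and field values are illustrative):

```python
import numpy as np

def paf_association_score(paf, p1, p2, n_samples=10):
    """Approximate the line integral between candidate keypoints p1 and p2:
    sample along the segment and average the dot product of the PAF vector
    with the segment's unit direction."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    v = p2 - p1
    norm = np.linalg.norm(v)
    if norm == 0:
        return 0.0
    u = v / norm
    scores = []
    for t in np.linspace(0, 1, n_samples):
        x, y = (p1 + t * v).astype(int)
        scores.append(paf[y, x] @ u)
    return float(np.mean(scores))

# Toy PAF: every pixel points in the +x direction,
# so a horizontal limb scores high and a vertical one scores zero.
paf = np.zeros((20, 20, 2))
paf[..., 0] = 1.0
print(paf_association_score(paf, (2, 5), (15, 5)))   # aligned: 1.0
print(paf_association_score(paf, (5, 2), (5, 15)))   # perpendicular: 0.0
```

Greedy inference then uses these scores to match part candidates into limbs, and limbs into whole skeletons.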

Implementation

The following cell, run in a Google Colab GPU runtime, clones OpenPose from GitHub, installs a newer CMake (required for CUDA 10), installs all the dependencies, and builds the library. We also install the youtube-dl library so we can run OpenPose pose estimation and keypoint detection directly on YouTube videos.

Installing OpenPose

import os
from os.path import exists, join, basename, splitext
# initiating variable for cloning openpose
git_repo_url = 'https://github.com/CMU-Perceptual-Computing-Lab/openpose.git'
project_name = splitext(basename(git_repo_url))[0]
if not exists(project_name):
  # see: https://github.com/CMU-Perceptual-Computing-Lab/openpose/issues/949
  # install new CMake because of CUDA10
  !wget -q https://cmake.org/files/v3.13/cmake-3.13.0-Linux-x86_64.tar.gz
  !tar xfz cmake-3.13.0-Linux-x86_64.tar.gz --strip-components=1 -C /usr/local
  # clone openpose
  !git clone -q --depth 1 $git_repo_url
  !sed -i 's/execute_process(COMMAND git checkout master WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}\/3rdparty\/caffe)/execute_process(COMMAND git checkout f019d0dfe86f49d1140961f8c7dec22130c83154 WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}\/3rdparty\/caffe)/g' openpose/CMakeLists.txt
  # install system dependencies
  !apt-get -qq install -y libatlas-base-dev libprotobuf-dev libleveldb-dev libsnappy-dev libhdf5-serial-dev protobuf-compiler libgflags-dev libgoogle-glog-dev liblmdb-dev opencl-headers ocl-icd-opencl-dev libviennacl-dev
  # install python dependencies
  !pip install -q youtube-dl
  # build openpose
  !cd openpose && rm -rf build || true && mkdir build && cd build && cmake .. && make -j`nproc`
from IPython.display import YouTubeVideo

Input and Preprocess a Custom Video for Pose Estimation

We are going to use a Charlie Chaplin clip as our input sample; for a quick test, a few seconds of it are enough, so we don't need to wait for the whole video to process.

YOUTUBE_ID = '0daS_SDCT_U'
YouTubeVideo(YOUTUBE_ID)
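Between the input video and the output display there is a processing step: downloading the clip, trimming it, and running the OpenPose binary on it. A sketch of those commands is below; the flags and paths are assumptions based on OpenPose's command-line demo, so treat them as a starting point rather than a definitive recipe. The helper keeps `dry_run=True` by default because the real run needs the Colab runtime where OpenPose was built above.

```python
import shlex
import subprocess

YOUTUBE_ID = '0daS_SDCT_U'

# Commands for the processing step (assumed flags, based on OpenPose's CLI demo).
commands = [
    f"youtube-dl -f mp4 --output youtube.mp4 https://www.youtube.com/watch?v={YOUTUBE_ID}",
    "ffmpeg -y -loglevel error -i youtube.mp4 -t 5 video.mp4",  # first 5 s only
    "./openpose/build/examples/openpose/openpose.bin"
    " --video video.mp4 --write_json output/ --display 0 --write_video openpose.avi",
    "ffmpeg -y -loglevel error -i openpose.avi output.mp4",     # mp4 embeds in the notebook
]

def run_pipeline(dry_run=True):
    """Print the commands (dry run) or execute them one by one."""
    for cmd in commands:
        if dry_run:
            print("$", cmd)
        else:
            subprocess.run(shlex.split(cmd), check=True)

run_pipeline(dry_run=True)
```

Set `dry_run=False` inside the Colab runtime to actually produce `output.mp4`, which the display snippet below expects.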

Input video:

Output

import io
import base64
from IPython.display import HTML

def show_local_mp4_video(file_name='output.mp4', width=640, height=480):
    # embed the rendered video in the notebook as a base64 data URI
    video_encoded = base64.b64encode(io.open(file_name, 'rb').read())
    return HTML(data='''<video width="{0}" height="{1}" alt="test" controls>
                          <source src="data:video/mp4;base64,{2}" type="video/mp4" />
                        </video>'''.format(width, height, video_encoded.decode('ascii')))

show_local_mp4_video('output.mp4')

Output video:
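Besides the rendered video, the `--write_json` flag makes OpenPose write one JSON file per frame, with each person's keypoints as a flat `[x, y, confidence, ...]` list. A minimal sketch of reading such a file; the sample below is hand-written and truncated to three keypoints for brevity (the BODY_25 model emits 25 per person):

```python
import json

# Hand-written example of one frame's JSON, in the shape OpenPose writes
# with --write_json (truncated to 3 keypoints for brevity).
frame_json = '''
{
  "version": 1.3,
  "people": [
    {"pose_keypoints_2d": [320.5, 120.2, 0.91, 318.0, 160.7, 0.88, 290.3, 162.1, 0.75]}
  ]
}
'''

def parse_pose_keypoints(raw):
    """Return, for each detected person, a list of (x, y, confidence) triples."""
    data = json.loads(raw)
    people = []
    for person in data["people"]:
        flat = person["pose_keypoints_2d"]
        people.append([tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)])
    return people

poses = parse_pose_keypoints(frame_json)
print(len(poses), poses[0][0])  # 1 person; first keypoint (320.5, 120.2, 0.91)
```

From here the keypoints can be fed into any downstream task, such as gesture recognition or action classification.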

Conclusion

OpenPose is one of the best-known libraries for pose estimation and body keypoint detection, including accurate detection of foot and face keypoints and the joints connecting them. To learn more, you can follow some of the resources below, which include code and research papers for a deeper understanding of OpenPose:


Mohit Maithani

Mohit is a data and technology enthusiast with good exposure to solving real-world problems in various avenues of IT and the deep learning domain. He believes in solving humans' daily problems with the help of technology.