Last updated October 3, 2020

Step By Step Guide To Stabilize Facial Landmarks In A Video Using Dlib


Published on August 9, 2020

by Bhoomika Madhukar

The human face has been a topic of interest for deep learning engineers for quite some time now. Understanding the human face not only helps in facial recognition but also finds applications in facial morphing, head pose detection and virtual makeovers. If you are a regular user of social media apps like Instagram or Snapchat, have you wondered how the filters fit each face so well? Though every face is unique, these filters seem to magically align with your nose, lips and eyes. These filters and face-swapping applications make use of facial landmarks: points that help identify the distance between the eyes, the position of the nose, the size of the lips and so on. In the context of facial landmarks, our goal is to detect important facial structures using shape prediction methods.

In this article, we will cover:

Need for stabilization
Popular types of landmark detectors
Implementation and stabilization of 68 point landmarks for a video.

The need to Stabilize Facial Landmarks

Facial landmarks are easy to use on still images since the pixels do not move. In a video, however, the continuous motion of pixels together with translation and rotation of the face often makes the landmarks unstable from frame to frame. Take a look at the image below.

A few of the points miss the facial features they are meant to mark. This can create problems for tasks that involve estimating the size of the face, the position of the mouth and so on, and the instability can degrade the results of any model that relies on these landmarks.
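To make the idea of instability concrete, here is a minimal sketch that measures how much each landmark moves between consecutive frames. The function name `mean_jitter` and the synthetic data are assumptions made for illustration; in practice the input would be the landmark coordinates collected from a real video.

```python
import numpy as np

def mean_jitter(landmarks):
    """Mean frame-to-frame displacement (in pixels) of each landmark.

    landmarks: array of shape (num_frames, num_points, 2).
    Returns an array of shape (num_points,).
    """
    landmarks = np.asarray(landmarks, dtype=np.float64)
    # Displacement of every point between consecutive frames
    step = np.diff(landmarks, axis=0)
    # Euclidean length of each displacement, averaged over time
    return np.linalg.norm(step, axis=2).mean(axis=0)

# Synthetic example: a perfectly still face with 68 points,
# and the same face with random detection noise added to every frame
rng = np.random.default_rng(0)
still_face = np.tile(rng.uniform(100, 400, size=(1, 68, 2)), (30, 1, 1))
noisy_face = still_face + rng.normal(scale=1.5, size=still_face.shape)

print("still face jitter:", mean_jitter(still_face).max())
print("noisy face jitter:", mean_jitter(noisy_face).mean())
```

A face that does not move should report zero jitter, so any non-zero value on a still subject comes from the detector itself. That residual motion is what stabilization tries to suppress.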

Popular types of landmark detectors

Dlib is one of the most popular libraries for detecting facial landmarks. It provides two pre-trained landmark detectors.

68-point landmark detector: This pre-trained landmark detector identifies 68 points ((x, y) coordinates) on a human face. These points localize the regions around the eyes, eyebrows, nose, mouth, chin and jaw.

5-point landmark detector: To make things faster than the 68-point detector, dlib introduced the 5-point detector, which assigns two points to the corners of the left eye, two points to the corners of the right eye and one point to the nose. This detector is most commonly used for face alignment.
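As a rough guide to how the 68 points are grouped, the sketch below lists the index ranges commonly used with this landmark scheme. The dictionary name and helper are hypothetical, and the "left" and "right" labels follow the subject's own perspective, which is an assumption about naming rather than something the detector reports.

```python
# Index ranges (start inclusive, end exclusive) for the 68-point scheme.
# Region names are from the subject's point of view (assumed convention).
LANDMARK_REGIONS_68 = {
    "jaw": (0, 17),
    "right_eyebrow": (17, 22),
    "left_eyebrow": (22, 27),
    "nose": (27, 36),
    "right_eye": (36, 42),
    "left_eye": (42, 48),
    "mouth": (48, 68),
}

def region_points(points, region):
    """Return the slice of a 68-point list that belongs to one facial region."""
    start, end = LANDMARK_REGIONS_68[region]
    return points[start:end]

# Dummy landmarks standing in for real detector output
dummy_points = [(i, i) for i in range(68)]
print(len(region_points(dummy_points, "mouth")))  # 20
print(region_points(dummy_points, "right_eye")[0])  # (36, 36)
```

Indices 36 and 45, the two outer eye corners, are the ones used later in this article to measure the inter-eye distance.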

Implementation and stabilization of 68 point landmarks for a video

Step 1: Collecting the pre-trained files.

Create a folder for your project and, inside it, a subfolder called model. Download the 68-point and 5-point pre-trained detector files and place them in the model subfolder. Next, place the faceBlendCommon.py helper file in the root folder of your project.
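Before moving on, it can help to confirm that the files are where the later steps expect them. The following is a small sketch, not part of the original workflow; the helper name `missing_files` is hypothetical, and the file names are the ones used by the code later in this article.

```python
from pathlib import Path

# Files the rest of the walkthrough loads, relative to the project root
REQUIRED_FILES = [
    "model/shape_predictor_68_face_landmarks.dat",
    "model/shape_predictor_5_face_landmarks.dat",
    "faceBlendCommon.py",
]

def missing_files(project_root):
    """Return the required files that are not present under project_root."""
    root = Path(project_root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]

# Prints an empty list when the project folder is set up as described
print(missing_files("/content/gdrive/My Drive/video-stable"))
```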

Step 2: The data.

Select a short 5-10 second video for this project with good lighting. I have chosen this video. Feel free to download it from here.

Step 3: Importing the required modules

import dlib

import cv2

import numpy as np

import matplotlib.pyplot as plt

import matplotlib

import os

from google.colab import drive

drive.mount('/content/gdrive')

Copy all the downloaded files to your notebook

!cp -r '/content/gdrive/My Drive/video-stable/model' /content

!cp -r '/content/gdrive/My Drive/video-stable/videos' /content

!cp '/content/gdrive/My Drive/video-stable/faceBlendCommon.py' /content

Step 4: Convert the video into image frames and save them to a folder inside your project folder.

We will convert the entire video into individual image frames since frames are easier to work with. Create a main folder for all the images.

def image_saver(path, filename, images):
    # Write each image as <filename>_<index>.png inside path
    for count in range(0, len(images)):
        temp = filename + '_' + str(count) + '.png'
        fn = os.path.join(path, temp)
        cv2.imwrite(fn, images[count])

cap = cv2.VideoCapture('/content/gdrive/My Drive/video-stable/videos/video_data.mp4')

image_frame = []

while cap.isOpened():
    # read() returns a success flag and the decoded frame
    ok, frame = cap.read()
    if not ok or frame is None:
        break
    image_frame.append(frame)

cap.release()

plt.imshow(image_frame[0][:,:,::-1])

Now that we have the image frames, we will save all these frames in a folder.

directory = "dataset"

parent = "/content/gdrive/My Drive/video-stable/"

path = os.path.join(parent, directory)

os.mkdir(path)

os.mkdir('/content/gdrive/My Drive/video-stable/dataset/original')

image_saver('/content/gdrive/My Drive/video-stable/dataset/original/', 'frame', image_frame)

Step 5: Facial alignment

This is an important step in the process. We will use the 5-point detector to align the face within the frame, which eliminates background noise and keeps the focus on the face.

modelroot = '/content/gdrive/My Drive/video-stable/model/'

five_point_landmark = modelroot + "shape_predictor_5_face_landmarks.dat"

detect_face = dlib.get_frontal_face_detector()

detect_landmark = dlib.shape_predictor(five_point_landmark)

Now we will use the helper functions in faceBlendCommon to get the landmarks and align the face.

import faceBlendCommon as fb

def facial_alignment(image):
    faceRects = detect_face(image, 0)
    print("Number of faces detected:", len(faceRects))
    points = fb.getLandmarks(detect_face, detect_landmark, image)
    print("Number of landmark points:", len(points))
    landmarks = np.array(points)
    height = 600
    width = 600
    if len(landmarks) > 0:
        # Scale to floats in [0, 1] before normalization, then back to 8-bit
        image = np.float32(image) / 255.0
        normalize_image, landmarks = fb.normalizeImagesAndLandmarks((height, width), image, landmarks)
        normalize_image = np.uint8(normalize_image * 255)
        return normalize_image
    else:
        # No face found: return the original 8-bit frame unchanged
        return image

aligned_faces = []

print('performing alignment')

for count in range(0, len(image_frame)):
    frame = image_frame[count]
    alignment = facial_alignment(frame)
    aligned_faces.append(alignment)

print('Done!')

Let us check one of the images before saving it.

plt.imshow(aligned_faces[50][:,:,::-1])

plt.title("Aligned Image")

plt.show()

As you can see, the background noise has been eliminated and the face has been resized to 600×600 after alignment. Save the aligned images in your images folder.

os.mkdir('/content/gdrive/My Drive/video-stable/dataset/aligned_faces')

image_saver('/content/gdrive/My Drive/video-stable/dataset/aligned_faces/', 'align_face', aligned_faces)
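The internals of faceBlendCommon are not shown in this article. A common way to align a face is a similarity transform (rotation, uniform scale and translation) that moves two eye points to fixed positions in the output image, and the sketch below illustrates that idea only. The function names, the eye coordinates and the target positions are all assumptions made for illustration, not the helper's actual implementation.

```python
import numpy as np

def similarity_from_two_points(src, dst):
    """Return the 2x3 matrix that maps the two src points onto the two dst points
    using rotation, uniform scale and translation only."""
    (x0, y0), (x1, y1) = src
    (u0, v0), (u1, v1) = dst
    # Treat each point as a complex number and solve dst = a * src + b
    s0, s1 = complex(x0, y0), complex(x1, y1)
    d0, d1 = complex(u0, v0), complex(u1, v1)
    a = (d1 - d0) / (s1 - s0)
    b = d0 - a * s0
    return np.array([[a.real, -a.imag, b.real],
                     [a.imag,  a.real, b.imag]])

def apply_transform(matrix, points):
    """Apply a 2x3 affine matrix to an array of (x, y) points."""
    pts = np.asarray(points, dtype=np.float64)
    return pts @ matrix[:, :2].T + matrix[:, 2]

# Hypothetical eye positions in a frame and target positions in a 600x600 output
eyes_in_frame = [(220.0, 310.0), (380.0, 290.0)]
eyes_in_output = [(180.0, 200.0), (420.0, 200.0)]

M = similarity_from_two_points(eyes_in_frame, eyes_in_output)
print(np.round(apply_transform(M, eyes_in_frame), 3))
```

A 2x3 matrix of this form can be passed to cv2.warpAffine to warp the whole frame, and the same matrix can be applied to the landmark coordinates so that the points stay consistent with the warped image.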

Step 6: Using the 68 point detector and performing stabilization

MODEL_PATH = '/content/gdrive/My Drive/video-stable/model/'

PREDICTOR_PATH = MODEL_PATH + "shape_predictor_68_face_landmarks.dat"

RESIZE_HEIGHT = 480

NUM_FRAMES_FOR_FPS = 100

SKIP_FRAMES = 1

detector = dlib.get_frontal_face_detector()

landmarkDetector = dlib.shape_predictor(PREDICTOR_PATH)

Now, we will calculate the distance between the outer corners of the two eyes using the function below.

def interEyeDistance(predict):
    # Landmarks 36 and 45 are the outer corners of the two eyes
    leftEyeLeftCorner = (predict[36].x, predict[36].y)
    rightEyeRightCorner = (predict[45].x, predict[45].y)
    distance = cv2.norm(np.array(rightEyeRightCorner) - np.array(leftEyeLeftCorner))
    distance = int(distance)
    return distance
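The inter-eye distance matters because the stabilization loop later in this step derives its tuning values from it: the blending spread sigma, the optical flow window size and the radius of the drawn dots. The sketch below collects those three formulas in one place so their scaling is easy to inspect; the function name `stabilization_params` is a hypothetical wrapper added for illustration.

```python
def stabilization_params(eye_distance):
    """Derive the tuning values used by the stabilization loop from the eye distance."""
    # Spread of the blending weight: grows with the square of the face size
    sigma = eye_distance * eye_distance / 400
    # Optical flow window: an odd number of pixels, about half the eye distance
    window = 2 * int(eye_distance / 4) + 1
    # Larger faces get slightly larger dots when drawing
    dot_radius = 3 if eye_distance > 100 else 2
    return sigma, window, dot_radius

for d in (60, 100, 160):
    print(d, stabilization_params(d))
# 60 (9.0, 31, 2)
# 100 (25.0, 51, 2)
# 160 (64.0, 81, 3)
```

Scaling everything by the eye distance keeps the behavior roughly the same whether the face fills the frame or sits far from the camera.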

In order to save the points of detection we create separate lists.

points=[]

pointsPrev=[]

pointsDetectedCur=[]

pointsDetectedPrev=[]

all_stabilized_frames=[]

Next, we set the parameters required for the process and perform the stabilization

eyeDistanceNotCalculated = True

eyeDistance = 0

isFirstFrame = True

fps = 10

showStabilized = True  # draw the stabilized points; set to False to draw the raw detections

count =0

import math

# The capture was released in Step 4, so reopen the video before reading frames again
cap = cv2.VideoCapture('/content/gdrive/My Drive/video-stable/videos/video_data.mp4')

while True:
    if count == 0:
        t = cv2.getTickCount()
    ret, im = cap.read()
    if im is None:
        break
    imDlib = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
    imGray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
    # Only seed the previous frame here; it is updated at the end of every iteration
    if isFirstFrame:
        imGrayPrev = imGray
    height = im.shape[0]
    IMAGE_RESIZE = float(height) / RESIZE_HEIGHT
    # Detect faces on a smaller copy of the frame for speed
    imSmall = cv2.resize(im, None, fx=1.0/IMAGE_RESIZE, fy=1.0/IMAGE_RESIZE, interpolation=cv2.INTER_LINEAR)
    imSmallDlib = cv2.cvtColor(imSmall, cv2.COLOR_BGR2RGB)
    if count % SKIP_FRAMES == 0:
        faces = detector(imSmallDlib, 0)
    if len(faces) == 0:
        print("No face detected")
    else:
        # Note: the tracking state below is shared, so this assumes a single face per frame
        for i in range(0, len(faces)):
            print("face detected")
            # Scale the detected rectangle back to the full-resolution frame
            newRect = dlib.rectangle(int(faces[i].left() * IMAGE_RESIZE),
                                     int(faces[i].top() * IMAGE_RESIZE),
                                     int(faces[i].right() * IMAGE_RESIZE),
                                     int(faces[i].bottom() * IMAGE_RESIZE))
            landmarks = landmarkDetector(imDlib, newRect).parts()
            if isFirstFrame:
                # No history yet: use the current detections as the previous points
                pointsPrev = [(p.x, p.y) for p in landmarks]
                pointsDetectedPrev = [(p.x, p.y) for p in landmarks]
            else:
                pointsPrev = points
                pointsDetectedPrev = pointsDetectedCur
            points = [(p.x, p.y) for p in landmarks]
            pointsDetectedCur = [(p.x, p.y) for p in landmarks]
            pointsArr = np.array(points, np.float32)
            pointsPrevArr = np.array(pointsPrev, np.float32)
            if eyeDistanceNotCalculated:
                eyeDistance = interEyeDistance(landmarks)
                print(eyeDistance)
                eyeDistanceNotCalculated = False
            if eyeDistance > 100:
                dotRadius = 3
            else:
                dotRadius = 2
            sigma = eyeDistance * eyeDistance / 400
            s = 2 * int(eyeDistance / 4) + 1
            lk_params = dict(winSize=(s, s), maxLevel=5, criteria=(cv2.TERM_CRITERIA_COUNT | cv2.TERM_CRITERIA_EPS, 20, 0.03))
            # Track the previous points into the current frame with Lucas-Kanade optical flow
            pointsArr, status, err = cv2.calcOpticalFlowPyrLK(imGrayPrev, imGray, pointsPrevArr, pointsArr, **lk_params)
            points = np.array(pointsArr, np.float32).tolist()
            # Blend tracked and detected points: small motion favors tracking, large motion favors detection
            for k in range(0, len(landmarks)):
                d = cv2.norm(np.array(pointsDetectedPrev[k]) - np.array(pointsDetectedCur[k]))
                alpha = math.exp(-d * d / sigma)
                points[k] = (1 - alpha) * np.array(pointsDetectedCur[k]) + alpha * np.array(points[k])
            if showStabilized:
                for p in points:
                    cv2.circle(im, (int(p[0]), int(p[1])), dotRadius, (255, 0, 0), -1)
            else:
                for p in pointsDetectedCur:
                    cv2.circle(im, (int(p[0]), int(p[1])), dotRadius, (0, 0, 255), -1)
            isFirstFrame = False
    count = count + 1
    if count == NUM_FRAMES_FOR_FPS:
        t = (cv2.getTickCount() - t) / cv2.getTickFrequency()
        fps = NUM_FRAMES_FOR_FPS / t
        count = 0
        isFirstFrame = True
    cv2.putText(im, "{:.1f}-fps".format(fps), (50, im.shape[0] - 50), cv2.FONT_HERSHEY_COMPLEX, 1.5, (0, 0, 255), 3, cv2.LINE_AA)
    # Keep every frame so the frame count matches the original video
    all_stabilized_frames.append(im)
    imPrev = im
    imGrayPrev = imGray

cap.release()

plt.imshow(all_stabilized_frames[1][:,:,::-1])

plt.title("Stabilized Image")

plt.show()

As you can see above, the 68 points are applied to the face. We save these images in a folder and move on to the final step.

os.mkdir('/content/gdrive/My Drive/video-stable/dataset/stable_faces')

image_saver('/content/gdrive/My Drive/video-stable/dataset/stable_faces/', 'stable_face', all_stabilized_frames)
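The heart of the stabilization in Step 6 is the blending rule that mixes the freshly detected points with the points tracked by optical flow. The sketch below isolates that rule in vectorized form, using the same formula alpha = exp(-d²/sigma); the function name `blend_landmarks` and the sample coordinates are assumptions made for illustration.

```python
import numpy as np

def blend_landmarks(detected_prev, detected_cur, tracked, sigma):
    """Blend detected and optical-flow-tracked landmarks.

    d is how far each detected point moved since the previous frame.
    Small d gives alpha close to 1 (trust the tracked point);
    large d gives alpha close to 0 (trust the new detection).
    """
    detected_prev = np.asarray(detected_prev, dtype=np.float64)
    detected_cur = np.asarray(detected_cur, dtype=np.float64)
    tracked = np.asarray(tracked, dtype=np.float64)
    d = np.linalg.norm(detected_cur - detected_prev, axis=1)
    alpha = np.exp(-d * d / sigma)
    return (1 - alpha)[:, None] * detected_cur + alpha[:, None] * tracked

# Point 0 did not move between detections; point 1 jumped by 50 pixels
detected_prev = [(100, 100), (200, 200)]
detected_cur = [(100, 100), (230, 240)]
tracked = [(100.4, 99.7), (205.0, 204.0)]

print(blend_landmarks(detected_prev, detected_cur, tracked, sigma=25.0))
```

With these inputs the first point follows the tracked position, while the second follows the new detection almost exactly. This is why the stabilized points stay steady on a still face yet still respond to real movement such as talking.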

Step 7: The last step is to stitch the original, aligned and stable frames back to a video.

We will resize the frames to fit the screen and stitch the images together.

def read_all_images(folder, filename_prefix, num_files):
    # Read <filename_prefix>_0.png ... <filename_prefix>_<num_files - 1>.png in order
    result_list = []
    for cnt in range(0, num_files):
        fn = filename_prefix + '_' + str(cnt) + '.png'
        full_path = os.path.join(folder, fn)
        img = cv2.imread(full_path)
        result_list.append(img)
    return result_list

original_frames = read_all_images('/content/gdrive/My Drive/video-stable/dataset/original', 'frame', 157)

aligned_frames = read_all_images('/content/gdrive/My Drive/video-stable/dataset/aligned_faces', 'align_face', 157)

stable_frames = read_all_images('/content/gdrive/My Drive/video-stable/dataset/stable_faces', 'stable_face', 157)

def resize_images(imageList, width, height):
    # Resize every image to the same size so they can be stacked side by side
    result_list = []
    for cnt in range(0, len(imageList)):
        new_img = cv2.resize(imageList[cnt], (width, height), interpolation=cv2.INTER_AREA)
        result_list.append(new_img)
    return result_list

orig_frames_resized = resize_images(original_frames, 800, 800)

aligned_resized=resize_images(aligned_frames,800,800)

stab_frames_resized = resize_images(stable_frames, 800, 800)

stitched_frames = []

for cnt in range(0, len(orig_frames_resized)):
    # Place the original, aligned and stabilized frames side by side
    new_img = np.hstack((orig_frames_resized[cnt], aligned_resized[cnt], stab_frames_resized[cnt]))
    stitched_frames.append(new_img)

plt.imshow(stitched_frames[50][:,:,::-1])
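Horizontal stacking only works when all images share the same height and number of channels, which is why the frames are resized to a common size first. Here is a minimal sketch with tiny dummy images in place of real frames; the array sizes are arbitrary values chosen for illustration.

```python
import numpy as np

# Three small dummy "frames" with the same height and channel count
left = np.zeros((4, 4, 3), dtype=np.uint8)
middle = np.full((4, 4, 3), 128, dtype=np.uint8)
right = np.full((4, 4, 3), 255, dtype=np.uint8)

panel = np.hstack((left, middle, right))
print(panel.shape)  # (4, 12, 3)

# A frame with a different height cannot be stacked horizontally
taller = np.zeros((6, 4, 3), dtype=np.uint8)
try:
    np.hstack((left, taller))
except ValueError as err:
    print("cannot stack:", err)
```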


Let us convert these into a video.

image_saver('/content/gdrive/My Drive/video-stable/dataset/', 'stitch', stitched_frames)

from os.path import isfile, join

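def convert_frames_to_video(pathIn, pathOut, fps):
    frame_array = []
    files = [f for f in os.listdir(pathIn) if isfile(join(pathIn, f)) and f.endswith('.png')]
    # os.listdir returns names in arbitrary order, so sort by the frame number in '<prefix>_<n>.png'
    files.sort(key=lambda f: int(os.path.splitext(f)[0].split('_')[-1]))
    for i in range(len(files)):
        filename = join(pathIn, files[i])
        img = cv2.imread(filename)
        height, width, layers = img.shape
        size = (width, height)
        print(filename)
        frame_array.append(img)
    # All frames share one size, so the size of the last frame read is used for the writer
    out = cv2.VideoWriter(pathOut, cv2.VideoWriter_fourcc(*'DIVX'), fps, size)
    for i in range(len(frame_array)):
        out.write(frame_array[i])
    out.release()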

pathIn= '/content/gdrive/My Drive/video-stable/dataset/'

pathOut = '/content/gdrive/My Drive/video-stable/videos/finalvid.mp4'

fps = 30.0

convert_frames_to_video(pathIn, pathOut, fps)
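One detail to watch when rebuilding a video from files named with an unpadded counter, such as stitch_0.png, stitch_1.png and so on, is the order in which the files are read. Plain alphabetical sorting puts stitch_10.png before stitch_2.png, which would scramble the frames. The sketch below shows the difference; the helper name `frame_index` is hypothetical.

```python
import re

def frame_index(filename):
    """Extract the numeric frame counter from a name like 'stitch_12.png'."""
    match = re.search(r"_(\d+)\.png$", filename)
    # Names without a counter sort first
    return int(match.group(1)) if match else -1

names = ["stitch_10.png", "stitch_2.png", "stitch_0.png", "stitch_1.png"]

print(sorted(names))
# ['stitch_0.png', 'stitch_1.png', 'stitch_10.png', 'stitch_2.png']

print(sorted(names, key=frame_index))
# ['stitch_0.png', 'stitch_1.png', 'stitch_2.png', 'stitch_10.png']
```

An alternative is to zero-pad the counter when saving, for example stitch_0002.png, so that alphabetical order and frame order agree.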

Here is the final output.

The final video shows that despite movement of the lips and facial contortions, the points are stable and are adjusting according to the movement. You can check this video here.

Conclusion

In this article, we have learned the step-by-step process to stabilize facial landmarks in a video. Stabilization is an important step for improving the accuracy and precision of face detection and recognition systems that depend on these landmarks.

