
Step By Step Guide To Stabilize Facial Landmarks In A Video Using Dlib


The human face has been a topic of interest for deep learning engineers for quite some time now. Understanding the human face not only helps in facial recognition but also finds applications in facial morphing, head pose detection and virtual makeovers. If you are a regular user of social media apps like Instagram or Snapchat, you may have wondered how the filters fit each face so well. Though every face on the planet is unique, these filters seem to magically align with your nose, lips and eyes. These filters and face-swapping applications make use of facial landmarks: points that help identify the distance between the eyes, the position of the nose, the size of the lips and so on. Our goal is to detect these important facial structures using shape prediction methods.

In this article, we will cover:

  • Need for stabilization
  • Popular types of landmark detectors 
  • Implementation and stabilization of 68 point landmarks for a video.

The Need to Stabilize Facial Landmarks

Facial landmarks are easy to use on images since the pixels are not moving, but in a video, continuous motion and translational and rotational variance often make these landmarks unstable. Take a look at the image below.

A few of the points miss the facial features they are supposed to mark. This can create problems for applications that involve estimating the size of the face, the position of the mouth and so on. This instability can affect the efficiency of the model and the results.

Popular types of landmark detectors

The dlib library is one of the most popular libraries for detecting facial landmarks. There are two types of detectors in this library.

  1. 68-point landmark detector: This pre-trained landmark detector identifies 68 points ((x, y) coordinates) on a human face. These points localize the regions around the eyes, eyebrows, nose, mouth, chin and jaw (a minimal usage sketch follows this list).
  2. 5-point landmark detector: To make things faster than the 68-point detector, dlib introduced the 5-point detector, which assigns two points to the corners of the left eye, two points to the corners of the right eye and one point to the nose. This detector is most commonly used for alignment of faces.
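To make these concrete, here is a minimal sketch of loading the 68-point predictor and drawing its points on a single still image. The model location and the image file name are assumptions; adjust them to wherever you keep the downloaded files.

import cv2
import dlib

predictor_path = 'model/shape_predictor_68_face_landmarks.dat'  # assumed location of the downloaded file
face_detector = dlib.get_frontal_face_detector()    # HOG-based frontal face detector
landmark_predictor = dlib.shape_predictor(predictor_path)

img = cv2.imread('face.jpg')                         # hypothetical test image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

for rect in face_detector(gray, 1):                  # detect faces, upsampling the image once
  shape = landmark_predictor(gray, rect)             # predict the 68 (x, y) points for this face
  for i in range(shape.num_parts):
    p = shape.part(i)
    cv2.circle(img, (p.x, p.y), 2, (0, 255, 0), -1)  # draw each landmark as a small dot

cv2.imwrite('landmarks.jpg', img)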

Implementation and stabilization of 68 point landmarks for a video

Step 1: Collecting the pre-trained files. 

Create a folder for your project and a subfolder called model inside it. Download the 68-point and 5-point pre-trained predictor files (shape_predictor_68_face_landmarks.dat and shape_predictor_5_face_landmarks.dat) and place them in this subfolder. Next, place the faceBlendCommon.py helper file in the root folder of your project.
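Before moving on, a quick sanity check like the one below can confirm that the files are where the later code expects them. The project root path is an assumption based on the Google Drive layout used in the rest of this article; adjust it to your setup.

import os

# Assumed project layout (matches the paths used later in this article)
project_root = '/content/gdrive/My Drive/video-stable'
expected_files = [
  'model/shape_predictor_68_face_landmarks.dat',
  'model/shape_predictor_5_face_landmarks.dat',
  'faceBlendCommon.py',
]
for rel_path in expected_files:
  full_path = os.path.join(project_root, rel_path)
  print(rel_path, '->', 'found' if os.path.isfile(full_path) else 'MISSING')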

Step 2: The data. 

Select a short 5-10 second video with good lighting for this project. I have chosen this video; feel free to download it from here.

Step 3: Importing the required modules

import dlib

import cv2

import numpy as np

import matplotlib.pyplot as plt

import matplotlib

import os

import math

from google.colab import drive

drive.mount('/content/gdrive')

Copy all the downloaded files to your notebook

!cp -r '/content/gdrive/My Drive/video-stable/model' /content

!cp -r '/content/gdrive/My Drive/video-stable/videos' /content

!cp '/content/gdrive/My Drive/video-stable/faceBlendCommon.py' /content

Step 4: Convert the video into image frames and save them to a folder inside your project folder. 

We will convert the entire video into individual image frames since they are easier to work with. Create a main folder for all the images.

def image_saver(path, filename, images):

  for count in range(0, len(images)):

    temp = filename + '_' + str(count) + '.png'

    fn = os.path.join(path, temp)

    cv2.imwrite(fn, images[count])

cap=cv2.VideoCapture('/content/gdrive/MyDrive/video-stable/videos/video_data.mp4')

image_frame = []

while(cap.isOpened()):

    pic, frame = cap.read()

    if frame is None:

      break

    image_frame.append(frame)

cap.release()

plt.imshow(image_frame[0][:,:,::-1])

 Now that we have the image frames, we will save all these frames in a folder. 

directory = "dataset"

parent = "/content/gdrive/My Drive/video-stable/"

path = os.path.join(parent, directory) 

os.mkdir(path)

os.mkdir('/content/gdrive/My Drive/video-stable/dataset/original')

image_saver('/content/gdrive/My Drive/video-stable/dataset/original/', 'frame', image_frame)

Step 5: Facial alignment 

This is an important step in the process. We will use the 5-point detector to align the face within the frame, eliminating background noise and focusing only on the face.

modelroot = '/content/gdrive/My Drive/video-stable/model/'

five_point_landmark = modelroot + "shape_predictor_5_face_landmarks.dat"

detect_face = dlib.get_frontal_face_detector()

detect_landmark = dlib.shape_predictor(five_point_landmark)

Now we will make use of the built-in methods of faceBlendCommon to get the landmarks and align the face.

import faceBlendCommon as fb

def facial_alignment(image):

  faceRects = detect_face(image, 0)

  print("Number of faces detected: ",len(faceRects))

  points = fb.getLandmarks(detect_face, detect_landmark, image)

  print('length of points is', len(points))

  landmarks = np.array(points)

  print('after np array',len(landmarks))

  image = np.float32(image)/255.0

  height = 600

  width = 600

  if len(landmarks) > 0:

    normalize_image, landmarks = fb.normalizeImagesAndLandmarks((height, width), image, landmarks)

    normalize_image= np.uint8(normalize_image*255)

    return normalize_image

  else:

    # Convert back to uint8 so the unaligned frame can still be saved with cv2.imwrite

    return np.uint8(image * 255)

aligned_faces = []

print('performing alignment')

for count in range(0, len(image_frame)):

  frame = image_frame[count]

  alignment = facial_alignment(frame)

  aligned_faces.append(alignment)

print('Done!')

Let us check one of the images before saving it. 

plt.imshow(aligned_faces[50][:,:,::-1])

plt.title("Aligned Image")

plt.show()

As you can see, the background noise has been eliminated and the face has been resized to 600×600 after alignment. Save the aligned images in your images folder.

os.mkdir('/content/gdrive/My Drive/video-stable/dataset/aligned_faces')

image_saver('/content/gdrive/My Drive/video-stable/dataset/aligned_faces/', 'align_face', aligned_faces)
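The alignment above relies on the helper functions in faceBlendCommon.py. If that file is not available, the core idea can be approximated with a similarity transform that maps the two detected eye positions to fixed locations in a 600×600 output image. The sketch below is only an approximation under that assumption (the target eye positions at 30% and 70% of the width are arbitrary choices), not the helper's actual implementation.

import cv2
import dlib
import numpy as np

def align_face_simple(image, face_detector, landmark_detector, out_size=(600, 600)):
  # Detect the first face and its 5 landmarks (two corners per eye plus the nose)
  faces = face_detector(image, 0)
  if len(faces) == 0:
    return image
  shape = landmark_detector(image, faces[0])
  pts = np.array([(p.x, p.y) for p in shape.parts()], dtype=np.float32)
  # Eye centres as the mean of each eye's two corner points, sorted by x
  # so we do not depend on dlib's exact point ordering
  eyes = sorted([pts[0:2].mean(axis=0), pts[2:4].mean(axis=0)], key=lambda p: p[0])
  h, w = out_size
  # Target eye positions in the output image (arbitrary but reasonable choice)
  dst = np.float32([[0.3 * w, h / 3.0], [0.7 * w, h / 3.0]])
  src_vec = eyes[1] - eyes[0]
  dst_vec = dst[1] - dst[0]
  scale = float(np.linalg.norm(dst_vec) / np.linalg.norm(src_vec))
  # Rotation that maps the detected eye line onto the target eye line
  angle = float(np.degrees(np.arctan2(src_vec[1], src_vec[0]) - np.arctan2(dst_vec[1], dst_vec[0])))
  M = cv2.getRotationMatrix2D((float(eyes[0][0]), float(eyes[0][1])), angle, scale)
  # Translate so the first eye lands exactly on its target position
  M[0, 2] += dst[0][0] - eyes[0][0]
  M[1, 2] += dst[0][1] - eyes[0][1]
  return cv2.warpAffine(image, M, (w, h))

You could call align_face_simple(frame, detect_face, detect_landmark) on each frame in place of facial_alignment if the helper file is unavailable; note that, unlike normalizeImagesAndLandmarks, this sketch does not return the transformed landmarks.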

Step 6: Using the 68 point detector and performing stabilization

MODEL_PATH = '/content/gdrive/My Drive/video-stable/model/'

PREDICTOR_PATH = MODEL_PATH + "shape_predictor_68_face_landmarks.dat"

RESIZE_HEIGHT = 480

NUM_FRAMES_FOR_FPS = 100

SKIP_FRAMES = 1

detector = dlib.get_frontal_face_detector()

landmarkDetector = dlib.shape_predictor(PREDICTOR_PATH)

Now, we will calculate the distance between the outer corners of the two eyes (the inter-eye distance) using the function below. This distance is later used to scale the smoothing parameters.

def interEyeDistance(predict):

  leftEyeLeftCorner = (predict[36].x, predict[36].y)

  rightEyeRightCorner = (predict[45].x, predict[45].y)

  distance = cv2.norm(np.array(rightEyeRightCorner) - np.array(leftEyeLeftCorner))

  distance = int(distance)

  return distance

In order to keep track of the detected and stabilized points across frames, we create separate lists.

points=[]

pointsPrev=[]

pointsDetectedCur=[]

pointsDetectedPrev=[]

all_stabilized_frames=[]

Next, we set the parameters required for the process and perform the stabilization. Since the video capture was released in Step 4, we re-open it before starting the loop.

cap = cv2.VideoCapture('/content/gdrive/MyDrive/video-stable/videos/video_data.mp4')

eyeDistanceNotCalculated = True

eyeDistance = 0

isFirstFrame = True

fps = 10

showStabilized = True  # draw the stabilized points; set to False to see the raw detections instead

count =0

while(True):

  if (count==0):

    t = cv2.getTickCount()

  ret,im = cap.read()

  if im is None:

    break

  imDlib = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)

  imGray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)

  # Seed the previous frame only on the first iteration; it is updated at the end of the loop

  if isFirstFrame:

    imGrayPrev = imGray

  height = im.shape[0]

  IMAGE_RESIZE = float(height)/RESIZE_HEIGHT

  imSmall = cv2.resize(im, None, fx=1.0/IMAGE_RESIZE, fy=1.0/IMAGE_RESIZE,interpolation = cv2.INTER_LINEAR)

  imSmallDlib = cv2.cvtColor(imSmall, cv2.COLOR_BGR2RGB)

  if (count % SKIP_FRAMES == 0):

    faces = detector(imSmallDlib,0)

  if len(faces)==0:

    print("No face detected")

  else:

    for i in range(0,len(faces)):

      print("face detected")

      newRect = dlib.rectangle(int(faces[i].left() * IMAGE_RESIZE),

        int(faces[i].top() * IMAGE_RESIZE),

        int(faces[i].right() * IMAGE_RESIZE),

        int(faces[i].bottom() * IMAGE_RESIZE))

      landmarks = landmarkDetector(imDlib, newRect).parts()

      if (isFirstFrame==True):

        pointsPrev=[]

        pointsDetectedPrev = []

        [pointsPrev.append((p.x, p.y)) for p in landmarks]

        [pointsDetectedPrev.append((p.x, p.y)) for p in landmarks]

      else:

        pointsPrev=[]

        pointsDetectedPrev = []

        pointsPrev = points

        pointsDetectedPrev = pointsDetectedCur

      points = []

      pointsDetectedCur = []

      [points.append((p.x, p.y)) for p in landmarks]

      [pointsDetectedCur.append((p.x, p.y)) for p in landmarks]

      pointsArr = np.array(points,np.float32)

      pointsPrevArr = np.array(pointsPrev,np.float32)

      if eyeDistanceNotCalculated:

        eyeDistance = interEyeDistance(landmarks)

        print(eyeDistance)

        eyeDistanceNotCalculated = False

      if eyeDistance > 100:

        dotRadius = 3

      else:

        dotRadius = 2

      print(eyeDistance)

      sigma = eyeDistance * eyeDistance / 400

      s = 2*int(eyeDistance/4)+1

      lk_params = dict(winSize  = (s, s), maxLevel = 5, criteria = (cv2.TERM_CRITERIA_COUNT | cv2.TERM_CRITERIA_EPS, 20, 0.03))

      pointsArr,status, err = cv2.calcOpticalFlowPyrLK(imGrayPrev,imGray,pointsPrevArr,pointsArr,**lk_params)

      pointsArrFloat = np.array(pointsArr,np.float32)

      points = pointsArrFloat.tolist()

      for k in range(0,len(landmarks)):

        d = cv2.norm(np.array(pointsDetectedPrev[k]) - np.array(pointsDetectedCur[k]))

        alpha = math.exp(-d*d/sigma)

        points[k] = (1 - alpha) * np.array(pointsDetectedCur[k]) + alpha * np.array(points[k])

      if showStabilized is True:

        for p in points:

          cv2.circle(im,(int(p[0]),int(p[1])),dotRadius, (255,0,0),-1)

      else:

        for p in pointsDetectedCur:

          cv2.circle(im,(int(p[0]),int(p[1])),dotRadius, (0,0,255),-1)

      isFirstFrame = False

      count = count+1

      if ( count == NUM_FRAMES_FOR_FPS):

        t = (cv2.getTickCount()-t)/cv2.getTickFrequency()

        fps = NUM_FRAMES_FOR_FPS/t

        count = 0

        isFirstFrame = True

      cv2.putText(im, "{:.1f}-fps".format(fps), (50, im.shape[0]-50), cv2.FONT_HERSHEY_COMPLEX, 1.5, (0, 0, 255), 3, cv2.LINE_AA)

      all_stabilized_frames.append(im)

      imPrev = im

      imGrayPrev = imGray

cap.release()
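To summarise what the loop is doing: each landmark is tracked from the previous frame with Lucas-Kanade optical flow, and the tracked position is blended with the fresh detection using a weight alpha = exp(-d²/σ), where d is how far the detected point moved since the previous frame and σ grows with the inter-eye distance (σ = eyeDistance²/400). Small motion is treated as detection jitter, so the tracked (smoother) position is trusted; large motion is treated as real movement, so the new detection is trusted. The standalone function below restates that weighting for a single point; it is an illustration of the formula used above, not code used in the pipeline.

import numpy as np

def stabilize_point(detected, tracked, prev_detected, eye_distance):
  # sigma grows with the face size so the smoothing is scale-invariant (same formula as above)
  sigma = eye_distance * eye_distance / 400.0
  # d: how far the raw detection moved between consecutive frames
  d = np.linalg.norm(np.array(detected, np.float32) - np.array(prev_detected, np.float32))
  # alpha is close to 1 for tiny motion (jitter) and close to 0 for large, real motion
  alpha = np.exp(-d * d / sigma)
  # blend: trust the optical-flow estimate for jitter, the fresh detection for real movement
  return (1 - alpha) * np.array(detected, np.float32) + alpha * np.array(tracked, np.float32)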

plt.imshow(all_stabilized_frames[1][:,:,::-1])

plt.title("Stabilized Image")

plt.show()

As you can see above, the 68 points are applied to the face. We save these images in a folder and move on to the final step.

os.mkdir('/content/gdrive/My Drive/video-stable/dataset/stable_faces')

image_saver('/content/gdrive/My Drive/video-stable/dataset/stable_faces/', 'stable_face', all_stabilized_frames)

Step 7: The last step is to stitch the original, aligned and stabilized frames back into a video.

We will resize the frames to fit the screen and stitch the images together. 

def read_all_images(dir, filename_prefix, num_files):

  result_list = []

  for cnt in range(0, num_files):

    fn = filename_prefix + '_' + str(cnt) + '.png'

    full_path = os.path.join(dir, fn)

    img = cv2.imread(full_path)

    result_list.append(img)

  return result_list

original_frames = read_all_images('/content/gdrive/My Drive/video-stable/dataset/original', 'frame', 157)

aligned_frames = read_all_images('/content/gdrive/My Drive/video-stable/dataset/aligned_faces', 'align_face', 157)

stable_frames = read_all_images('/content/gdrive/My Drive/video-stable/dataset/stable_faces', 'stable_face', 157)

def resize_images(imageList, width, height):

  result_list = []

  for cnt in range(0, len(imageList)):

    new_img = cv2.resize(imageList[cnt], (width, height), interpolation=cv2.INTER_AREA)

    result_list.append(new_img)

  return result_list

orig_frames_resized = resize_images(original_frames, 800, 800)

aligned_resized=resize_images(aligned_frames,800,800)

stab_frames_resized = resize_images(stable_frames, 800, 800)

stitched = []

for cnt in range(0, len(orig_frames_resized)):

  new_img = np.hstack((orig_frames_resized[cnt], aligned_resized[cnt], stab_frames_resized[cnt]))

  stitched.append(new_img)

plt.imshow(stitched[50][:,:,::-1])


Let us convert these into a video.

image_saver('/content/gdrive/My Drive/video-stable/dataset/', 'stitch', stitched)

from os.path import isfile, join

def convert_frames_to_video(pathIn,pathOut,fps):

    frame_array = []

    files = [f for f in os.listdir(pathIn) if isfile(join(pathIn, f))]

    # Sort by the numeric frame index in the filename so the frames are written in order

    files.sort(key=lambda f: int(f.split('_')[-1].split('.')[0]))

    for i in range(len(files)):

        filename=pathIn + files[i]

        img = cv2.imread(filename)

        height, width, layers = img.shape

        size = (width,height)

        print(filename)

        frame_array.append(img)

    out = cv2.VideoWriter(pathOut, cv2.VideoWriter_fourcc(*'mp4v'), fps, size)  # 'mp4v' is generally more reliable than 'DIVX' for .mp4 output

    for i in range(len(frame_array)):

        out.write(frame_array[i])

    out.release()

pathIn= '/content/gdrive/My Drive/video-stable/dataset/'

pathOut = '/content/gdrive/My Drive/video-stable/videos/finalvid.mp4'

fps = 30.0

convert_frames_to_video(pathIn, pathOut, fps)

Here is the final output.


The final video shows that despite the movement of the lips and facial contortions, the points remain stable and adjust according to the movement. You can check this video here.

Conclusion 

In this article, we have learned the step-by-step process of stabilizing facial landmarks in a video. Stabilization is important for improving the accuracy and precision of face detection and recognition systems.
