Step By Step Guide To Stabilize Facial Landmarks In A Video Using Dlib


The human face has been a topic of interest for deep learning engineers for quite some time now. Understanding the human face not only helps in facial recognition but also finds applications in facial morphing, head pose detection and virtual makeovers. If you are a regular user of social media apps like Instagram or Snapchat, have you ever wondered how the filters fit perfectly on each face? Though every face on the planet is unique, these filters seem to magically align with your nose, lips and eyes. These filters and face-swapping applications make use of facial landmarks: points that help identify the distance between the eyes, the position of the nose, the size of the lips and so on. In the context of facial landmarks, our goal is to detect important facial structures using shape prediction methods.

In this article, we will cover:

  • The need for stabilization
  • Popular types of landmark detectors
  • Implementation and stabilization of 68-point landmarks for a video

The Need to Stabilize Facial Landmarks

Facial landmarks are easy to use on images since the pixels are not moving, but in a video, the continuous motion of pixels and the translational and rotational variances often make these landmarks unstable. Take a look at the image below.



A few of the points miss the features of the face. This can create problems for tasks that involve estimating the size of the face, the position of the mouth and so on. This instability can affect the efficiency of the model and the quality of the results.
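To make this concrete, one simple way to quantify such jitter (a hypothetical metric for illustration, not from the original pipeline) is to measure the mean frame-to-frame displacement of each landmark, with and without smoothing:

```python
import numpy as np

def mean_jitter(tracks):
    """Mean frame-to-frame displacement of landmark positions.

    tracks: array of shape (num_frames, num_points, 2).
    """
    diffs = np.diff(tracks, axis=0)                 # per-frame displacements
    return float(np.linalg.norm(diffs, axis=2).mean())

# Simulate one landmark that should be static but is measured with noise.
rng = np.random.default_rng(0)
true_point = np.array([100.0, 200.0])
noisy = true_point + rng.normal(0, 2.0, size=(50, 1, 2))

# Even a naive running average reduces the jitter considerably.
smoothed = np.cumsum(noisy, axis=0) / np.arange(1, 51).reshape(-1, 1, 1)

print(mean_jitter(noisy), mean_jitter(smoothed))
```

The stabilization implemented later in this article uses a much smarter, motion-aware blend, but the measurement idea is the same.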

Popular types of landmark detectors

The Dlib library is the most popular library for detecting facial landmarks. It provides two types of detectors.

  1. 68-point landmark detector: This pre-trained landmark detector identifies 68 points ((x, y) coordinates) on a human face. These points localize the regions around the eyes, eyebrows, nose, mouth, chin and jaw.
  2. 5-point landmark detector: To make things faster than the 68-point detector, dlib introduced the 5-point detector, which assigns two points to the corners of the left eye, two points to the right eye and one point to the nose. This detector is most commonly used for face alignment.
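The 68 detected points follow a fixed index layout, so individual facial regions can be sliced out by index alone. A minimal sketch (the index ranges below are the standard dlib 68-point convention):

```python
# Standard index ranges of dlib's 68-point layout (end index exclusive).
FACIAL_REGIONS = {
    "jaw": (0, 17),
    "right_eyebrow": (17, 22),
    "left_eyebrow": (22, 27),
    "nose": (27, 36),
    "right_eye": (36, 42),
    "left_eye": (42, 48),
    "mouth": (48, 68),
}

def region_points(landmarks, region):
    """Slice out the (x, y) pairs belonging to one facial region."""
    start, end = FACIAL_REGIONS[region]
    return landmarks[start:end]

# With 68 dummy points, the mouth region covers 20 of them.
dummy = [(i, i) for i in range(68)]
print(len(region_points(dummy, "mouth")))  # 20
```

This indexing is what lets filter apps anchor graphics to, say, just the lips or just the eyes.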

Implementation and stabilization of 68 point landmarks for a video

Step 1: Collecting the pre-trained files. 

Create a folder for your project and a subfolder called model. Download the 68-point and 5-point predictor files and place them in the subfolder. Next, place this file in the root folder of your project.

Step 2: The data. 

Select a short 5-10 second video with good lighting for this project. I have chosen this video; feel free to download it from here.

Step 3: Importing the required modules

import dlib
import cv2
import math
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import os
from google.colab import drive


Mount your Google Drive and copy the downloaded files (including the faceBlendCommon.py helper used later) into your notebook:

drive.mount('/content/gdrive')

!cp -r '/content/gdrive/My Drive/video-stable/model' /content

!cp -r '/content/gdrive/My Drive/video-stable/videos' /content

!cp '/content/gdrive/My Drive/video-stable/faceBlendCommon.py' /content

Step 4: Convert the video into image frames and save them to a folder inside your project folder. 

We will convert the entire video into individual image frames since it makes it easier to work with. Create a main folder for all the images.

def image_saver(path, filename, images):

  for count in range(0, len(images)):

    temp = filename + '_' + str(count) + '.png'

    fn = os.path.join(path, temp)

    cv2.imwrite(fn, images[count])


image_frame = []

cap = cv2.VideoCapture('/content/videos/video.mp4')  # path to your downloaded video

while True:

    pic, frame = cap.read()

    if frame is None:
      break

    image_frame.append(frame)

cap.release()

Now that we have the image frames, we will save all these frames in a folder.

directory = "dataset"

parent = "/content/gdrive/My Drive/video-stable/"

path = os.path.join(parent, directory)

os.mkdir(path)

os.mkdir('/content/gdrive/My Drive/video-stable/dataset/original')

image_saver('/content/gdrive/My Drive/video-stable/dataset/original/', 'frame', image_frame)

Step 5: Facial alignment 

This is an important step in the process. We will use the 5 point detector for aligning the face to the frame to eliminate noise from the background and focus only on the face.

modelroot = '/content/gdrive/My Drive/video-stable/model/'

five_point_landmark = modelroot + "shape_predictor_5_face_landmarks.dat"

detect_face = dlib.get_frontal_face_detector()

detect_landmark = dlib.shape_predictor(five_point_landmark)

Now we will make use of the built-in methods of the faceBlendCommon module to get the landmarks and align the face.

import faceBlendCommon as fb

def facial_alignment(image):

  faceRects = detect_face(image, 0)

  print("Number of faces detected: ", len(faceRects))

  points = fb.getLandmarks(detect_face, detect_landmark, image)

  print('length of points is', len(points))

  landmarks = np.array(points)

  print('after np array', len(landmarks))

  image = np.float32(image)/255.0

  height = 600

  width = 600

  if len(landmarks) > 0:

    normalize_image, landmarks = fb.normalizeImagesAndLandmarks((height, width), image, landmarks)

    normalize_image = np.uint8(normalize_image*255)

    return normalize_image

  # Fall back to the unaligned frame if no landmarks were found.
  return np.uint8(image*255)

aligned_faces = []

print('performing alignment')

for count in range(0, len(image_frame)):

  frame = image_frame[count]

  alignment = facial_alignment(frame)

  aligned_faces.append(alignment)



Let us check one of the images before saving it.

plt.imshow(cv2.cvtColor(aligned_faces[0], cv2.COLOR_BGR2RGB))

plt.title("Aligned Image")

As you can see, the background noise has been eliminated and the face has been resized to 600×600 after alignment. Save the aligned images to your images folder.

os.mkdir('/content/gdrive/My Drive/video-stable/dataset/aligned_faces')

image_saver('/content/gdrive/My Drive/video-stable/dataset/aligned_faces/', 'align_face', aligned_faces)

Step 6: Using the 68 point detector and performing stabilization

MODEL_PATH = '/content/gdrive/My Drive/video-stable/model/'

PREDICTOR_PATH = MODEL_PATH + "shape_predictor_68_face_landmarks.dat"




detector = dlib.get_frontal_face_detector()

landmarkDetector = dlib.shape_predictor(PREDICTOR_PATH)

# Parameters used by the stabilization loop below (these values are assumed defaults)
RESIZE_HEIGHT = 360
NUM_FRAMES_FOR_FPS = 100
SKIP_FRAMES = 2

Now, we will calculate the distance between the outer corners of the two eyes using the function below.

def interEyeDistance(predict):

  leftEyeLeftCorner = (predict[36].x, predict[36].y)

  rightEyeRightCorner = (predict[45].x, predict[45].y)

  distance = cv2.norm(np.array(rightEyeRightCorner) - np.array(leftEyeLeftCorner))

  distance = int(distance)

  return distance

In order to save the detected points, we create separate lists for the current and previous frames.

points = []

pointsPrev = []

pointsDetectedCur = []

pointsDetectedPrev = []

Next, we set the parameters required for the process and perform the stabilization

eyeDistanceNotCalculated = True

eyeDistance = 0

isFirstFrame = True

fps = 10

showStabilized = False

count = 0

all_stabilized_frames = []

frame_number = 0

# Walk over the aligned frames one by one (reconstructed loop header).
while frame_number < len(aligned_faces):

  if (count==0):

    t = cv2.getTickCount()

  im = aligned_faces[frame_number]

  frame_number += 1

  if im is None:

    break

  imDlib = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)

  imGray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)

  if isFirstFrame:

    imGrayPrev = imGray

  height = im.shape[0]

  IMAGE_RESIZE = float(height)/RESIZE_HEIGHT
  imSmall = cv2.resize(im, None, fx=1.0/IMAGE_RESIZE, fy=1.0/IMAGE_RESIZE,interpolation = cv2.INTER_LINEAR)

  imSmallDlib = cv2.cvtColor(imSmall, cv2.COLOR_BGR2RGB)

  if (count % SKIP_FRAMES == 0):

    faces = detector(imSmallDlib, 0)

  if len(faces)==0:

    print("No face detected")

    continue

  else:

    for i in range(0,len(faces)):

      print("face detected")

      newRect = dlib.rectangle(int(faces[i].left() * IMAGE_RESIZE),

        int(faces[i].top() * IMAGE_RESIZE),

        int(faces[i].right() * IMAGE_RESIZE),

        int(faces[i].bottom() * IMAGE_RESIZE))

      landmarks = landmarkDetector(imDlib, newRect).parts()

      if (isFirstFrame==True):

        pointsPrev = []

        pointsDetectedPrev = []

        [pointsPrev.append((p.x, p.y)) for p in landmarks]

        [pointsDetectedPrev.append((p.x, p.y)) for p in landmarks]

      else:

        pointsPrev = points

        pointsDetectedPrev = pointsDetectedCur

      points = []

      pointsDetectedCur = []

      [points.append((p.x, p.y)) for p in landmarks]

      [pointsDetectedCur.append((p.x, p.y)) for p in landmarks]

      pointsArr = np.array(points,np.float32)

      pointsPrevArr = np.array(pointsPrev,np.float32)

      if eyeDistanceNotCalculated:

        eyeDistance = interEyeDistance(landmarks)


        eyeDistanceNotCalculated = False

      if eyeDistance > 100:

        dotRadius = 3

      else:

        dotRadius = 2


      sigma = eyeDistance * eyeDistance / 400

      s = 2*int(eyeDistance/4)+1

      lk_params = dict(winSize  = (s, s), maxLevel = 5, criteria = (cv2.TERM_CRITERIA_COUNT | cv2.TERM_CRITERIA_EPS, 20, 0.03))

      pointsArr,status, err = cv2.calcOpticalFlowPyrLK(imGrayPrev,imGray,pointsPrevArr,pointsArr,**lk_params)

      pointsArrFloat = np.array(pointsArr,np.float32)

      points = pointsArrFloat.tolist()

      for k in range(0,len(landmarks)):

        d = cv2.norm(np.array(pointsDetectedPrev[k]) - np.array(pointsDetectedCur[k]))

        alpha = math.exp(-d*d/sigma)

        points[k] = (1 - alpha) * np.array(pointsDetectedCur[k]) + alpha * np.array(points[k])

      if showStabilized is True:

        # Draw the stabilized points in blue.
        for p in points:

          cv2.circle(im, (int(p[0]), int(p[1])), dotRadius, (255,0,0), -1)

      else:

        # Draw the raw detected points in red.
        for p in pointsDetectedCur:

          cv2.circle(im, (int(p[0]), int(p[1])), dotRadius, (0,0,255), -1)

      isFirstFrame = False

      count = count+1

      if ( count == NUM_FRAMES_FOR_FPS):

        t = (cv2.getTickCount()-t)/cv2.getTickFrequency()

        fps = NUM_FRAMES_FOR_FPS/t

        count = 0

        isFirstFrame = True

      cv2.putText(im, "{:.1f}-fps".format(fps), (50, im.shape[0]-50), cv2.FONT_HERSHEY_COMPLEX, 1.5, (0, 0, 255), 3, cv2.LINE_AA)

      all_stabilized_frames.append(im)

      imPrev = im

      imGrayPrev = imGray



plt.imshow(cv2.cvtColor(all_stabilized_frames[0], cv2.COLOR_BGR2RGB))

plt.title("Stabilized Image")

As you can see above, the 68 points are applied to the face. We save these images in a folder and move on to the final step.
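The key to the stabilization is the weighting alpha = exp(-d²/sigma) used in the loop: when a point barely moves between frames (small d), alpha stays close to 1 and the smooth optical-flow estimate dominates; for large motion, alpha collapses toward 0 and the fresh detection wins. A minimal standalone sketch of just this rule (the coordinates and sigma below are illustrative values, not from the article):

```python
import math

def stabilized_point(detected, tracked, sigma):
  """Blend a freshly detected point with its optical-flow estimate."""
  d = math.dist(detected, tracked)
  alpha = math.exp(-d * d / sigma)   # near 1 for tiny motion, near 0 for large motion
  return tuple((1 - alpha) * c_det + alpha * c_trk
               for c_det, c_trk in zip(detected, tracked))

# Tiny motion: the result stays close to the tracked (stable) position.
print(stabilized_point((100.0, 100.0), (100.5, 100.0), sigma=100))

# Large motion: the result snaps to the new detection instead.
print(stabilized_point((100.0, 100.0), (160.0, 100.0), sigma=100))
```

This is why the stabilized points look frozen during small jitters yet still follow genuine head movement without lag.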

os.mkdir('/content/gdrive/My Drive/video-stable/dataset/stable_faces')

image_saver('/content/gdrive/My Drive/video-stable/dataset/stable_faces/', 'stable_face', all_stabilized_frames)

Step 7: The last step is to stitch the original, aligned and stable frames back to a video. 

We will resize the frames to fit the screen and stitch the images together. 

def read_all_images(dir, filename_prefix, num_files):

  result_list = []

  for cnt in range(0, num_files):

    fn = filename_prefix + '_' + str(cnt) + '.png'

    full_path = os.path.join(dir, fn)

    img = cv2.imread(full_path)

    result_list.append(img)

  return result_list

original_frames = read_all_images('/content/gdrive/My Drive/video-stable/dataset/original', 'frame', 157)

aligned_frames = read_all_images('/content/gdrive/My Drive/video-stable/dataset/aligned_faces', 'align_face', 157)

stable_frames = read_all_images('/content/gdrive/My Drive/video-stable/dataset/stable_faces', 'stable_face', 157)

def resize_images(imageList, width, height):

  result_list = []

  for cnt in range(0, len(imageList)):

    new_img = cv2.resize(imageList[cnt], (width, height), interpolation=cv2.INTER_AREA)

    result_list.append(new_img)

  return result_list

orig_frames_resized = resize_images(original_frames, 800, 800)

aligned_resized = resize_images(aligned_frames, 800, 800)

stab_frames_resized = resize_images(stable_frames, 800, 800)

stitched = []

for cnt in range(0, len(orig_frames_resized)):

  new_img = np.hstack((orig_frames_resized[cnt], aligned_resized[cnt], stab_frames_resized[cnt]))

  stitched.append(new_img)


Let us convert these into a video.

image_saver('/content/gdrive/My Drive/video-stable/dataset/', 'stitch', stitched)

from os.path import isfile, join

def convert_frames_to_video(pathIn, pathOut, fps):

    frame_array = []

    files = [f for f in os.listdir(pathIn) if isfile(join(pathIn, f))]

    for i in range(len(files)):

        filename = pathIn + files[i]

        img = cv2.imread(filename)

        height, width, layers = img.shape

        size = (width, height)

        frame_array.append(img)

    out = cv2.VideoWriter(pathOut, cv2.VideoWriter_fourcc(*'DIVX'), fps, size)

    for i in range(len(frame_array)):

        out.write(frame_array[i])

    out.release()

pathIn= '/content/gdrive/My Drive/video-stable/dataset/'

pathOut = '/content/gdrive/My Drive/video-stable/videos/finalvid.mp4'

fps = 30.0

convert_frames_to_video(pathIn, pathOut, fps)

Here is the final output.


The final video shows that despite the movement of the lips and facial contortions, the points are stable and adjust according to the movement. You can check the video here.


In this article, we learned the step-by-step process to stabilize the important landmarks of a face in a video. Stabilization is a vital step for improving the accuracy and precision of face detection and recognition systems.


Bhoomika Madhukar
I am an aspiring data scientist with a passion for teaching. I am a computer science graduate from Dayananda Sagar Institute. I have experience in building models in deep learning and reinforcement learning. My goal is to use AI in the field of education to make learning meaningful for everyone.
