New Research Is Making Deepfake Speech Even More Real & Terrifying

Deepfakes have been controversial from the very beginning. In one of our previous articles, we discussed how deepfakes are being used around the globe. The technology has already started to change many industries, including television and film.

Recently, a group of researchers from the Max Planck Institute for Informatics, Stanford University, Princeton University and Adobe Research created an algorithm that can make convincing, seamless edits to talking-head videos by changing what the speaker says.


How The Model Works

The model focuses on the speaker's face and upper body and edits a talking-head video through its transcript. When the user edits the transcript, the algorithm selects segments from different parts of the video with similar motion and joins them to create the newly edited video.


The model works in the following stages:

  • Phoneme Alignment: The researchers first align the transcript of the speech to the talking-head video at the level of phonemes (perceptually distinct units of sound that distinguish one word from another in a particular language). This alignment makes it possible to search the video for snippets that can later be combined to create new content.
  • 3D Face Tracking and Reconstruction: A 3D parametric face model is registered to each frame of the input talking-head video, which later makes it possible to selectively blend different aspects of the face.
  • Viseme Search: Given an edit operation, the model performs a viseme search (visemes are groups of aurally distinct phonemes that look similar on the lips, such as /p/, /b/ and /m/) to find the best match between the phoneme subsequence required by the edit and the phoneme subsequences already present in the video.
  • Parameter Retiming & Blending: The parametric face model is used to take properties of the face, such as pose and expression, from different input frames, retime them to fit the edit, and blend them together in parameter space.
  • Neural Face Rendering: Finally, a neural face rendering approach is used to synthesize photo-realistic talking-head video frames that match the modified parameter sequence.
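To make the viseme search step more concrete, here is a minimal Python sketch under simplifying assumptions: the phoneme-to-viseme table `VISEME_OF` is a small hypothetical mapping (ARPAbet-style symbols), and the search is a plain sliding window that counts viseme mismatches. The researchers' actual search cost is more elaborate; this only illustrates why new words can be assembled from visually similar footage.

```python
# Hypothetical, heavily simplified phoneme-to-viseme table (ARPAbet-style symbols).
# Real systems use a fuller, carefully designed mapping; this one is illustrative only.
VISEME_OF = {
    "P": "bilabial", "B": "bilabial", "M": "bilabial",
    "F": "labiodental", "V": "labiodental",
    "T": "alveolar", "D": "alveolar", "N": "alveolar", "S": "alveolar", "Z": "alveolar",
    "K": "velar", "G": "velar",
    "AA": "open", "AE": "open", "AH": "open",
    "IY": "spread", "IH": "spread",
    "UW": "rounded", "OW": "rounded",
}


def to_visemes(phonemes):
    """Map each phoneme to its viseme class; unknown phonemes map to themselves."""
    return [VISEME_OF.get(p, p) for p in phonemes]


def viseme_search(video_phonemes, target_phonemes):
    """Return (start, cost) of the window in the video's phoneme track whose
    viseme sequence best matches the target, where cost is the number of
    positions whose viseme classes differ (ties keep the earliest window)."""
    video_v = to_visemes(video_phonemes)
    target_v = to_visemes(target_phonemes)
    n, m = len(video_v), len(target_v)
    if m == 0 or m > n:
        raise ValueError("target must be non-empty and no longer than the video track")
    best_start, best_cost = 0, m + 1  # any real window costs at most m
    for start in range(n - m + 1):
        window = video_v[start:start + m]
        cost = sum(a != b for a, b in zip(window, target_v))
        if cost < best_cost:
            best_start, best_cost = start, cost
    return best_start, best_cost


if __name__ == "__main__":
    # Phonemes already spoken in the video: "mat cup" -> M AE T K AH P
    video = ["M", "AE", "T", "K", "AH", "P"]
    # New word "bad" -> B AE D
    print(viseme_search(video, ["B", "AE", "D"]))
```

In this example the new word "bad" matches the footage of "mat" at zero cost, because /b/ and /m/ share a bilabial mouth shape and /d/ and /t/ share an alveolar one, so the frames for "mat" can plausibly be reused for "bad".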

Applications Of The Model

The researchers mainly intend the model as a better editing tool for video editing and translation in the production of movies, TV shows, commercials, YouTube vlogs and online lectures.

Currently, the model supports three kinds of edit operations:

  • Add new words: One or more consecutive words are inserted at a particular point in the video.
  • Rearrange existing words: One or more consecutive words that already exist in the video are moved to a new position.
  • Delete existing words: One or more consecutive words are removed from the video.
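The three edit operations can be pictured as list operations on the transcript. The sketch below models only the text side, using hypothetical helper names (`add_words`, `delete_words`, `rearrange_words`) that are not from the researchers' code; in the actual system, the edited transcript then drives the viseme search and rendering stages described above.

```python
def add_words(words, index, new_words):
    """Insert new_words (consecutive) before position index; returns a new list."""
    return words[:index] + list(new_words) + words[index:]


def delete_words(words, start, count):
    """Remove count consecutive words starting at start; returns a new list."""
    return words[:start] + words[start + count:]


def rearrange_words(words, start, count, dest):
    """Move the count consecutive words at start so that they begin at
    position dest of the transcript that remains after removing them."""
    block = words[start:start + count]
    rest = delete_words(words, start, count)
    return rest[:dest] + block + rest[dest:]


if __name__ == "__main__":
    transcript = "the stock price went up today".split()
    print(" ".join(add_words(transcript, 5, ["sharply"])))
    print(" ".join(delete_words(transcript, 5, 1)))
    print(" ".join(rearrange_words(transcript, 5, 1, 0)))
```

Each helper returns a new list and leaves the original transcript untouched, which mirrors the idea that the source video stays fixed while only the edited version is re-synthesized.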

The Other Perspective

While technological advances bring enormous benefits, some people will inevitably use them for harmful ends. The researchers themselves raised important and valid concerns about the potential misuse of the text-based editing approach, such as using it to fabricate false allegations or create scandals around famous individuals.

One of the researchers from Stanford University stated that every advanced technology will inevitably attract people with malicious intent. For this reason, the researchers propose measures such as developing forensic, biometric and other verification methods that help viewers identify manipulated videos.



Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.
