Picture this: a friend sends you a video of a politician's pre-election speech. The politician appears to be saying something outrageous, and the sheer ridiculousness of the speech makes you forward it within your own circle, only to realise much later that the video was manipulated. By then, the video has circulated far and wide, with almost no way to recall it. Such hyper-realistic manipulated videos are called deepfakes. They leverage powerful techniques from machine learning and artificial intelligence to generate highly deceptive visual and audio content.
The repercussions of deepfakes have already proved dangerous, with compromised videos of public figures in circulation that threaten their reputations. Worse, it is anticipated that deepfakes may even play a large role in swaying national elections. Notably, Facebook, Twitter, and TikTok have already banned deepfake content on their platforms.
Given such major consequences, there have been several efforts, with varying degrees of success, to create tools that can help detect deepfakes. Here, we discuss a few of them:
Microsoft’s Video Authenticator Tool
Launched by Microsoft in September 2020, ahead of the US elections, the Video Authenticator tool can analyse a still photo or video and provide a confidence score indicating whether or not the media has been manipulated. It detects the blending boundary of the deepfake and subtle grayscale elements that are undetectable to the human eye, and it provides this confidence score in real time.
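The tool itself is proprietary, so the sketch below is only a toy illustration of the general idea of scoring subtle high-frequency artifacts in a grayscale frame; the function name, blur-based residual estimate, and score mapping are all assumptions, not Microsoft's method.

```python
import numpy as np

def manipulation_confidence(gray_frame, blur_size=5):
    """Toy frame-level confidence score (illustrative only).

    Splicing a face into a frame often leaves extra high-frequency energy
    near the blending boundary. Here we estimate that energy as the
    difference between the frame and a box-blurred copy of it, then
    squash the mean residual into a 0-1 score.
    """
    k = blur_size
    kernel = np.ones(k) / k
    padded = np.pad(gray_frame.astype(float), k // 2, mode="edge")
    # Separable box blur: filter rows, then columns.
    blurred = np.apply_along_axis(lambda r: np.convolve(r, kernel, "valid"), 1, padded)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, kernel, "valid"), 0, blurred)
    residual = np.abs(gray_frame - blurred).mean()
    return float(1 - np.exp(-residual))  # ~0 for smooth frames, rising with artifacts

rng = np.random.default_rng(0)
smooth = np.full((32, 32), 0.5)                      # artifact-free stand-in frame
noisy = smooth + rng.normal(0, 0.2, size=(32, 32))   # frame with extra high-frequency energy
print(manipulation_confidence(noisy) > manipulation_confidence(smooth))  # True
```

A real detector would of course learn its score from data rather than use a fixed residual heuristic; this only shows the shape of a per-frame confidence output.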
The tool was trained on the public FaceForensics++ dataset and tested on the Deepfake Detection Challenge dataset, both leading benchmarks for training and testing deepfake detection technologies.
Additionally, the tech giant also introduced a new technology that can find doctored content and assure readers of its authenticity. It has two parts: the first component, integrated into Microsoft Azure, allows a content creator to add digital hashes and certificates that remain part of the content's metadata; the second component helps the reader verify those hashes and certificates against the content to check its authenticity.
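The creator-side/reader-side split above can be illustrated with a minimal hash-and-verify sketch. This is a generic stand-in using Python's standard library, not Microsoft's actual certificate scheme; the function names are assumptions.

```python
import hashlib

def sign_content(media_bytes: bytes) -> str:
    # Hypothetical creator-side step: compute a digital hash of the media
    # that travels with the content as part of its metadata.
    return hashlib.sha256(media_bytes).hexdigest()

def verify_content(media_bytes: bytes, certified_hash: str) -> bool:
    # Hypothetical reader-side step: recompute the hash and match it
    # against the certified hash stored in the metadata.
    return hashlib.sha256(media_bytes).hexdigest() == certified_hash

original = b"frame data of the original video"
cert = sign_content(original)

print(verify_content(original, cert))         # True: unaltered media matches
print(verify_content(original + b"x", cert))  # False: any tampering breaks the match
```

In the real system the hash would additionally be signed by a trusted certificate so the reader can also verify *who* published the content, not just that it is unchanged.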
Deepfake Detection Using Biological Signals
Researchers from Binghamton University and Intel created a tool that goes beyond deepfake detection and recognises the generative model behind a compromised video. The tool looks for the unique biological and generative noise signals, dubbed 'deepfake heartbeats', that deepfake models leave behind. These signals are extracted from 32 different spots on a person's face, called photoplethysmography (PPG) cells.
The architecture of the model is based on convolutional neural networks with VGG blocks. It uses the OpenFace library in Python for face detection, OpenCV for image processing, and Keras for the neural network implementation. Like Microsoft's Video Authenticator, FakeCatcher's learning setting is based on the FaceForensics++ (FF++) dataset. As per the researchers, the tool achieves 97.29% accuracy for fake-video detection.
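The core signal extraction can be sketched with a toy NumPy routine: split an aligned face crop into 32 cells and track each cell's mean green-channel intensity over time, since blood flow modulates skin colour most visibly in the green channel. This is an illustrative simplification, not the researchers' pipeline; the grid layout and function name are assumptions.

```python
import numpy as np

def ppg_signals(frames, grid=(4, 8)):
    """Extract crude PPG-like signals from a stack of aligned face crops.

    frames: array of shape (T, H, W, 3) holding T aligned face regions.
    Splits the face into grid[0] * grid[1] cells (32 by default, matching
    the 32 PPG cells described above) and tracks each cell's mean
    green-channel intensity over time.
    """
    t, h, w, _ = frames.shape
    rows, cols = grid
    ch, cw = h // rows, w // cols
    signals = np.empty((rows * cols, t))
    for i in range(rows):
        for j in range(cols):
            cell = frames[:, i*ch:(i+1)*ch, j*cw:(j+1)*cw, 1]  # green channel
            signals[i * cols + j] = cell.mean(axis=(1, 2))
    # Remove each cell's mean so only the temporal variation remains.
    return signals - signals.mean(axis=1, keepdims=True)

# Synthetic demo: 64 frames of a 64x64 "face" crop.
demo = np.random.default_rng(0).random((64, 64, 64, 3))
sig = ppg_signals(demo)
print(sig.shape)  # (32, 64): one temporal signal per PPG cell
```

In the full system, these per-cell signals (and their spectral representations) become the input the CNN classifies, which is how both authenticity and the generating model's "residual fingerprint" are recognised.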
Deepfake Detection Using Phoneme-Viseme Mismatches
This model was developed by researchers from Stanford University and the University of California. The technique exploits the fact that visemes, which denote the dynamics of the mouth shape, are sometimes different from or inconsistent with the spoken phoneme. For example, words such as mama, baba, and papa require the lips to close completely, so a phoneme-viseme mismatch on such sounds can expose even spatially small and temporally localised manipulations in deepfake videos. The team used lip-sync deepfakes created with three synthesis techniques: Audio-to-Video (A2V), Text-to-Video for short utterances (T2V-S), and Text-to-Video for longer utterances (T2V-L).
The technique was evaluated with both manual and automatic video authentication. For A2V, T2V-S, and T2V-L, the model showed accuracies of 96.0%, 97.8%, and 97.4% for manual authentication, and 93.4%, 97.0%, and 92.8% for automatic authentication, respectively.
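The mismatch check at the heart of this idea can be sketched in a few lines. The sketch below assumes per-frame phoneme labels (e.g. from a forced aligner) and a per-frame mouth-openness measure (e.g. from lip landmarks) are already available; the threshold and function name are illustrative assumptions, not the paper's implementation.

```python
def viseme_mismatches(phonemes, mouth_openness, closed_threshold=0.2):
    """Flag frames where a bilabial phoneme co-occurs with an open mouth.

    Bilabial phonemes ('M', 'B', 'P', as in mama/baba/papa) require the
    lips to close completely, so an open mouth on those frames is a
    phoneme-viseme mismatch and a sign of possible manipulation.

    phonemes: per-frame phoneme labels (assumed from a forced aligner).
    mouth_openness: per-frame lip-gap measure in [0, 1] (assumed from
    facial landmarks).
    """
    bilabials = {"M", "B", "P"}
    return [
        i for i, (ph, gap) in enumerate(zip(phonemes, mouth_openness))
        if ph in bilabials and gap > closed_threshold
    ]

phonemes = ["M", "AA", "B", "AA", "P"]
openness = [0.05, 0.7, 0.6, 0.8, 0.1]
print(viseme_mismatches(phonemes, openness))  # [2]: a 'B' spoken with an open mouth
```

Because the check is local to individual frames, it can catch manipulations that alter only a fraction of a second of video.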
Forensic Technique Using Facial Movements
This model tracks the facial expressions and movements in a single input video and extracts the presence and strength of specific action units. The detection model uses a one-class support vector machine (SVM) that distinguishes an individual from others, as well as comedic impersonators from deepfake impersonators.
This model utilises OpenFace, an open-source facial behaviour analysis toolkit, to extract facial and head movements from a video. The library provides 2D and 3D facial landmark positions, head pose, and facial action units for each frame.
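The one-class setup can be sketched with scikit-learn: train only on genuine footage of one person, then flag anything whose facial mannerisms fall outside that profile. The synthetic action-unit features below stand in for real OpenFace output; the distributions and dimensions are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical stand-in for OpenFace output: each row describes one video
# segment by its action-unit intensities (17 AUs assumed here).
real_person = rng.normal(loc=0.5, scale=0.1, size=(200, 17))

# One-class SVM trained on the genuine footage only: no fake examples are
# needed, since anything unlike this person's mannerisms is an outlier.
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(real_person)

genuine_clip = rng.normal(0.5, 0.1, size=(1, 17))
impersonation = rng.normal(1.5, 0.1, size=(1, 17))  # very different mannerisms

print(clf.predict(genuine_clip))   # +1 expected: mannerisms match the person
print(clf.predict(impersonation))  # -1: flagged as not this person
```

The appeal of this design is that it needs no deepfake training data at all for a given individual, only authentic footage of them.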
Recurrent Convolutional Strategy
This approach uses recurrent convolutional models (RCMs), a class of deep learning models that effectively exploit temporal information from image streams across domains. The technique can detect Deepfake, Face2Face, and FaceSwap tampered faces in video streams. Tested on the FaceForensics++ dataset, it showed an accuracy of up to 97%, a 4.55% improvement over preceding techniques.
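The recurrent half of this idea can be illustrated with a toy NumPy cell: per-frame feature vectors (stand-ins for CNN embeddings) are aggregated frame by frame into a hidden state, which is finally mapped to a manipulation score. This is a minimal sketch of the temporal-aggregation principle, not the published architecture; all shapes and weights here are assumptions.

```python
import numpy as np

def rnn_video_score(frame_features, Wx, Wh, w_out):
    """Toy recurrent aggregation over per-frame features.

    frame_features: (T, D) array; in the real pipeline each row would be
    a CNN embedding of one video frame. A vanilla RNN cell carries a
    hidden state across frames, so the model can pick up temporal
    inconsistencies (flicker, unstable blending) that single-frame
    detectors miss. The final state is mapped to a score in (0, 1).
    """
    h = np.zeros(Wh.shape[0])
    for x in frame_features:          # walk the frames in order
        h = np.tanh(Wx @ x + Wh @ h)  # recurrent update mixes past and present
    logit = w_out @ h
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid -> probability of "fake"

rng = np.random.default_rng(0)
T, D, H = 16, 8, 4  # frames, feature dim, hidden dim (illustrative sizes)
feats = rng.normal(size=(T, D))
Wx = rng.normal(size=(H, D))
Wh = rng.normal(size=(H, H))
w_out = rng.normal(size=H)
score = rnn_video_score(feats, Wx, Wh, w_out)
print(0.0 < score < 1.0)  # True: always a valid probability
```

In the actual RCM the convolutional front end and the recurrent weights are trained jointly end to end, rather than fixed as they are in this sketch.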
The techniques for creating deepfakes keep developing and advancing, and the tools discussed in this listicle are not yet completely accurate or effective. However, they are strides in the right direction in a battle that is much bigger.