Researchers at Google and the University of California, Berkeley recently published a paper claiming that even the best forensic classifiers (AI systems trained to distinguish real content from synthetic content) are vulnerable to adversarial attacks. This follows earlier work by researchers at the University of California, San Diego, who showed that a fake-video detector can be bypassed by injecting information into each frame of videos synthesized with existing AI generation methods.
Deepfakes have become a global concern because they can be used to sway public opinion during election campaigns, implicate a person in a crime, and facilitate fraud.
The researchers began by evaluating a classifier to which they had unrestricted access. Using this white-box threat model and a data set of 94,036 images, the team modified synthesized images so that the classifier labeled them as real, and vice versa. They mounted several attacks on the classifier, including a distortion-minimizing attack, a universal adversarial-patch attack, and a universal latent-space attack.
The distortion-minimizing attack made minor modifications to synthetically generated images. It caused the classifier to misclassify 71.3% of the images when only 2% of the pixels were changed, and 89.7% of the images when 4% of the pixels were changed. More strikingly, the classifier labeled 50% of real images as fake after the team distorted fewer than 7% of each image's pixels.
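To make the idea of a distortion-minimizing attack concrete, here is a minimal sketch in Python. It is not the researchers' method: they attacked deep neural networks, whereas this example assumes a toy linear classifier whose weights are fully known (the white-box setting). The names `score` and `minimal_pixel_attack`, the weights, and the eight-pixel "image" are all hypothetical and chosen only for illustration.

```python
def score(weights, bias, pixels):
    """Linear 'fake-ness' score; a positive value means the toy classifier says 'fake'."""
    return sum(w * p for w, p in zip(weights, pixels)) + bias


def minimal_pixel_attack(weights, bias, pixels):
    """Change as few pixels as possible (to 0.0 or 1.0) until the score is negative."""
    adversarial = list(pixels)
    # For each pixel, compute the largest score reduction available inside [0, 1].
    gains = []
    for i, (w, p) in enumerate(zip(weights, pixels)):
        target = 0.0 if w > 0 else 1.0
        gains.append((w * (p - target), i, target))
    gains.sort(reverse=True)  # most helpful pixels first

    changed = 0
    for gain, i, target in gains:
        if score(weights, bias, adversarial) < 0:
            break  # already classified as 'real'
        if gain <= 0:
            break  # no remaining pixel can lower the score
        adversarial[i] = target
        changed += 1
    return adversarial, changed


# Hypothetical toy classifier and an 8-pixel image it labels as fake.
weights = [0.9, -0.4, 0.1, 0.05, -0.02, 0.3, 0.0, 0.6]
bias = -0.2
fake_image = [0.8, 0.1, 0.5, 0.5, 0.5, 0.9, 0.3, 0.7]

adversarial, changed = minimal_pixel_attack(weights, bias, fake_image)
print("score before:", score(weights, bias, fake_image))
print("score after: ", score(weights, bias, adversarial))
print("pixels changed:", changed, "of", len(fake_image))
```

In this toy setting, picking the most influential pixels first changes the smallest possible number of pixels, because the score is a plain weighted sum. Real forensic classifiers are nonlinear, so a practical attack would need an iterative optimizer instead.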
In the loss-maximizing attack, for which the image distortion was kept below a fixed threshold, the classifier's accuracy fell from 96.6% to 27%. The universal adversarial-patch attack was more effective still: a visibly noisy pattern overlaid on two fake images led the model to classify them as real with likelihoods of 98% and 86%. The universal latent-space attack, which modified the underlying representation leveraged by an image-generating model, reduced classification accuracy from 99% to 17%.
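The bounded-distortion attack described above can be sketched as a single step that pushes every pixel in the direction that increases the classifier's loss, with no pixel moving by more than a fixed amount. The sketch below assumes a toy logistic classifier with known weights and uses one signed-gradient step as a stand-in; it is not necessarily the optimizer the researchers used. The names `fake_probability`, `loss_maximizing_step`, and `epsilon`, and all numeric values, are hypothetical.

```python
import math


def fake_probability(weights, bias, pixels):
    """Probability that the toy logistic classifier assigns to the label 'fake'."""
    z = sum(w * p for w, p in zip(weights, pixels)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def loss_maximizing_step(weights, bias, pixels, epsilon):
    """Move each pixel by at most epsilon in the direction that raises the loss.

    The loss is -log(p_fake) for an image whose true label is 'fake'.
    """
    p_fake = fake_probability(weights, bias, pixels)
    # Gradient of -log(p_fake) with respect to each pixel.
    grad = [-(1.0 - p_fake) * w for w in weights]
    adversarial = []
    for p, g in zip(pixels, grad):
        step = epsilon if g > 0 else -epsilon if g < 0 else 0.0
        # Keep every pixel inside the valid range [0, 1].
        adversarial.append(min(1.0, max(0.0, p + step)))
    return adversarial


# Hypothetical toy classifier and a 4-pixel image it labels as fake.
weights = [2.0, -1.5, 1.0, 0.5]
bias = -0.5
fake_image = [0.6, 0.2, 0.5, 0.4]
epsilon = 0.3  # assumed distortion budget per pixel

adversarial = loss_maximizing_step(weights, bias, fake_image, epsilon)
print("P(fake) before:", fake_probability(weights, bias, fake_image))
print("P(fake) after: ", fake_probability(weights, bias, adversarial))
```

Unlike a distortion-minimizing attack, this variant fixes the distortion budget in advance and then does as much damage as it can within that budget.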
The researchers also examined a black-box attack, in which the inner workings of the target classifier are unknown to the attacker. The team collected one million images synthesized by an AI model and one million of the real images on which that model had been trained, then trained their own classifier to distinguish real images from fake ones. They generated white-box adversarial examples against this classifier using a distortion-minimizing attack. According to the team, this reduced the accuracy of their own classifier from 85% to 0.03%. When the same approach was applied to a popular third-party classifier, its accuracy fell from 96% to 22%.
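The transfer idea behind this black-box attack can be illustrated with a short Python sketch: craft an adversarial example against a model the attacker controls, then submit it to a target that only returns labels. Everything here is an assumption made for illustration. Both models are toy linear classifiers with fixed, hand-picked weights (no training is shown), and the names `make_black_box` and `craft_on_substitute` are hypothetical.

```python
def make_black_box(weights, bias):
    """Return a predict-only function; callers never see its parameters."""
    def predict(pixels):
        z = sum(w * p for w, p in zip(weights, pixels)) + bias
        return "fake" if z > 0 else "real"
    return predict


def craft_on_substitute(sub_weights, pixels, epsilon):
    """White-box step on the attacker's own model.

    Each pixel moves by at most epsilon against the sign of its weight,
    which lowers the substitute model's 'fake' score.
    """
    adversarial = []
    for w, p in zip(sub_weights, pixels):
        step = -epsilon if w > 0 else epsilon if w < 0 else 0.0
        adversarial.append(min(1.0, max(0.0, p + step)))
    return adversarial


# The target's parameters are hidden from the attacker (hypothetical values).
target = make_black_box([1.8, -1.2, 0.9, 0.7], -0.6)

# The attacker's own model, assumed to behave similarly to the target.
substitute_weights = [2.0, -1.5, 1.0, 0.5]

fake_image = [0.6, 0.2, 0.5, 0.4]
adversarial = craft_on_substitute(substitute_weights, fake_image, epsilon=0.3)

# The attacker only queries the target for labels.
print(target(fake_image), "->", target(adversarial))
```

The example fools the target only because the two toy models have similar weights. Whether an attack transfers between real classifiers depends on how closely the attacker's model matches the target, which is what the accuracy figures above measure.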
The researchers wrote, “To the extent that synthesized or manipulated content is used for nefarious purposes, the problem of detecting this content is inherently adversarial. We argue, therefore, that forensic classifiers need to build an adversarial model into their defences.” The team added that attacks on sensitive systems should never be taken lightly. They cautioned against deploying such forensic classifiers because they can provide a false sense of security: a fake profile picture may appear more convincing once a classifier has deemed it real, even though that classifier can be defeated by a committed adversary.
Several organizations, including Facebook and Amazon Web Services, are leading the Deepfake Detection Challenge, which includes a data set of video samples manipulated by artificial intelligence. Google has also released a collection of visual deepfakes as part of the FaceForensics benchmark, which was created with the Technical University of Munich and the University Federico II of Naples. The most recent example is DeeperForensics-1.0, a data set for detecting forged faces, created by SenseTime and Nanyang Technological University in Singapore.