If there is anything to be learnt from machine learning (ML), it is that data is critical. However, when developers do not have data to build their machine learning models, augmentation comes to the rescue. Data Augmentation is the practice of creating new, synthetic data from the already available data. The technique can be applied to any form of data, and the result is similar to the actual data available.
In a recent blog post, Facebook’s AI Research team announced open-sourcing a new Python library, AugLy, developed at Facebook’s Seattle and Paris offices. The social media giant will provide sophisticated data augmentation tools to AI researchers to evaluate and build their ML models.
The library offers more than 100 data augmentations that focus on modifications done to images and videos on various social media platforms, like Facebook and Instagram. It includes features such as cropping, overlaying meme-style text, emoji and screenshot transformations.
Source: Facebook AI
Taken individually, the images and the overlaid text in these memes are fairly innocuous. The text would be considered a compliment if it were presented on its own. However, the memes above present the images and text together, creating a context that people will recognise as unfriendly but a machine might not. To combat this, AugLy combines different modalities such as audio, video, image and text, which helps algorithms better understand and deal with complex content.
As per Facebook, many of the augmentations in AugLy are informed by the ways in which users have previously tried to evade the social media giant’s automatic systems. This makes AugLy especially useful for models and data related to social media applications.
How does this work?
For this project, Facebook aggregated multiple augmentations from many libraries, some of which Facebook wrote itself for this purpose. One of Facebook’s augmentations takes images or videos and overlays them on a social media interface to make it seem like the image or video in question was reshared by a user after being screenshotted. Given how commonplace it is to take screenshots and share such media across apps such as Instagram or Facebook, augmentations like these help AI systems understand that the content is still the same regardless of any distracting interface elements.
AugLy comprises four sub-libraries, each of which corresponds to a different modality: audio, image, text and video. For each sub-library, Facebook provides transforms in function-based and class-based formats. AugLy also provides intensity functions that help users understand how strong a transformation is for a given set of parameters. Finally, AugLy can generate metadata that records how the data was transformed.
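To make the ideas in the previous paragraph concrete, here is a minimal sketch in plain Python of what a function-based transform, its class-based counterpart, an intensity function and a metadata record might look like for the text modality. It does not use AugLy itself: the names replace_chars, ReplaceChars and replace_chars_intensity, the 0 to 100 intensity scale and the metadata fields are all assumptions made for illustration, not the library's actual API.

```python
import random
from typing import Any, Dict, List, Optional

# Hypothetical look-alike substitutions, of the kind used to dodge text filters.
LOOKALIKES = {"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"}


def replace_chars_intensity(rate: float) -> float:
    """Intensity function: map the transform's parameter to a 0-100 score."""
    return max(0.0, min(1.0, rate)) * 100.0


def replace_chars(
    text: str,
    rate: float = 0.2,
    seed: int = 0,
    metadata: Optional[List[Dict[str, Any]]] = None,
) -> str:
    """Function-based transform: swap a fraction of letters for look-alikes."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    out = "".join(
        LOOKALIKES[c] if c in LOOKALIKES and rng.random() < rate else c
        for c in text
    )
    if metadata is not None:
        # Record what was done so the transformation can be inspected later.
        metadata.append({
            "name": "replace_chars",
            "rate": rate,
            "src_length": len(text),
            "dst_length": len(out),
            "intensity": replace_chars_intensity(rate),
        })
    return out


class ReplaceChars:
    """Class-based transform: fix the parameters once, then call repeatedly."""

    def __init__(self, rate: float = 0.2, seed: int = 0) -> None:
        self.rate = rate
        self.seed = seed

    def __call__(
        self, text: str, metadata: Optional[List[Dict[str, Any]]] = None
    ) -> str:
        return replace_chars(text, self.rate, self.seed, metadata)


meta: List[Dict[str, Any]] = []
augmented = ReplaceChars(rate=1.0)("misinformation", metadata=meta)
print(augmented)             # m1$1nf0rm@t10n
print(meta[0]["intensity"])  # 100.0
```

The class-based form is convenient when the same configured transform has to be applied to a whole dataset or chained with others, while the function-based form suits one-off calls.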
Data augmentations are necessary to keep AI models robust. Teaching models to be resilient to unimportant data attributes allows them to focus on the data characteristics that matter for a particular use case.
For instance, Facebook used AI to detect COVID-19 misinformation and exploitative content using a neural net-based model, SimSearchNet. Facebook AI built the model specifically to detect near-exact duplicates and trained it using AugLy data augmentations. Such models allow Facebook to detect misinformation even if it reappears in a slightly different form, such as a slightly modified image, a new filter or overlaid text.
Facebook also used the AugLy library to test the robustness of other models against a set of augmentations. In 2019, Facebook held a Deepfake Detection Challenge designed to measure progress in deepfake detection technology. For this, Facebook created and shared a dataset containing more than 100,000 videos and had experts benchmark their deepfake detection models against it. AugLy was employed to evaluate the robustness of these deepfake detection models in the challenge and influenced the choice of the top five winners.
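The kind of robustness evaluation described above can be sketched as follows. This is a toy stand-in, not the challenge's actual evaluation: it uses short text strings instead of videos, a hypothetical keyword_model instead of a deepfake detector, and two made-up augmentations. The point is only the pattern of scoring the same model on clean data and again on each augmented copy.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, int]            # (input, label)
Model = Callable[[str], int]        # returns a predicted label
Augmentation = Callable[[str], str]


def accuracy(model: Model, samples: List[Sample]) -> float:
    """Fraction of samples for which the model predicts the right label."""
    correct = sum(1 for text, label in samples if model(text) == label)
    return correct / len(samples)


def robustness_report(
    model: Model,
    samples: List[Sample],
    augmentations: Dict[str, Augmentation],
) -> Dict[str, float]:
    """Score the model on clean data, then on each augmented copy of it."""
    report = {"clean": accuracy(model, samples)}
    for name, augment in augmentations.items():
        augmented = [(augment(text), label) for text, label in samples]
        report[name] = accuracy(model, augmented)
    return report


def keyword_model(text: str) -> int:
    """Hypothetical, deliberately brittle detector: flags the word 'fake'."""
    return 1 if "fake" in text.lower() else 0


samples = [
    ("this video is fake", 1),
    ("a genuine clip", 0),
    ("FAKE footage", 1),
    ("holiday photos", 0),
]
augmentations = {
    "uppercase": str.upper,
    "leetspeak": lambda t: t.replace("a", "@").replace("e", "3"),
}

report = robustness_report(keyword_model, samples, augmentations)
print(report)  # {'clean': 1.0, 'uppercase': 1.0, 'leetspeak': 0.75}
```

A model that scores well on clean data but drops sharply under a particular augmentation, as the toy detector does under character substitution, is a candidate for retraining with that augmentation included.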
Libraries such as AugLy encourage the development of machine learning models and help ensure their robustness by exposing them to the ways in which people actually interact on social media. Of course, more problems will arise from negative uses of such techniques, for example hateful deepfake images, especially considering the rate at which these technologies advance. Still, by open-sourcing AugLy, Facebook has opened more doors for developers working on solutions to the problem of misinformation and hateful content on social media.