The Good, The Bad And The AugLy: Behind Facebook’s Newly Open-Sourced Library

AugLy combines modalities like audio, video, image and text, to help algorithms better understand and deal with complex content.

If there is anything to be learnt from machine learning (ML), it is that data is critical. However, when developers do not have enough data to build their machine learning models, data augmentation comes to the rescue. Data augmentation is the practice of creating new, synthetic data from data that is already available. The technique can be applied to any form of data, and the results resemble the original data.
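To make the idea concrete, here is a minimal sketch of augmentation on a tiny grayscale image stored as a nested list. It uses only the Python standard library; the function names (`flip_horizontal`, `brighten`, `crop`, `augment`) are illustrative assumptions and are not taken from AugLy.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, 0-255, row-major


def flip_horizontal(img: Image) -> Image:
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def brighten(img: Image, delta: int) -> Image:
    """Add delta to every pixel, clamped to the 0-255 range."""
    return [[max(0, min(255, p + delta)) for p in row] for row in img]


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Keep a height x width window starting at (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def augment(img: Image) -> List[Image]:
    """Return several synthetic variants derived from one original image."""
    return [
        flip_horizontal(img),
        brighten(img, 40),
        crop(img, 0, 0, 2, 2),
    ]


original = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 250],
]
variants = augment(original)  # three new samples from a single original
```

Each variant still depicts the same underlying content, which is what lets a model trained on the enlarged set learn to ignore such changes.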

In a recent blog post, Facebook’s AI Research team announced open-sourcing a new Python library, AugLy, developed at Facebook’s Seattle and Paris offices. The social media giant will provide sophisticated data augmentation tools to AI researchers to evaluate and build their ML models. 

The library offers more than 100 data augmentations modelled on the modifications people make to images and videos on social media platforms like Facebook and Instagram. These include cropping, overlaying meme-style text or emoji, and screenshot transformations.

Source: Facebook AI

Taken individually, the images and the overlaid text in these pictures are fairly innocuous. The text would be considered a compliment if it were presented on its own. However, the memes above present the images and text together, creating a context that people will read as unfriendly but a machine might not. To combat this, AugLy combines different modalities such as audio, video, image and text, which helps algorithms better understand and deal with complex content.

As per Facebook, many of the augmentations in AugLy are informed by the ways in which users have tried to evade the social media giant’s automatic systems. This makes AugLy especially useful for models and data related to social media applications.

How does this work?

Source: Facebook AI

For this project, Facebook aggregated augmentations from many existing libraries and wrote some of its own. One of Facebook’s augmentations takes an image or video and overlays it on a social media interface, so that it looks as if a user screenshotted the content and reshared it. Given how commonplace it is to take screenshots and share such media across apps such as Instagram or Facebook, augmentations like this help AI systems understand that the content is still the same regardless of any distracting interface elements.

AugLy comprises four sub-libraries, each of which corresponds to a different modality. In each sub-library, Facebook provides transforms in both function-based and class-based formats. AugLy also provides intensity functions that indicate how strong a transformation is for a given set of parameters. Finally, AugLy can generate metadata that records how the data was transformed.
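The pattern described above (a function-based transform, a class-based wrapper, an intensity value and a metadata record) can be sketched in plain Python. This is a toy illustration for the text modality and not AugLy's actual API: the names `replace_chars`, `ReplaceChars` and `LEET`, as well as the intensity formula, are assumptions made for the example.

```python
import random
from typing import Dict, List, Optional


def replace_chars(
    text: str,
    mapping: Dict[str, str],
    p: float = 1.0,
    seed: int = 0,
    metadata: Optional[List[dict]] = None,
) -> str:
    """Function-based transform: swap mapped characters with probability p."""
    rng = random.Random(seed)
    out = "".join(
        mapping[c] if c in mapping and rng.random() < p else c for c in text
    )
    if metadata is not None:
        changed = sum(a != b for a, b in zip(text, out))
        metadata.append({
            "name": "replace_chars",
            "p": p,
            # Toy intensity (an assumption): percentage of characters altered.
            "intensity": 100.0 * changed / max(len(text), 1),
        })
    return out


class ReplaceChars:
    """Class-based form: fix the parameters once, then apply to many inputs."""

    def __init__(self, mapping: Dict[str, str], p: float = 1.0, seed: int = 0):
        self.mapping = mapping
        self.p = p
        self.seed = seed

    def __call__(self, text: str, metadata: Optional[List[dict]] = None) -> str:
        return replace_chars(text, self.mapping, self.p, self.seed, metadata)


LEET = {"a": "4", "e": "3", "o": "0"}

meta: List[dict] = []
direct = replace_chars("free money", LEET, metadata=meta)  # function-based call
transform = ReplaceChars(LEET)                             # class-based object
via_class = transform("free money")
```

The class-based form is convenient when the same configured transform has to be composed with others or applied across a whole dataset, while the metadata list keeps a record of what was done and how strongly.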

Source: Facebook AI

Data augmentations are necessary for keeping AI models robust. Teaching models to be resilient to unimportant data attributes lets them focus on the characteristics that matter for a particular use case.

For instance, Facebook used AI to detect COVID-19 misinformation and exploitative content using a neural net-based model, SimSearchNet. The model was built by Facebook AI specifically to detect near-exact duplicates and was trained using AugLy data augmentations. Such models allow Facebook to catch misinformation even when it reappears in a slightly different form, for example with a slightly modified image, a new filter or overlaid text.

Facebook also used the AugLy library to test the robustness of other models against a set of augmentations. In 2019, Facebook held a Deepfake Detection Challenge designed to look at progress in deepfake detection technology. For this, Facebook created and shared a dataset containing more than 100,000 videos and invited experts to benchmark their deepfake detection models against it. AugLy was employed to evaluate the robustness of the deepfake detection models in the challenge and helped determine the top five winners.
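A robustness evaluation of this kind can be sketched as follows: score a model on clean data, then again on the same data under each augmentation, and compare. Everything here is a hypothetical stand-in, including the keyword classifier, the three augmentations and the four-sample dataset; it is not how the challenge itself was scored.

```python
from typing import Callable, Dict, List, Optional, Tuple

Sample = Tuple[str, bool]  # (text, is_spam)


def keyword_model(text: str) -> bool:
    """Toy classifier standing in for a real model: flags one phrase."""
    return "free money" in text.lower()


def leetspeak(text: str) -> str:
    """Replace some letters with look-alike digits."""
    return text.replace("e", "3").replace("o", "0")


def spaced_out(text: str) -> str:
    """Insert a space between every pair of characters."""
    return " ".join(text)


def shout(text: str) -> str:
    """Convert the text to upper case."""
    return text.upper()


def accuracy(
    model: Callable[[str], bool],
    data: List[Sample],
    augment: Optional[Callable[[str], str]] = None,
) -> float:
    """Fraction of samples classified correctly, optionally after augmenting."""
    correct = 0
    for text, label in data:
        x = augment(text) if augment else text
        correct += model(x) == label
    return correct / len(data)


DATA: List[Sample] = [
    ("Claim your free money now", True),
    ("free money inside", True),
    ("Lunch at noon?", False),
    ("Meeting notes attached", False),
]
AUGMENTATIONS: Dict[str, Callable[[str], str]] = {
    "leetspeak": leetspeak,
    "spaced_out": spaced_out,
    "shout": shout,
}

# Accuracy on clean data, then under each augmentation.
report = {"clean": accuracy(keyword_model, DATA)}
for name, fn in AUGMENTATIONS.items():
    report[name] = accuracy(keyword_model, DATA, fn)
```

In this toy setup the model is unaffected by `shout` but drops to 50% accuracy under `leetspeak` and `spaced_out`, the kind of gap that a robustness benchmark is meant to expose.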

Summing up 

Libraries such as AugLy encourage the development of machine learning models and improve their robustness by preparing them for the way people actually interact on social media. It is, of course, vital to recognise that harmful uses of data augmentation, such as hateful deepfake images, will create more problems, especially considering the rate at which such technologies advance. Still, by open-sourcing AugLy, Facebook has opened more doors for developers working on solutions to the problem of misinformation and hateful content on social media.

Mita Chaturvedi
I am an economics undergrad who loves drinking coffee and writing about technology and finance. I like to play the ukulele and watch old movies when I'm free.