The Good, The Bad And The AugLy: Behind Facebook’s Newly Open-Sourced Library

AugLy combines modalities such as audio, video, image and text to help algorithms better understand and deal with complex content.

If there is anything to be learnt from machine learning (ML), it is that data is critical. When developers do not have enough data to build their models, however, augmentation comes to the rescue. Data augmentation is the practice of creating new, synthetic data from the data already available. The technique can be applied to many forms of data, and the synthetic samples closely resemble the real data they were derived from.

In a recent blog post, Facebook’s AI Research team announced that it was open-sourcing AugLy, a new Python library developed at Facebook’s Seattle and Paris offices. With it, the social media giant is giving AI researchers sophisticated data augmentation tools for building and evaluating their ML models.

The library offers more than 100 data augmentations that focus on the kinds of modifications people make to images and videos on social media platforms such as Facebook and Instagram. These include cropping, overlaying meme-style text and emoji, and screenshot transformations.
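As a rough illustration of two of these operations, the sketch below crops an image and pastes a small patch onto it, standing in for an emoji overlay. It assumes images are plain grayscale grids (lists of rows of integers) rather than real image files, and the function names and fractional-coordinate convention are hypothetical, not AugLy's actual API.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels as rows of ints (a simplification)


def crop(image: Image, x1: float, y1: float, x2: float, y2: float) -> Image:
    """Keep the region between fractional coordinates (x1, y1) and (x2, y2)."""
    h, w = len(image), len(image[0])
    top, bottom = int(y1 * h), int(y2 * h)
    left, right = int(x1 * w), int(x2 * w)
    return [row[left:right] for row in image[top:bottom]]


def overlay_patch(image: Image, patch: Image, top: int, left: int) -> Image:
    """Paste `patch` (a stand-in for an emoji) onto a copy of `image`,
    clipping any part of the patch that falls outside the image."""
    out = [row[:] for row in image]  # leave the input untouched
    for dy, patch_row in enumerate(patch):
        for dx, value in enumerate(patch_row):
            y, x = top + dy, left + dx
            if 0 <= y < len(out) and 0 <= x < len(out[0]):
                out[y][x] = value
    return out
```

For example, with img = [[10 * r + c for c in range(4)] for r in range(4)], calling crop(img, 0.25, 0.25, 0.75, 0.75) keeps the central 2×2 block, [[11, 12], [21, 22]].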

Source: Facebook AI

Taken individually, the images and the overlaid text in these examples are fairly innocuous; on its own, the text would even read as a compliment. Put together as memes, however, they create a context that people will recognise as unfriendly but a machine might not. To address this, AugLy combines different modalities, such as audio, video, image and text, which helps algorithms better understand and deal with complex content.

According to Facebook, many of the augmentations in AugLy are informed by the ways in which users have previously tried to evade the social media giant’s automated systems. This makes AugLy particularly useful for models and data related to social media applications.

How does this work?

Source: Facebook AI

For this project, Facebook aggregated augmentations from many libraries, some of which it wrote itself for this purpose. One of these augmentations takes an image or video and overlays it onto a social media interface, so that it looks as if a user screenshotted the content and reshared it. Because it is so common to screenshot media and share it across apps such as Instagram and Facebook, augmentations like this one help AI systems learn that the content is still the same regardless of any distracting interface elements.
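The screenshot idea can be sketched in the same simplified setting: embed the original content in a larger canvas with a solid "app header" bar and a background margin around it. This is a toy under stated assumptions, not Facebook's implementation, which overlays content onto an actual social media interface; the grid representation, pixel values and parameter names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels as rows of ints (a simplification)


def overlay_onto_frame(image: Image, header: int = 2, margin: int = 1,
                       bg: int = 255, bar: int = 60) -> Image:
    """Embed `image` in a larger canvas that mimics an app screenshot:
    a solid header bar on top and a background margin around the content."""
    w = len(image[0])
    width = w + 2 * margin
    frame = [[bar] * width for _ in range(header)]      # hypothetical "app" header
    frame += [[bg] * width for _ in range(margin)]      # top margin
    for row in image:
        frame.append([bg] * margin + row[:] + [bg] * margin)  # content row
    frame += [[bg] * width for _ in range(margin)]      # bottom margin
    return frame
```

Because the original pixels survive unchanged at a fixed offset, a model trained on both the plain and the framed version under the same label is pushed to ignore the surrounding "interface".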

AugLy comprises four sub-libraries, one for each modality: audio, image, text and video. For each sub-library, Facebook provides transforms in both function-based and class-based formats. AugLy also provides intensity functions that indicate how strong a transformation is, given its parameters. Finally, AugLy can generate metadata that records how the data was transformed.
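The function-based versus class-based split, the intensity score and the metadata record can be illustrated with a single toy transform. The sketch below is loosely modelled on the pattern the paragraph describes; the names brightness, Brightness and brightness_intensity, the 0 to 100 intensity scale and the metadata keys are all hypothetical and should not be read as AugLy's actual API.

```python
from typing import Dict, List, Optional

Image = List[List[int]]  # grayscale pixels as rows of ints (a simplification)


def brightness_intensity(factor: float) -> float:
    """Hypothetical intensity score on a 0-100 scale: 0 for no change,
    growing with distance from factor 1.0 and capped at 100."""
    return min(100.0, abs(factor - 1.0) * 100.0)


def brightness(image: Image, factor: float,
               metadata: Optional[List[Dict]] = None) -> Image:
    """Function-based transform: scale every pixel by `factor`, clamped to
    0..255. If a metadata list is passed, append a record of what was done."""
    out = [[min(255, max(0, round(p * factor))) for p in row] for row in image]
    if metadata is not None:
        metadata.append({
            "name": "brightness",
            "factor": factor,
            "intensity": brightness_intensity(factor),
            "src_size": (len(image[0]), len(image)),  # (width, height)
            "dst_size": (len(out[0]), len(out)),
        })
    return out


class Brightness:
    """Class-based wrapper: stores parameters once, then behaves like a
    callable transform that can be reused or chained in a pipeline."""

    def __init__(self, factor: float):
        self.factor = factor

    def __call__(self, image: Image,
                 metadata: Optional[List[Dict]] = None) -> Image:
        return brightness(image, self.factor, metadata)
```

For instance, with meta = [], calling Brightness(1.5)([[100, 200]], metadata=meta) returns [[150, 255]] (the second pixel is clamped) and appends a record whose intensity is 50.0. The class form is convenient when the same configured transform is reused across a pipeline, while the function form suits one-off calls.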

Source: Facebook AI

Data augmentations help keep AI models robust. Teaching models to be resilient to unimportant data attributes allows them to focus on the characteristics that actually matter for a particular use case.

For instance, Facebook used a neural network-based model, SimSearchNet, to detect COVID-19 misinformation and exploitative content. Facebook AI built the model specifically to detect near-exact duplicates and trained it using AugLy data augmentations. Such models allow Facebook to catch misinformation even when it reappears in slightly different forms, such as a slightly modified image, a new filter or overlaid text.
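SimSearchNet itself is a neural network, and its internals are not covered here. To illustrate the general idea of near-duplicate matching, the sketch below swaps in a much simpler, classic technique, an average perceptual hash: each pixel becomes one bit depending on whether it is brighter than the image mean, and two images count as near-duplicates if their hashes differ in only a few bits. It assumes small grayscale grids of equal size, and the 2-bit threshold is an arbitrary choice for the example.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels as rows of ints (a simplification)


def average_hash(image: Image) -> List[int]:
    """One bit per pixel: 1 if the pixel is brighter than the image mean."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def is_near_duplicate(a: Image, b: Image, max_bits: int = 2) -> bool:
    """Assumes both images were already resized to the same small grid."""
    return hamming(average_hash(a), average_hash(b)) <= max_bits
```

Brightening every pixel by the same amount (without clipping) shifts the mean by that amount too, so the hash is unchanged and the edited copy is still flagged, while an image with a different layout is not.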

Facebook also used AugLy to test how robust other models are against a set of augmentations. In 2019, Facebook held the Deepfake Detection Challenge to measure progress in deepfake detection technology. For this, Facebook created and shared a dataset of more than 100,000 videos and invited experts to benchmark their deepfake detection models against it. AugLy was used to evaluate the robustness of these models and influenced the selection of the top five winners.
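The robustness test described here boils down to comparing a model's performance on clean inputs with its performance on augmented copies of the same inputs. The sketch below shows that comparison for a deliberately brittle, hypothetical keyword-based spam filter on a toy text dataset; it is not the evaluation procedure used in the Deepfake Detection Challenge, which involved video models.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, int]  # (text, label) with 1 = spam, 0 = not spam


def accuracy(model: Callable[[str], int], data: List[Sample]) -> float:
    """Fraction of samples the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


def robustness_report(model: Callable[[str], int], data: List[Sample],
                      augmentations: Dict[str, Callable[[str], str]]) -> Dict[str, float]:
    """Accuracy on clean data and on each augmented copy of it. Labels are
    kept as-is because the augmentations should not change the meaning."""
    report = {"clean": accuracy(model, data)}
    for name, aug in augmentations.items():
        report[name] = accuracy(model, [(aug(x), y) for x, y in data])
    return report


def keyword_model(text: str) -> int:
    """A deliberately brittle toy classifier (hypothetical)."""
    return int("free money" in text)


if __name__ == "__main__":
    data = [("free money now", 1), ("claim your free money", 1),
            ("lunch at noon?", 0), ("meeting moved", 0)]
    augs = {"uppercase": str.upper,
            "leetspeak": lambda t: t.replace("e", "3")}
    print(robustness_report(keyword_model, data, augs))
    # {'clean': 1.0, 'uppercase': 0.5, 'leetspeak': 0.5}
```

A sharp drop on an augmented copy, as with the uppercase and leetspeak variants here, signals that the model relies on surface features the augmentation changed rather than on the content itself.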

Summing up 

Libraries such as AugLy support the development of machine learning models and help make them robust by preparing them for the ways people actually interact on social media. It is, of course, vital to realise that new problems will keep arising from harmful uses of data augmentation, such as hateful deepfake images, especially given how quickly these technologies advance. Still, by open-sourcing AugLy, Facebook has opened more doors for developers working on solutions to misinformation and hateful content on social media.

Mita Chaturvedi
I am an economics undergrad who loves drinking coffee and writing about technology and finance. I like to play the ukulele and watch old movies when I'm free.
