Tech Behind Facebook’s TextStyleBrush

Facebook has introduced TextStyleBrush, a first-of-its-kind self-supervised AI model that copies the style of text in a photo from a single example word. With it, text in images of both real-world scenes and handwriting can be edited and replaced in one go.

The handwriting dataset used for conducting this experiment is available on GitHub.

In the research paper ‘TextStyleBrush: Transfer of text aesthetics from a single example’, Facebook AI researchers Praveen Krishnan, Rama Kovvuri, Guan Pang, Boris Vassilev and Tal Hassner present a novel approach for disentangling the content of a text image from all aspects of its appearance.

Praveen Krishnan, a postdoctoral researcher at Facebook AI, said that building a machine learning model flexible enough to understand the nuances of both text in real-world scenes and handwriting is more challenging than well-defined, specialised tasks.

The challenges include understanding text styles across different typography and calligraphy; transformations such as rotation, curved text, and the deformations that occur between pen and paper in handwriting; background clutter; and image noise. These complexities make it hard to segment text from its background. Moreover, it is not easy to create annotated or labelled examples of every possible appearance of every letter and digit.

This is where Facebook’s TextStyleBrush comes in. It works similarly to the style brush tools in word processors. The researchers said it surpassed state-of-the-art accuracy in both automated tests and user studies for any type of text.

TextStyleBrush framework

The framework uses separate style and content encoders to learn representations that condition the generation of images in the target text style. “Our framework is trained using multiple losses, which involve a style loss computed using a typeface classifier, a content loss, which uses a pre-trained OCR model; adversarial loss, to add realism; and finally, reconstruction losses to aid self-supervision,” said Krishnan.
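As a rough illustration of how a multi-term objective like this is commonly combined, the sketch below adds up the four loss terms named in the quote as a weighted sum. The weights, the names `LossWeights` and `total_generator_loss`, and the example loss values are all hypothetical placeholders; the article does not say how the terms are weighted or combined in the actual model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    # Hypothetical weights; the article does not state the values used.
    style: float = 1.0
    content: float = 1.0
    adversarial: float = 1.0
    reconstruction: float = 1.0


def total_generator_loss(style_loss, content_loss, adversarial_loss,
                         reconstruction_loss, weights=LossWeights()):
    """Combine the four loss terms into one scalar via a weighted sum."""
    return (weights.style * style_loss
            + weights.content * content_loss
            + weights.adversarial * adversarial_loss
            + weights.reconstruction * reconstruction_loss)


# Made-up per-batch loss values, purely for illustration.
print(total_generator_loss(0.5, 0.25, 1.0, 2.0))
```

In a real training loop each term would come from its own network (typeface classifier, OCR model, discriminator) and the weights would be tuned as hyperparameters.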

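The stylized text generator architecture is based on the StyleGAN2 model. However, StyleGAN2 has two limitations when it comes to generating photo-realistic text images. First, it is an unconditional model: it generates images from a randomly sampled latent vector, whereas here the output has to be controlled by the desired content and style. Second, stylized text images are unique in nature, with style present at multiple scales.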

“We address these limitations together by conditioning the generator on our style and content representations. We handle the multiscale nature of text style by extracting layer-specific style information and injecting it at each layer of the generator,” said the researchers. 
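The idea of injecting layer-specific style information can be sketched in a few lines. The example below is a simplified stand-in, not the paper's architecture: it applies a per-channel scale and shift at each layer instead of StyleGAN2's weight modulation, and it omits the convolutions a real generator would have between injections. The function names and the numbers are assumptions made for illustration.

```python
def modulate(features, scale, shift):
    """Scale and shift each channel of a feature map by per-channel style values.

    features: list of channels, each a flat list of floats.
    scale, shift: one float per channel, derived from the style representation.
    """
    return [[s * v + b for v in channel]
            for channel, s, b in zip(features, scale, shift)]


def generate(content_features, layer_styles):
    """Pass content features through the layers, injecting a different style at each."""
    x = content_features
    for scale, shift in layer_styles:
        # A real generator layer would also apply a learned convolution here.
        x = modulate(x, scale, shift)
    return x


# Two channels with two values each, standing in for a content representation.
content = [[1.0, 2.0], [3.0, 4.0]]
# One (scale, shift) pair per generator layer: the layer-specific style.
styles = [([2.0, 0.5], [0.0, 1.0]),
          ([1.0, 2.0], [1.0, 0.0])]
print(generate(content, styles))  # [[3.0, 5.0], [5.0, 6.0]]
```

Because every layer receives its own style values, early (coarse) layers and late (fine) layers can be steered independently, which is the intuition behind handling the multiscale nature of text style.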

In addition to the target image in the desired style, the generator produces a soft mask image that indicates the foreground pixels. This way, it controls both low- and high-resolution details of the text appearance to match the desired input style, explained Krishnan.
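To make the notion of a soft mask concrete: unlike a binary mask, each pixel holds a value between 0 and 1 indicating how strongly it belongs to the foreground. One common use of such a mask is alpha compositing, sketched below on tiny grayscale images. Whether TextStyleBrush uses its mask in exactly this way is not stated here, so treat the `composite` function as an illustrative assumption.

```python
def composite(foreground, background, soft_mask):
    """Blend two grayscale images pixel by pixel using a soft mask in [0, 1].

    A mask value of 1.0 keeps the foreground pixel, 0.0 keeps the background,
    and values in between mix the two (useful for thin or anti-aliased strokes).
    """
    return [[m * f + (1.0 - m) * b for f, b, m in zip(frow, brow, mrow)]
            for frow, brow, mrow in zip(foreground, background, soft_mask)]


# One-row images: black text (0.0) over a white background (1.0).
text_pixels = [[0.0, 0.0, 0.0]]
paper = [[1.0, 1.0, 1.0]]
mask = [[1.0, 0.25, 0.0]]  # full stroke, faint stroke edge, no stroke
print(composite(text_pixels, paper, mask))  # [[0.0, 0.75, 1.0]]
```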

Stylized text generator (Source: Facebook) 

How does it work? 

Typically, transferring text styles involves training a model on supervised data, namely source and target content in similar styles, together with explicit segmentation of the text. However, building an effective text segmentation method for real-world images is not easy.

For instance, a stroke in handwriting is often one pixel wide or even less. Collecting good training data for segmentation also brings the added complexity of labelling both foreground and background.

Facebook’s TextStyleBrush, on the other hand, is trained directly on real-world images using a self-supervised technique. “We do not assume any form of supervision available on how styles are represented or the availability of segmented text labels. Also, we do not assume that the source style example and new content [...]. We extract an opaque latent style representation, and we optimise our representation to allow photo-realistic rendering of new content using a single source sample,” said the researchers.

How TextStyleBrush works (Source: Facebook)

Wrapping up 

“As the ongoing self-supervised revolution continues to grow, we see it as imperative that the artificial intelligence field openly facilitate research into detecting misuse. This includes going beyond fake faces to text and sharing benchmark data sets,” said Krishnan.

Currently, the technology is still in the research phase. The team believes it could power a wide variety of useful applications in the future, including translating text in images into different languages, creating personalised messaging and captions, and perhaps one day facilitating real-world translation of street signs using augmented reality (AR).

Amit Raja Naik
Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.
