Facebook has introduced TextStyleBrush, a first-of-its-kind self-supervised AI model that copies the style of text in a photo from just a single word. The model lets you edit and replace text in images of both scenes and handwriting, in one go, using a single example word.
The handwriting dataset used for conducting this experiment is available on GitHub.
In a research paper, ‘TextStyleBrush: Transfer of text aesthetics from a single example,’ Facebook AI researchers Praveen Krishnan, Rama Kovvuri, Guan Pang, Boris Vassilev and Tal Hassner presented a novel approach for disentangling the content of a text image from all aspects of its appearance.
Praveen Krishnan, a postdoctoral researcher at Facebook AI, said building a machine learning model that’s flexible enough to understand the nuances of both text in real-world scenes and handwriting is challenging compared to well-defined, specialised tasks.
The challenges include understanding text styles across different typography and calligraphy, along with transformations such as rotations, curved text, and the pen-on-paper deformations of handwriting, as well as background clutter and image noise. Such complexities make it hard to segment text from its background. Moreover, it is impractical to create annotated/labelled examples for every possible appearance of every letter and digit.
This is where Facebook’s TextStyleBrush AI model comes in. It works much like the style brush tools in word processors. The researchers said the tool surpassed state-of-the-art (SOTA) accuracy in both automated tests and user studies across text formats.
Facebook has used style and content encoders to learn conditional representation to generate the target text style images. “Our framework is trained using multiple losses, which involve a style loss computed using a typeface classifier, a content loss, which uses a pre-trained OCR model; adversarial loss, to add realism; and finally, reconstruction losses to aid self-supervision,” said Krishnan.
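The quote above describes training with a weighted combination of several loss terms. A minimal sketch of how such a multi-term objective might be combined is shown below; the function name and weight values are illustrative placeholders, not the paper's actual formulation or hyperparameters:

```python
def total_loss(style_loss, content_loss, adversarial_loss, reconstruction_loss,
               w_style=1.0, w_content=1.0, w_adv=1.0, w_rec=1.0):
    """Weighted sum of the four loss terms described in the article.

    Each argument is a scalar loss value already computed by its own
    network (typeface classifier, OCR model, discriminator, etc.).
    The weights here are illustrative, not the paper's values.
    """
    return (w_style * style_loss
            + w_content * content_loss
            + w_adv * adversarial_loss
            + w_rec * reconstruction_loss)

# Example: combine four scalar loss values with equal weights
print(total_loss(0.5, 0.25, 0.125, 0.125))  # 1.0
```

In practice each term would be backpropagated through the shared generator, so the weights control how strongly each objective shapes the learned style representation.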
The stylized text generator architecture is based on the StyleGAN2 model. However, StyleGAN2 has two limitations for generating photo-realistic text images: it is an unconditional model, and stylized text has a multiscale nature that a single style representation cannot capture.
“We address these limitations together by conditioning the generator on our style and content representations. We handle the multiscale nature of text style by extracting layer-specific style information and injecting it at each layer of the generator,” said the researchers.
In addition to the target image in the desired style, the model also generates a soft mask image that marks the foreground text pixels. This way, it controls both low- and high-resolution details of the text appearance to match the desired input style, explained Krishnan.
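One way a soft foreground mask can be used is to alpha-composite the generated text over the background; a minimal per-pixel sketch is below. The blend rule and toy one-dimensional "image" are illustrative, not the paper's exact formulation:

```python
def composite(foreground, background, mask):
    """Blend generated text (foreground) over the scene (background)
    using a per-pixel soft mask with values in [0, 1]."""
    return [m * f + (1 - m) * b
            for f, b, m in zip(foreground, background, mask)]

# Toy 1-D example: mask=1.0 keeps the foreground pixel,
# mask=0.0 keeps the background, 0.5 blends them equally.
print(composite([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [1.0, 0.5, 0.0]))
# [1.0, 0.5, 0.0]
```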
How does it work?
Typically, transferring text styles involves training a model with supervised data that pairs source and target content in the same style, plus explicit segmentation of the text. However, building an efficient text-segmentation method for real-world images is not an easy task.
For instance, a handwritten stroke is often only a pixel wide, or even less. Also, collecting good training data for segmentation involves the added complexity of labelling both foreground and background.
Facebook TextStyleBrush, on the other hand, trains the model directly on real-world images using a self-supervised technique. “We do not assume any form of supervision on how styles are represented, or the availability of segmented text labels. Given a source style example and new content, we extract an opaque latent style representation, and we optimise our representation to allow photo-realistic rendering of new content using a single source sample,” said the researchers.
“As the ongoing self-supervised revolution continues to grow, we see it as imperative that the artificial intelligence field openly facilitate research into detecting misuse. This includes going beyond fake faces to text and sharing benchmark data sets,” said Krishnan.
Currently, the technology is still in the research phase. The team believes it can power a wide variety of useful applications in the future, including translating text in images into different languages, creating personalised messaging and captions, and maybe one day enabling real-world translation of street signs using augmented reality (AR).
Amit Raja Naik is a senior writer at Analytics India Magazine, where he dives deep into the latest technology innovations. He is also a professional bass player.