CLIPDraw, A New Algorithm That Synthesises Drawings From Text


Researchers from Cross Labs, MIT, the Earth-Life Science Institute, and the College of Arts and Sciences recently introduced CLIPDraw, an algorithm that synthesises drawings from natural language input. The code is available as a Colab notebook.

“The field of ‘text-to-image synthesis’ has a broad history, and current methods have shown stunningly realistic image generation through GAN-like methods. Realism, however, is a double-edged sword — there is a lot of overhead in generating photorealistic renderings, whereas often all we want are simple drawings,” said Kevin Frans, a researcher at MIT.


CLIPDraw is inspired by Skribbl.io, a web-based drawing-and-guessing game.

How does CLIPDraw work?

CLIPDraw is powered by a pre-trained CLIP model (developed by OpenAI). The CLIP model consists of an image encoder and a text encoder that map onto the same representational space, making it possible to measure the similarity between images and text. “And if we can measure similarities, we can also try to discover images that maximize that similarity, therefore matching a given textual prompt,” Frans said.
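As a minimal sketch of that similarity measure, once both inputs are embedded into the shared space, the matching score reduces to a cosine similarity. The toy 4-d vectors below merely stand in for real CLIP embeddings (typically 512-dimensional); this is an illustration of the principle, not the authors' code:

```python
import numpy as np

def cosine_similarity(image_emb: np.ndarray, text_emb: np.ndarray) -> float:
    """Score how well an image matches a text prompt in the shared space."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_emb = text_emb / np.linalg.norm(text_emb)
    return float(image_emb @ text_emb)

# Toy 4-d embeddings standing in for real CLIP encoder outputs.
img = np.array([1.0, 0.0, 1.0, 0.0])
txt = np.array([1.0, 0.0, 1.0, 0.0])
print(round(cosine_similarity(img, txt), 6))  # → 1.0 (identical directions)
```

Because both encoders map into the same space, maximising this score with respect to the image (while the text embedding stays fixed) is exactly the "discover images that maximize that similarity" idea Frans describes.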

The basic CLIPDraw loop follows this principle of synthesis-through-optimisation. First, it starts with a human-given description prompt and a random set of Bézier curves. Then, it slowly adjusts the curves through gradient descent so that the drawing best matches the given prompt. A Bézier curve is a parametric curve widely used in computer graphics.
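The curves being adjusted can be pictured with a small NumPy sketch of a single cubic Bézier curve: the stroke is entirely determined by a few control points, and nudging those points (which is what gradient updates on the curve parameters effectively do) reshapes the stroke. This is an illustrative sketch, not CLIPDraw's renderer, which rasterises many such strokes differentiably:

```python
import numpy as np

def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bézier curve at parameter values t in [0, 1].

    p0..p3 are 2-D control points; moving them reshapes the stroke.
    """
    t = np.asarray(t)[:, None]
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

pts = cubic_bezier(np.array([0.0, 0.0]), np.array([0.0, 1.0]),
                   np.array([1.0, 1.0]), np.array([1.0, 0.0]),
                   np.linspace(0, 1, 5))
print(pts[0], pts[-1])  # the curve starts at p0 and ends at p3
```

Because the rendered drawing is a differentiable function of these control points (plus stroke thickness and colour), the gradient of the CLIP similarity score can flow all the way back to the curve parameters.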

(Source: kvfrans.com)

CLIPDraw does not require any training; instead, the pre-trained CLIP image and text encoders serve as the metric for maximising the similarity between the given description and the generated drawing. Most importantly, CLIPDraw operates over vector strokes rather than pixel images.

(Source: arXiv)

Experiments

Currently, CLIPDraw produces a diverse set of human-recognisable drawings based on simple strokes and shapes. “A great example of this is ‘a painting of a starry night sky,’ which shows a painterly-styled sky with a ‘moon and stars,’ alongside an actual painting canvas and painter in the foreground, which then also features ‘black and blue swirls’ resembling Van Gogh’s ‘The Starry Night,’” said Frans.

“At times, the drawings contain symbols that do not literally contain the description, but are tangentially associated, such as the prompt “自転車” (bicycle in Japanese) resembling a Google Maps screenshot with a Japanese-like character in the corner. The ambiguity of prompts also presents intriguing results. In the prompt “Fast Food”, a McDonald’s logo along with a set of hamburgers is shown,” the researchers said.

(Source: arXiv)


How is CLIPDraw different from other methods? 

Compared to methods that learn a direct generative model, optimisation-based synthesis methods like CLIPDraw do not require prior training. Instead, images are generated through an evaluation-time optimisation loop, aiming to maximise a given objective. This work focuses explicitly on synthesising images that match the CLIP encoding of a description prompt. 

Key differences:

  • CLIPDraw: drawings are produced by a set of RGBA Bézier curves whose control points, thickness, and colours can all be adjusted.
  • Pixel optimisation: optimises a 224×224×3 matrix of RGB pixel colours; all other algorithmic aspects, including image augmentation, are the same as in CLIPDraw.
  • BigGAN optimisation: images are produced by a pre-trained BigGAN generator whose weights are frozen; only the latent Z vectors are optimised.
  • CLIPDraw (no augment): identical to CLIPDraw, except no image augmentation is applied to the synthesised drawings.
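The shared evaluation-time loop behind all of these variants can be sketched with gradient ascent directly on a pixel matrix, as in the pixel-optimisation baseline. In this toy version, the CLIP score is replaced by an analytically differentiable stand-in (negative distance to a fixed "ideal" image), so every name and number here is illustrative rather than taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the CLIP score: negative squared distance to a fixed
# "ideal" image. The real objective would be the similarity between CLIP
# encodings of the (augmented) image and the description prompt.
target = rng.random((8, 8, 3))

def score(pixels):
    return -np.sum((pixels - target) ** 2)

pixels = rng.random((8, 8, 3))     # randomly initialised pixel matrix
lr = 0.1
for _ in range(200):               # evaluation-time optimisation loop
    grad = -2 * (pixels - target)  # analytic gradient of the toy score
    pixels += lr * grad            # gradient ascent on the objective

print(round(score(pixels), 6))     # converges towards the maximum of 0.0
```

Swapping what `pixels` parameterises (a raw pixel matrix, Bézier curve parameters, or a frozen generator's latent Z vector) yields the different methods in the list above; the loop itself stays the same.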

Images produced by the various synthesis-through-optimisation methods for a given CLIP-encoded description phrase are shown below.

(Source: arXiv)

Wrapping up

The CLIPDraw algorithm is not entirely new; people have been doing synthesis-through-optimisation for a while through activation-maximisation methods, and recently through CLIP-matching objectives. “I do believe biasing towards drawings rather than photorealism gives images more freedom of expression, and optimising Bézier curves is a nice way to do this efficiently,” said Frans. “I also personally love this art style, and I think the drawings are quite similar to what an artist would produce.”


Copyright Analytics India Magazine Pvt Ltd
