Researchers from Cross Labs, MIT, Earth-Life Science Institute, and College of Arts and Sciences recently introduced CLIPDraw, an algorithm that synthesises drawings from natural language input. The code is available as a Colab notebook.
“The field of ‘text-to-image synthesis’ has a broad history, and current methods have shown stunningly realistic image generation through GAN-like methods. Realism, however, is a double-edged sword: there is a lot of overhead in generating photorealistic renderings, when often all we want are simple drawings,” said Kevin Frans, a researcher at MIT.
CLIPDraw is inspired by Skribbl.io, a web-based drawing and guessing game.
How does CLIPDraw work?
CLIPDraw is powered by a pre-trained CLIP model (developed by OpenAI). The CLIP model consists of an image encoder and a text encoder that map onto the same representational space, making it possible to measure the similarity between images and text. “And if we can measure similarities, we can also try to discover images that maximize that similarity, therefore matching a given textual prompt,” Frans said.
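To make the idea of a shared representational space concrete, here is a minimal sketch that assumes the similarity is measured as cosine similarity between embedding vectors, a common choice for this kind of comparison. The four-dimensional vectors and the prompts are hypothetical stand-ins, not real CLIP outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: a real encoder would produce much longer vectors
# from an actual image and actual text.
image_embedding = [0.9, 0.1, 0.3, 0.0]
text_embeddings = {
    "a drawing of a cat": [0.8, 0.2, 0.4, 0.1],
    "a drawing of a car": [0.1, 0.9, 0.0, 0.3],
}

# Score the image against every prompt and keep the closest match.
scores = {
    prompt: cosine_similarity(image_embedding, embedding)
    for prompt, embedding in text_embeddings.items()
}
best_prompt = max(scores, key=scores.get)
print(best_prompt)
```

Because images and text land in the same space, the same score can be read in the other direction: hold the text fixed and look for the image that scores highest.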
The basic CLIPDraw loop follows this principle of synthesis-through-optimisation. First, it starts with a human-given description prompt and a random set of Bézier curves (a Bézier curve is a parametric curve, defined by a small number of control points, that is widely used in computer graphics). Then, it slowly adjusts the curves through gradient descent so that the drawing best matches the given prompt.
CLIPDraw does not require any training; instead, a pre-trained CLIP language-image encoder is used as a metric for maximising the similarity between the given description and a generated drawing. Most importantly, CLIPDraw operates over vector strokes rather than pixel images.
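The loop can be illustrated with a toy sketch. It is not the CLIPDraw implementation: the CLIP-based objective is swapped for a simple stand-in loss (the mean squared distance between points on one cubic Bézier curve and a hypothetical target stroke), so that the gradient can be written by hand. The target, learning rate, and step count are all assumptions chosen for illustration.

```python
import random


def bernstein(t):
    """Weights of the four control points of a cubic Bézier curve at t."""
    s = 1.0 - t
    return (s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3)


def bezier_point(ctrl, t):
    """Point on the curve at parameter t in [0, 1]."""
    w = bernstein(t)
    x = sum(wi * p[0] for wi, p in zip(w, ctrl))
    y = sum(wi * p[1] for wi, p in zip(w, ctrl))
    return x, y


def loss(ctrl, targets):
    """Stand-in objective (NOT CLIP): mean squared distance to the targets."""
    total = 0.0
    for t, (tx, ty) in targets:
        x, y = bezier_point(ctrl, t)
        total += (x - tx) ** 2 + (y - ty) ** 2
    return total / len(targets)


def gradient(ctrl, targets):
    """Analytic gradient of the stand-in loss w.r.t. each control point."""
    grad = [[0.0, 0.0] for _ in ctrl]
    for t, (tx, ty) in targets:
        x, y = bezier_point(ctrl, t)
        w = bernstein(t)
        for i in range(4):
            grad[i][0] += 2 * (x - tx) * w[i] / len(targets)
            grad[i][1] += 2 * (y - ty) * w[i] / len(targets)
    return grad


random.seed(0)
# Start from random control points, as the loop described above does.
ctrl = [[random.random(), random.random()] for _ in range(4)]
# Hypothetical target stroke: samples of the parabola y = x^2.
targets = [(k / 10, (k / 10, (k / 10) ** 2)) for k in range(11)]

initial_loss = loss(ctrl, targets)
learning_rate = 0.5  # assumed value, small enough for this loss to decrease
for _ in range(2000):
    grad = gradient(ctrl, targets)
    for point, g in zip(ctrl, grad):
        point[0] -= learning_rate * g[0]
        point[1] -= learning_rate * g[1]
final_loss = loss(ctrl, targets)
print(initial_loss, final_loss)
```

In the setting the article describes, the loss would instead come from comparing the rendered drawing with the prompt in CLIP's shared space, but the structure is the same: random curves in, small gradient steps on the curve parameters, a better-matching drawing out.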
Currently, CLIPDraw produces a diverse set of human-recognisable drawings based on simple strokes and shapes. “A great example for this is ‘a painting of a starry night sky,’ which shows a painterly-styled sky with a ‘moon and stars,’ alongside an actual painting canvas and painter in the foreground, which then also features ‘black and blue swirls’ resembling Van Gogh’s ‘The Starry Night’,” said Frans.
“At times, the drawings contain symbols that do not literally depict the description, but are tangentially associated, such as the prompt ‘自転車’ (bicycle in Japanese) resembling a Google Maps screenshot with a Japanese-like character in the corner. The ambiguity of prompts also presents intriguing results. In the prompt ‘Fast Food’, a McDonald’s logo along with a set of hamburgers is shown,” the researchers said.
How is CLIPDraw different from other methods?
Compared to methods that learn a direct generative model, optimisation-based synthesis methods like CLIPDraw do not require prior training. Instead, images are generated through an evaluation-time optimisation loop, aiming to maximise a given objective. This work focuses explicitly on synthesising images that match the CLIP encoding of a description prompt.
The methods compared are:
- CLIPDraw, in which drawings are produced by a set of RGBA Bézier curves. The control points, thickness, and colours of the curves can all be adjusted.
- Pixel optimisation, which instead optimises a 224x224x3 matrix of RGB pixel colours. All other algorithmic aspects are the same as CLIPDraw, including image augmentation.
- BigGAN optimisation, in which images are produced using a pre-trained BigGAN generator. The weights of the generator are frozen; only the latent Z vectors are optimised.
- CLIPDraw (no augment), which is identical to CLIPDraw except that no image augmentation is applied to the synthesised drawings.
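One way to see the practical difference between optimising curves and optimising pixels is to count the free parameters in each case. The pixel count follows directly from the 224x224x3 matrix above; the curve setup (256 curves with four control points each) is a hypothetical configuration chosen only for illustration.

```python
# Pixel optimisation: every RGB value of a 224x224 image is a free parameter.
pixel_params = 224 * 224 * 3

# Hypothetical curve setup (an assumption, not taken from the researchers).
NUM_CURVES = 256
CONTROL_POINTS_PER_CURVE = 4

# Each curve has (x, y) per control point, one thickness, and one RGBA colour.
params_per_curve = CONTROL_POINTS_PER_CURVE * 2 + 1 + 4
curve_params = NUM_CURVES * params_per_curve

print(pixel_params)  # 150528
print(curve_params)  # 3328
```

Under these assumptions the curve representation has far fewer parameters to adjust, and every one of them corresponds to a property of a stroke rather than to a single pixel.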
The images that various synthesis-through-optimisation methods produce to match a given CLIP-encoded description phrase are shown below.
The CLIPDraw algorithm is not entirely new; people have been doing synthesis-through-optimisation for a while through activation-maximisation methods, and more recently through ‘CLIP-matching objectives’. “I do believe biasing towards drawings rather than photorealism gives images more freedom of expression, and optimising Bézier curves is a nice way to do this efficiently,” said Frans. “I also personally love this art style, and I think the drawings are quite similar to what an artist would produce.”
Amit Raja Naik is a senior writer at Analytics India Magazine, where he dives deep into the latest technology innovations. He is also a professional bass player.