Early last year, San Francisco-based artificial intelligence company OpenAI launched DALL.E, an AI system that could generate a realistic image from a description of a scene or object. The text-to-image generator’s name is a portmanteau of the artist Salvador Dalí and WALL.E, the robot from the Pixar film of the same name. OpenAI described DALL.E as the “GPT-3 for images.” Like GPT-3, DALL.E is a transformer language model. Although the model was seemingly making images from nothing, the results weren’t exactly frame-worthy.
Last week, OpenAI announced a more advanced version called DALL.E 2. The images churned out by the model caused a mini commotion on Twitter. CEO Sam Altman started inviting random commenters to suggest the most inventive prompts they could think of for the model to turn into images. DALL.E 2 more than obliged by creating imagery out of thin air.
DALL.E was a 12-billion parameter model trained on a dataset of text-image pairs. It received both the text and the image as a single stream of data containing up to 1280 tokens, and was trained using maximum likelihood to generate all of the tokens, one after another. This enabled DALL.E to produce a fresh image from scratch. It could also regenerate any rectangular region of an existing image that extended to the bottom-right corner, in a way that was consistent with the text prompt.
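To make the “one token after another” idea concrete, here is a minimal Python sketch of autoregressive sampling over a single text-plus-image stream. It is an illustration only, not OpenAI’s code: `toy_next_token_weights` is a hypothetical stand-in for the transformer, the vocabulary size is made up, and the 256/1024 split of the 1280 tokens is an assumption that the article itself does not state.

```python
import random

TEXT_LEN = 256      # assumed number of text tokens (not stated in the article)
IMAGE_LEN = 1024    # assumed number of image tokens; 256 + 1024 = 1280
IMAGE_VOCAB = 16    # toy vocabulary size, far smaller than a real one


def toy_next_token_weights(stream):
    """Hypothetical stand-in for the transformer.

    Returns one unnormalized weight per image token, conditioned
    (trivially) on the last token in the stream.
    """
    last = stream[-1]
    return [1 + (last + k) % 3 for k in range(IMAGE_VOCAB)]


def generate_image_tokens(text_tokens, rng, image_len=IMAGE_LEN):
    """Sample image tokens one at a time, each conditioned on the stream so far."""
    stream = list(text_tokens)
    for _ in range(image_len):
        weights = toy_next_token_weights(stream)
        token = rng.choices(range(IMAGE_VOCAB), weights=weights, k=1)[0]
        stream.append(token)
    # Everything after the text prefix is the generated image.
    return stream[len(text_tokens):]


rng = random.Random(0)
text_tokens = [rng.randrange(100) for _ in range(TEXT_LEN)]
image_tokens = generate_image_tokens(text_tokens, rng)
print(len(text_tokens) + len(image_tokens))  # 1280
```

In a real system the sampled image tokens would then be decoded back into pixels by a separate component, which this sketch leaves out.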
DALL.E 2 essentially does the same thing as DALL.E: it takes a complex prompt such as “A painting inspired by Banksy’s art showing a machine-human interaction” and turns it into hundreds of images. It then selects, from all the outputs, the image most likely to meet the user’s standards. However, DALL.E 2 is far more versatile and can produce images of a higher resolution.
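The “generate many, keep the best” step can be sketched as a simple rerank. This is a toy under stated assumptions: candidates are represented by hypothetical tag tuples rather than real images, and `toy_score` is a made-up word-overlap measure standing in for whatever learned image-text scorer a real system would use.

```python
def toy_score(prompt, candidate_tags):
    """Hypothetical scorer: count prompt words that also appear in the tags."""
    prompt_words = set(prompt.lower().split())
    return len(prompt_words & set(candidate_tags))


def rerank(prompt, candidates, score=toy_score, top_k=1):
    """Return the top_k candidates, highest score first."""
    ranked = sorted(candidates, key=lambda c: score(prompt, c), reverse=True)
    return ranked[:top_k]


prompt = "a robot shaking hands with a human"
candidates = [
    ("robot", "cat"),
    ("robot", "human", "hands"),
    ("tree",),
]
best = rerank(prompt, candidates)
print(best)  # [('robot', 'human', 'hands')]
```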
More efficient: DALL.E 2 runs on a 3.5-billion parameter model and uses another 1.5-billion parameter model to enhance the resolution of the images it produces. The model is also faster at generating images than DALL.E. The big jump in performance comes from a new diffusion model, which is smaller and more efficient than the model that DALL.E used. The diffusion model starts with an image that is entirely noise and then gradually refines it until it matches the prompt as closely as possible.
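The noise-to-image idea can be illustrated with a toy loop. This is a sketch of the general shape of iterative denoising, not DALL.E 2’s actual sampler: a real diffusion model uses a trained network, conditioned on the prompt, to predict the clean image at each step, whereas here a fixed `target` list stands in for that prediction, and the step schedule is invented for the example.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Start from pure noise and step toward a predicted clean image.

    `target` stands in for the model's prediction of the clean image
    (an assumption for illustration; a real model has to learn this).
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # an "image" that is entirely noise
    for t in range(steps, 0, -1):
        blend = 1.0 / t                      # small moves early, full move at t == 1
        noise_scale = 0.1 * (t - 1) / steps  # injected noise shrinks to zero
        x = [
            xi + blend * (pi - xi) + noise_scale * rng.gauss(0.0, 1.0)
            for xi, pi in zip(x, target)
        ]
    return x


target = [0.2, 0.8, 0.5, 0.1]
result = toy_denoise(target)
max_error = max(abs(r - t) for r, t in zip(result, target))
print(max_error < 1e-9)  # True
```

The last step uses a blend of 1 and no added noise, so the loop lands on the predicted image; earlier steps only nudge the noisy sample toward it.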
Source: OpenAI blog
More realistic: The images produced by the newer version are more well-rounded, with complex backgrounds and realistic lighting and reflections. They are a far cry from the images DALL.E produced, which were cartoonish and, more often than not, had a plain background.
Source: OpenAI blog
Editing: Another major addition to DALL.E 2 is that it can edit an image using a technique called “inpainting.” A user selects the area of the image they want to edit and enters a prompt describing the change they want. In a few seconds, the model produces a handful of options for the user to choose from. For example, the user could select an area of a table that holds plates and prompt for them to be removed. The model also renders appropriate lighting and shadows in the edited image and uses suitable materials for the objects.
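The selection step in inpainting boils down to a mask: pixels inside the selected area come from the newly generated content, and pixels outside it are kept from the original. The sketch below shows only that compositing step on tiny grids of numbers. The function name and data layout are hypothetical, and a real model would also condition the generated pixels on the surrounding image and the prompt, which is not shown here.

```python
def composite(original, generated, mask):
    """Keep original pixels where mask is 0, take generated pixels where mask is 1."""
    return [
        [g if m else o for o, g, m in zip(orig_row, gen_row, mask_row)]
        for orig_row, gen_row, mask_row in zip(original, generated, mask)
    ]


original = [
    [1, 1, 1],
    [1, 1, 1],
]
generated = [
    [9, 9, 9],
    [9, 9, 9],
]
mask = [
    [0, 1, 1],  # the user selected the right-hand part of the image
    [0, 0, 1],
]
edited = composite(original, generated, mask)
print(edited)  # [[1, 9, 9], [1, 1, 9]]
```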
Source: OpenAI blog
Multiple variations: DALL.E 2 is also able to produce multiple variations of a single image. These variations could be an impressionistic version of the image or a close resemblance of it. The user can even give the model a second image, and DALL.E 2 can combine the key features of both images to form a final one.
According to tests conducted by OpenAI, DALL.E 2’s image classification and captions are more accurate. In the past year, such algorithms were found to be vulnerable to being tricked into mislabelling an item. For example, if a system was trained on an image of an apple that was labelled ‘orange,’ it would be tricked into believing that the apple was an orange. However, DALL.E 2 does not make the same mistake.
OpenAI has said it is conscious of the potential negative impact that DALL.E 2 could have in the wrong hands. In today’s world of deepfakes, the model could easily be used to produce misinformation or racist imagery, which is why OpenAI has made DALL.E 2 available to developers on an invite-only basis. All the prompts that the model receives must adhere to a strict content policy. To rule out the possibility of DALL.E 2 producing hateful or violent images, images of dangerous weapons were left out of the dataset itself. While OpenAI has said that it intends to eventually offer DALL.E 2 as an API, it is prepared to proceed with caution.