
What’s the big deal about DALL.E 2?

DALL.E 2 is far more versatile and capable of churning out images of a higher resolution.


Early last year, San Francisco-based artificial intelligence company OpenAI launched an AI system that could generate a realistic image from a text description of a scene or object, and called it DALL.E. The text-to-image generator’s name is a portmanteau of the artist Salvador Dalí and the robot WALL.E from the Pixar film of the same name. OpenAI described DALL.E as the “GPT-3 for images”; like GPT-3, DALL.E is a transformer language model. But although the model was seemingly conjuring images from nothing, the results it produced weren’t exactly frame-worthy.

Last week, OpenAI announced a more advanced version called DALL.E 2. The reaction to the images churned out by the model caused a mini commotion on Twitter. CEO Sam Altman started inviting random commenters to suggest the most inventive ideas they could think of for the model to make images out of. DALL.E 2 more than obliged by creating imagery out of thin air.

Source: Twitter

Comparison

DALL.E was a 12-billion parameter model trained on a dataset of text-image pairs. It received the text and the image together as a single stream of up to 1,280 tokens, and was trained using maximum likelihood to generate all of the tokens, one after another. This enabled DALL.E to produce a fresh image from scratch. It could also regenerate any rectangular region of an existing image that extended to the bottom-right corner, keeping the result consistent with the text prompt.
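In spirit, that objective can be sketched in a few lines: concatenate the caption and image tokens into one stream, then train the model to predict each token from the ones before it. The NumPy toy below is illustrative only, not OpenAI’s code; the names `concat_stream` and `next_token_nll` are made up for this example.

```python
import numpy as np

def concat_stream(text_tokens, image_tokens, max_len=1280):
    # DALL.E treats the caption and the image as a single token sequence
    stream = np.concatenate([text_tokens, image_tokens])
    return stream[:max_len]

def next_token_nll(logits, stream):
    # Maximum-likelihood objective: average negative log-probability of
    # each token given the tokens before it.
    # logits: (len(stream) - 1, vocab_size) scores for positions 1..end
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    targets = stream[1:]
    return -np.mean(np.log(probs[np.arange(len(targets)), targets]))
```

Minimising this loss over the whole stream is what lets a single model continue an image from a caption, or from a caption plus a partial image.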

DALL.E 2 essentially does the same thing DALL.E does: take a complex prompt like, “A painting inspired by Banksy’s art showing a machine-human interaction,” turn it into hundreds of candidate images, and then select the most suitable output, the one most likely to meet the user’s standards. However, DALL.E 2 is far more versatile and capable of producing images of a higher resolution.

More efficient: DALL.E 2 runs on a 3.5-billion parameter model and uses another 1.5-billion parameter model to enhance the resolution of the images it produces. The model also processes images faster than DALL.E. The big jump in performance comes from a new diffusion model, which is smaller and more efficient than the autoregressive approach DALL.E used. The diffusion model starts with an image that is entirely noise and gradually refines it, step by step, until it looks as close to the prompt as possible.
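That reverse-diffusion loop can be caricatured as follows. This is a heavily simplified sketch, not OpenAI’s sampler: a real diffusion model (e.g. DDPM-style) uses a trained network as the noise predictor and a carefully derived, prompt-conditioned update rule; here `predict_noise` merely stands in for that network.

```python
import numpy as np

def denoise_step(x, predict_noise, t, alpha=0.9):
    # One reverse step: subtract a fraction of the estimated noise,
    # then rescale (a crude stand-in for the real DDPM update).
    eps = predict_noise(x, t)
    return (x - (1 - alpha) * eps) / np.sqrt(alpha)

def sample(shape, predict_noise, steps=50, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)      # start from pure noise
    for t in reversed(range(steps)):    # walk the noise back out
        x = denoise_step(x, predict_noise, t)
    return x
```

In the real system the noise predictor is conditioned on the text prompt, which is what steers the pure-noise starting point toward an image matching the description.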

Source: OpenAI blog

More realistic: The images produced by the newer version are more polished, with complex backgrounds, realistic lighting and reflections. The final product is a far cry from the images DALL.E produced, which were cartoonish and, more often than not, had a plain background.

Source: OpenAI blog

Editing: Another major addition to DALL.E 2 is that it can edit an existing image using what OpenAI calls “inpainting.” The user selects the area of the image they want to edit and enters a prompt describing the change. In a few seconds, the model produces a handful of options for the user to choose from. For example, the user could select an area of a table that carries plates and prompt for them to be removed. The model also renders appropriate lighting and shadows in the edited region, and uses materials suited to the objects.
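A common way this kind of editing works, an assumption about the general technique rather than OpenAI’s implementation, is mask-constrained generation: the selected region is replaced with noise and regenerated, while the untouched pixels are re-imposed at every step so the edit blends into its surroundings.

```python
import numpy as np

def inpaint(image, mask, generate_step, steps=50, seed=0):
    # mask == True marks the region to regenerate; elsewhere the
    # original pixels are kept.  `generate_step` stands in for one
    # step of a generative model (e.g. a diffusion denoiser).
    rng = np.random.default_rng(seed)
    x = np.where(mask, rng.standard_normal(image.shape), image)
    for t in range(steps):
        x = generate_step(x, t)
        x = np.where(mask, x, image)  # re-impose the known pixels
    return x
```

Because the known pixels are clamped at every step, the model can only “invent” content inside the selected region, which is why the result stays consistent with the rest of the picture.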

Source: OpenAI blog

Multiple variations: DALL.E 2 can also produce multiple variations of a single image. These variations could be an impressionistic version of the image or a close resemblance of it. The user can even supply a second image, and DALL.E 2 will combine the salient features of both to form a final one.
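Variations like these are typically produced by decoding from an image’s embedding, and blending two embeddings, for instance with spherical interpolation, is one plausible way to mix the features of two images. The sketch below is purely illustrative and not DALL.E 2’s actual method.

```python
import numpy as np

def slerp(a, b, t):
    # Spherical linear interpolation between two embedding vectors:
    # the blend stays on the unit sphere, where normalised image
    # embeddings live, instead of cutting through its interior.
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    omega = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return a  # vectors already coincide
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)
```

Decoding from `slerp(emb_a, emb_b, 0.5)` would, under this assumption, yield an image carrying features of both inputs, while values of `t` near 0 or 1 resemble one parent image more closely.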

According to tests conducted by OpenAI, DALL.E 2’s image classification and captioning are more accurate. Over the past year, it was found that such algorithms could be tricked into mislabelling an item. For example, if a system was trained on an image of an apple labelled ‘orange,’ it could be fooled into believing the fruit was an orange. DALL.E 2, however, does not make the same mistake.

Limitations

OpenAI has said it is conscious of the potential negative impact DALL.E 2 could have in the wrong hands. In today’s world of deepfakes, the model could easily be used to produce misinformation or racist imagery, which is why OpenAI has made DALL.E 2 available to developers on an invite-only basis. All prompts the model receives must adhere to a strict content policy. To rule out the possibility of DALL.E 2 producing hateful or violent images, dangerous weapons were omitted from the training dataset itself. While OpenAI has said it intends to turn DALL.E 2 into an API eventually, it is prepared to proceed with caution.


Poulomi Chatterjee

Poulomi is a Technology Journalist with Analytics India Magazine. Her fascination with tech and eagerness to dive into new areas led her to the dynamic world of AI and data analytics.