DALL-E 2, the latest iteration of OpenAI’s AI system that creates images from natural-language text prompts, has broken the internet, and for the right reasons. To showcase the system’s potential, OpenAI CEO Sam Altman solicited ideas from the public, and imaginative prompts such as ‘a shark and a dolphin cruise hand-in-hand with an undersea city in the background’ or ‘a rabbit detective sitting on a park bench and reading a newspaper in a Victorian setting’ came up as suggestions. The results have made people wonder if DALL-E, or similar AI tools, could replace human designers.
DALL-E 2 learns the relationship between images and the text used to describe them. It starts with a pattern of random dots and builds towards the final image using a process called diffusion. The new iteration also comes with an editing option: DALL-E 2 can respond to a natural language caption to add and remove elements while taking shadows, reflections, and textures into account. It also generates more realistic and accurate images, with 4x greater resolution than the original DALL-E.
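To make the idea of starting from random dots and building towards an image concrete, here is a minimal toy sketch in Python. It is not OpenAI’s code or the real DALL-E 2 algorithm. The function name `toy_reverse_diffusion` is hypothetical, the "image" is just a short list of pixel values, and the denoiser cheats by knowing the target, whereas a real diffusion model uses a trained neural network guided by the text prompt and typically adds fresh noise along the way.

```python
import random


def toy_reverse_diffusion(target, steps=50, seed=0):
    """Start from random noise and move toward `target` in small steps.

    ASSUMPTION: `target` stands in for what a trained, prompt-guided
    denoiser would predict. A real system never sees the final image.
    """
    rng = random.Random(seed)
    # Step 0: the "pattern of random dots" (Gaussian noise per pixel).
    image = [rng.gauss(0.0, 1.0) for _ in target]
    for step in range(steps):
        remaining = steps - step
        # Stand-in denoiser: the noise still present in each pixel.
        predicted_noise = [x - t for x, t in zip(image, target)]
        # Remove a share of that noise; the share reaches 100% on the last step.
        image = [x - n / remaining for x, n in zip(image, predicted_noise)]
    return image


if __name__ == "__main__":
    pixels = [0.0, 0.5, 1.0, 0.25]  # a made-up four-pixel "image"
    print(toy_reverse_diffusion(pixels, steps=20, seed=1))
```

Each pass strips away a little more noise, so the output drifts from random values to the target picture by the final step. The gradual, many-step refinement is the part that mirrors diffusion; everything else here is simplified for illustration.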
How credible is the threat?
Alex Nichol, one of the DALL-E researchers, said tools like DALL-E 2 democratise the design process. But does it put designer jobs at risk? What exactly is the collateral damage of this technology? “Questions which human beings are asking are, is my job viable? Is my profession at risk? And unfortunately, that’s just a different set of questions that the businesses are asking: can we do something faster, cheaper, more automated?” said George Baily, product marketing manager at FintechOS.
DALL-E-generated images can be a viable substitute for expensive stock photos. The major factors at play here are speed and scale. “As an individual wanting a series of interesting illustrations in a particular style or having ten ideas for a new logo or a quick banner ad for my social media campaign, I don’t want to have an emotional discussion with my designer about their creativity. It doesn’t need to be perfect, it just needs to be. These images can be created at a hyper-speed, scale and quantity that humans are just not built to process,” Baily said.
Gary Marcus, scientist and the author of Rebooting AI, said: “DALL-E is probably best used as a source of inspiration rather than a tool for final products. You can say something like ‘a boat on the sea, in a Van Gogh style’, and get something beautiful. But if you want to change the end product, perhaps to ‘a boat on the sea but with five people rather than 4, with the tallest person in the front and the shorter person in the back, with the same boat, but painted brown, and a slightly darker background’, the system probably won’t understand the language well enough to meet your exact specifications.”
The trouble with DALL-E 2
Gary Marcus has also taken to Twitter to highlight the model’s drawbacks, such as the difficulty of fine-tuning output to specific needs, a superficial understanding of language, and compositional flaws.
Mathematician Jeremy Kahn said the system often fails to render details like lighting or shadow in complex scenes. DALL-E is also not good at merging borders or understanding binding attributes. Alex Nichol discussed a few examples: he asked the tool to ‘put the Eiffel Tower on the moon’, and it put the moon in the sky above the tower. Then he gave it a ‘living room filled with sand’ prompt, and it generated a scene that resembled a construction site more than a living room.
CNN’s Rachel Metz also posted a few DALL-E 2 fails on Twitter.
The AI system also throws the idea of art into sharp relief. “What makes art valuable is that artists have an opportunity cost of doing that art. It’s a sacrifice of something else in their life. AI art gets inflated instantly. Everyone can get a beautiful painting with a button click. Not valuable,” said a commenter in a Hacker News thread.
According to OpenAI, DALL-E shows how imaginative humans and clever systems can work together to make new things, amplifying our creative potential. We can safely conclude that DALL-E is no Picasso, but the AI system can create spellbinding conceptual art with the right input and combinatorial play.