If one AI trend defined at least the first half of the year, it was text-to-image generation. Not just the tech world but anyone with a curious bone in their body rushed to try these tools. OpenAI’s DALL·E started it, but the market soon filled with similar tools, and even giants like Google and Meta introduced their own versions.
When OpenAI launched DALL·E 2 in April 2022, it changed how the world perceives AI art. DALL·E 2 is a generative text-to-image model that can create stunning images from natural language instructions or contextual cues.
DALL·E 2 is a large model with 3.5B parameters, but it is nowhere near as large as GPT-3 (175B) and, interestingly, is smaller than its predecessor, DALL·E (12B). Despite its smaller size, DALL·E 2 generates images at 4x the resolution of DALL·E, and human judges preferred its outputs over DALL·E's more than 70 percent of the time for both caption matching and photorealism. CLIP (Contrastive Language-Image Pre-training) is one of the most important building blocks in the DALL·E 2 architecture, as it provides the primary link between text and images.
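To make the idea behind CLIP more concrete, here is a minimal sketch of contrastive text-image matching, written in plain Python rather than taken from OpenAI's code. It assumes that an image encoder and a text encoder have already turned each image and caption into an embedding vector; the short hand-written vectors below stand in for those outputs, and the function names are hypothetical. Matching image-caption pairs should score a high cosine similarity, and the symmetric contrastive loss rewards the model when each image picks out its own caption and vice versa. The fixed temperature of 0.07 is an illustrative assumption (CLIP learns this value during training).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss: image i should match caption i.

    The temperature is a fixed illustrative constant here; CLIP learns it.
    """
    n = len(image_embs)
    logits = [[cosine_similarity(img, txt) / temperature for txt in text_embs]
              for img in image_embs]
    # Image -> text: each row is a distribution over captions.
    loss_img = -sum(math.log(softmax(logits[i])[i]) for i in range(n)) / n
    # Text -> image: each column is a distribution over images.
    loss_txt = -sum(math.log(softmax([logits[r][j] for r in range(n)])[j])
                    for j in range(n)) / n
    return (loss_img + loss_txt) / 2

def best_caption(image_emb, text_embs):
    """Index of the caption embedding most similar to the image embedding."""
    sims = [cosine_similarity(image_emb, t) for t in text_embs]
    return max(range(len(sims)), key=sims.__getitem__)

if __name__ == "__main__":
    # Hypothetical toy embeddings standing in for real encoder outputs.
    images = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.1]]
    captions = [[0.9, 0.2, 0.0], [0.1, 0.8, 0.2]]
    print(best_caption(images[0], captions), best_caption(images[1], captions))
    print(clip_style_loss(images, captions), clip_style_loss(images, captions[::-1]))
```

With correctly paired captions the loss is lower than with the captions swapped, which is the signal that pushes matching text and images together in a shared embedding space.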
OpenAI CEO Sam Altman recently tweeted that DALL·E 2 would be made available to 1 million users. As part of this initiative, each user will receive 50 free credits during the first month of use and 15 free credits each month thereafter. During the first beta phase, users can also buy credits on top of the free monthly allowance, in increments of 115 credits for USD 15. Each credit covers one request: either an original prompt, for which DALL·E 2 produces four images, or an edit or variation prompt, for which it produces three.
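The credit arithmetic above is easy to get wrong, so the following short Python sketch works through it. It uses only the figures stated in the article (50 free credits in month one, 15 per month afterwards, 115 credits for USD 15, four images per original prompt, three per edit or variation). The function and constant names are hypothetical, and the sketch assumes every credit is spent on a single kind of request.

```python
import math

FIRST_MONTH_FREE = 50       # free credits in the first month
MONTHLY_FREE = 15           # free credits in each later month
PACK_PRICE_USD = 15         # price of one paid credit pack
PACK_CREDITS = 115          # credits in one paid pack
IMAGES_PER_PROMPT = 4       # images per original prompt
IMAGES_PER_EDIT = 3         # images per edit or variation prompt

def images_from_credits(credits, edit=False):
    """Images produced if every credit goes to one kind of request."""
    per_credit = IMAGES_PER_EDIT if edit else IMAGES_PER_PROMPT
    return credits * per_credit

def free_credits(months):
    """Total free credits over the first `months` months of use."""
    if months <= 0:
        return 0
    return FIRST_MONTH_FREE + MONTHLY_FREE * (months - 1)

def packs_needed(requests, free=0):
    """Paid packs required to cover `requests` credits beyond the free ones."""
    extra = max(0, requests - free)
    return math.ceil(extra / PACK_CREDITS)

def cost_per_image_usd():
    """Paid cost per image when each credit is an original prompt."""
    return PACK_PRICE_USD / (PACK_CREDITS * IMAGES_PER_PROMPT)

if __name__ == "__main__":
    print(images_from_credits(FIRST_MONTH_FREE))             # 200
    print(images_from_credits(FIRST_MONTH_FREE, edit=True))  # 150
    print(free_credits(12))                                  # 215
    print(round(cost_per_image_usd(), 4))                    # 0.0326
```

In other words, the 50 first-month credits yield up to 200 images from original prompts, or 150 from edits and variations, and a paid pack works out to roughly 13 US cents per credit, or about 3.3 cents per image when used for original prompts.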
Midjourney, on the other hand, comes from an independent research lab of the same name, whose overarching mission is to “explore new mediums of thought.” The lab launched its text-to-image service in 2022; given a natural language prompt, it generates images that closely match the description.
Midjourney uses an invite-only onboarding system and runs through Discord, where a bot relays requests to its AI servers. When you issue a natural language prompt, the bot returns four low-resolution images in roughly 30 seconds. From there, you can generate variations or new sets of images to get closer to what you have in mind. You can also change the aspect ratio of the output image, up to a maximum resolution of 2048×1280, whereas DALL·E 2 is fixed at 1024×1024.
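To put the resolution gap in numbers, this Python sketch reduces each maximum resolution to its aspect ratio and compares total pixel counts. The helper names are hypothetical, and the `ar_flag` function assumes the `--ar W:H` prompt parameter that Midjourney uses to set aspect ratios; it only formats a string and does not call any Midjourney service.

```python
from math import gcd

MIDJOURNEY_MAX = (2048, 1280)   # maximum resolution cited for Midjourney
DALLE2_FIXED = (1024, 1024)     # fixed DALL·E 2 resolution

def aspect_ratio(width, height):
    """Reduce a resolution to its simplest width:height ratio."""
    g = gcd(width, height)
    return width // g, height // g

def ar_flag(width, height):
    """Format the ratio as a Midjourney-style `--ar W:H` prompt parameter."""
    w, h = aspect_ratio(width, height)
    return f"--ar {w}:{h}"

def pixel_ratio(res_a, res_b):
    """How many times more pixels resolution A has than resolution B."""
    return (res_a[0] * res_a[1]) / (res_b[0] * res_b[1])

if __name__ == "__main__":
    print(aspect_ratio(*MIDJOURNEY_MAX))               # (8, 5)
    print(aspect_ratio(*DALLE2_FIXED))                 # (1, 1)
    print(ar_flag(*MIDJOURNEY_MAX))                    # --ar 8:5
    print(pixel_ratio(MIDJOURNEY_MAX, DALLE2_FIXED))   # 2.5
```

At its maximum size, then, a Midjourney image is an 8:5 (16:10) frame with 2.5 times as many pixels as DALL·E 2's square 1024×1024 output.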
Once you have found your preferred variant, you can upscale it and download it to your local machine. Like DALL·E 2, Midjourney relies on CLIP, but unlike DALL·E 2, it combines CLIP with a constantly changing set of image generation methods.
Since both tools are still “work-in-progress,” picking a winner is difficult. DALL·E 2 is good at close-up photographs and discrete objects. It recognises a wide range of pop culture references, especially from visual media or from literary works with film adaptations. DALL·E 2 can create impressively high-quality charcoal and pencil sketches, paintings in the styles of various famous artists, and more unusual outputs such as “medieval illuminated manuscripts.”
It works especially well with art styles such as “impressionist watercolour painting” or “pencil sketch,” which are more forgiving of flaws in the details. With the right prompts and some cherry-picking, DALL·E 2 can create absolutely stunning artwork.
Midjourney can do all of the above and more, and it is exceptional at creating larger scenes. However, finding the right prompt is perhaps the toughest part.
In the end, the choice depends on what the user wants to do. If you require a more detailed, higher-resolution image and are willing to spend a few dollars, Midjourney is the way to go.