The cat’s out of the bag! After secretly working on an image generation tool for months, OpenAI has finally announced DALL.E 3. Not only that: in October, DALL.E 3 will be integrated with ChatGPT Plus and ChatGPT Enterprise, finally fulfilling the multimodal promise of GPT-4. With text and image generation available together in the famous chatbot, what will be the fate of other image-generation tools such as Midjourney?
Should Midjourney Be Worried?
When DALL.E 3 was being tested with Discord users a few months ago, its output was considered far superior to Midjourney’s. One user even said they had ‘zero interest in using Midjourney after using it’. To gauge how far that sentiment holds true, Nick St Pierre, a creative director and community developer in AI and art, ran a side-by-side comparison, giving DALL.E 3 and Midjourney the same set of prompts.
Prompt: Close-up photograph of a hermit crab nestled in wet sand, with sea-foam nearby and the details of its shell and texture of the sand accentuated.
DALL.E 3 (top) and Midjourney (bottom)
Prompt: A 2D animation of a folk music band composed of anthropomorphic autumn leaves, each playing traditional bluegrass instruments, amidst a rustic forest setting dappled with the soft light of a harvest moon.
DALL.E 3 (left) and Midjourney (right)
The images generated by DALL.E 3 follow the minute details of the input prompts more closely and are better aligned with the specific instructions given.
Simpler Language Prompts
The USP of DALL.E 3 is the simplicity of its text prompts. With ChatGPT integration, users can enter conversational prompts, from short sentences to detailed paragraphs, and get relevant images in return. They can then continue the conversation with the chatbot to further tweak the generated output.
One user has praised DALL.E 3 for its image quality, prompt coherence and accessible UI. Meanwhile, Midjourney is working on its latest version (V6), which is said to have a better understanding of natural language. The company is also looking to bring it to web and mobile platforms.
Midjourney, a pure generative AI platform for creating images, has built-in features for image editing. The output can be tweaked to an extent to improve colour, contrast or composition. With features such as Zoom, Pan and Remix, Midjourney V5.2 (the latest released version) resembles a full image generation and editing tool and can even be compared to the likes of Adobe. DALL.E 3, however, does not have any of these features.
Multimodality In Pieces
Though multimodality has been addressed with the DALL.E 3 integration in ChatGPT, images can be produced only as output. In Midjourney, however, you can upload reference images along with text prompts to get the desired image. This is not available in DALL.E 3, where input is still text only.
Midjourney can be accessed only through Discord, something that has been heavily criticised, especially after DALL.E 3’s release. The onboarding process and usage of the application on Discord seem complicated to many, which dissuades people from using it.
During the initial testing phase with Discord users, OpenAI’s tool lacked control over its safety features: gore, inappropriate images and trademarked brand logos were all generated. This was, however, addressed before the release of DALL.E 3. In its latest blog post, OpenAI described how safety has been prioritised in the image-generation feature by collaborating with red teamers and removing harmful biases related to visual representation. The model declines requests for images of public figures, and even requests for images in the style of a living artist.
This is an area where Midjourney has tricky boundaries. In the past, images of public figures such as Pope Francis, Donald Trump, Elon Musk and many others have been created using Midjourney, leading to wide criticism and copyright infringement lawsuits. The issue has not yet been addressed despite five released versions of the application. OpenAI, meanwhile, is researching and building an internal tool, a provenance classifier, to help identify whether an image was generated by DALL.E 3.
From an avocado chair to an avocado patient, OpenAI’s use of the fruit to show how far DALL.E has come is quite impressive. Integrating it into a chatbot that has over 100M users is also the best way to extend its reach. However, Midjourney, with image quality that has impressed with every version release and another version in the pipeline that will probably have a web and mobile presence, will certainly be a game-changer.