Midjourney, DALL·E 2 or Stable Diffusion: which is the best text-to-image generator? DALL·E 2, the second generation of OpenAI’s DALL·E, is smaller than its predecessor but arguably the better model. It uses a method called unCLIP, which is sophisticated enough to create images that were once difficult for humans even to describe. Still, it has its limitations.
(credit: AI Network)
The model is not open to the public. OpenAI may have its own reasons for that, but the market is now seeing a rise in open-source text-to-image generators such as Stable Diffusion, much as GPT-Neo was launched by open-source advocates as an alternative to GPT-3.
This has been possible in part because OpenAI open-sourced CLIP, which is indirectly related to DALL·E. CLIP can be described as the basis of DALL·E 2, and it is one of the fundamental reasons why platforms such as Midjourney and Stable Diffusion exist today.
Since DALL·E 2 is trained on millions of stock images, its output is more polished and is best suited to corporate use. According to Emad Mostaque, founder of Stability AI, the company behind Stable Diffusion, inpainting is the best feature of DALL·E 2 and sets it apart from other image generators. DALL·E 2 also produces much better images than Midjourney or Stable Diffusion when a scene has more than two characters.
Midjourney, on the other hand, is best known for its artistic style. The images it generates almost never look like photos; they look like paintings. Some artists think of it as an art student. “I feel Midjourney is an art student who has its own style. And when you invoke my name to create an image, it’s like asking an art student to make something inspired by my art,” said an artist.
Midjourney uses a Discord bot to send and receive calls to its AI servers, and pretty much everything happens on Discord. Midjourney also has an active community of more than 1 million people, where you can watch everyone create magic with art.
Midjourney founder David Holz says he doesn’t want the images to look like photos. He may offer realistic versions at some point, but the company doesn’t want that to be the default. “Perfect photos make me a little uncomfortable right now, though I do see legitimate reasons why you might want something more realistic.”
While both DALL·E 2 and Midjourney have refrained from going fully open source, Stable Diffusion is positioned as an open-source model that everyone will have access to. Mostaque claims, “Code is already available as is the dataset. So everyone will improve and build on it.”
Stable Diffusion also has a good grasp of modern artistic illustration and can produce very detailed artworks. However, it struggles to interpret complex original prompts. It cannot render some prompts that even a small image generator like Craiyon (previously DALL·E mini) can handle. Stable Diffusion is great at complex artistic illustrations, but falls short when it comes to generating general images such as logos.
Some also point out that since Stable Diffusion is unrestricted, unlike Midjourney or DALL·E 2, it has been used to generate nude images of models, images of military conflicts, and images of political or religious figures in incongruous situations.
(Image of Barack Obama created by Stable Diffusion. Image Credits: Stability AI)
(Boris Johnson wielding various weapons, generated by Stable Diffusion. Image Credits: Stability AI)
Stable Diffusion is, nonetheless, a milestone in the text-to-image generation market. Since it is open source, developers can build more sophisticated tools on top of the code available on GitHub. As to which of the three is the best, Midjourney’s artistic ability, DALL·E 2’s realistic images and Stable Diffusion’s unrestricted use make each model better in one way or another. In the end, it depends on the user’s requirements.