In the race to create realistic AI images, one major player that captured public attention with its text-to-image model has been quiet since last year. While OpenAI has kept itself busy with ChatGPT, rivals such as Midjourney and Stable Diffusion have overtaken its image generation platform, DALL-E 2. However, going by the latest developments, DALL-E 3 appears to be around the corner, in an attempt to catch up in the AI image generation race.
Playing Catch-up in the Image Race
OpenAI is believed to be testing a new image generation model that could be an upgrade to DALL-E 2. Through an invite-only preview, 400 people on an exclusive OpenAI testing server have access to the latest version of the model. In an explainer video, YouTuber MattVidPro shared images from the model being tested. The verdict from users: “I have zero interest in using Midjourney after using this.”
The new model is said to be highly capable and better at following prompts and rendering coherent details, including coherent text, photorealism and different art styles. It has produced images with detailed features such as hair, lighting and ad copy, and the common problem of poorly rendered hands also appears to be resolved. It has also been compared with other tools such as Midjourney V5.2 and Stable Diffusion XL, and it appears to outperform them.
Screenshots of AI-generated images from the latest model. Source: YouTube
Not Forgotten, Quietly Fighting
After doing away with the user waitlist, OpenAI released DALL-E 2 to everyone in September 2022. Since then, there have been no major updates to the model. In March this year, it was reported that the company was experimenting with DALL-E 2 and had solicited early feedback from a small group of users. The experiments were aimed at producing sharper and more photorealistic images.
Compared with the latest version of Midjourney, the existing DALL-E 2 model delivers images that are closer to the prompt provided.
Prompt: Painting of a pink Jester giving a high five to a panda while in a cycling competition. The bikes are made of cheese and the ground is very muddy. They are driving in a foggy forest and the panda is angry.
DALL-E 2 (July)
With GPT-4 being multimodal, it is possible that the next version of OpenAI’s text-to-image model will have enhanced capabilities.
User comments on the new model. Source: YouTube
Midjourney, which has released five versions of its text-to-image model in the span of a year, has stuck to closed-source models all along. Stable Diffusion, on the other hand, is open source, and its latest model, Stable Diffusion XL 1.0, is also available on Amazon Bedrock. Adobe Firefly, which takes on Midjourney and DALL-E with its generative AI capabilities, offers its service as a trial first, with an option to subscribe afterwards.
Safety First?
OpenAI recently committed to a set of action points to ensure responsible AI governance. Under the coordination of the US government, OpenAI and six other big tech companies, including Microsoft, Google and Meta, will work towards watermarking AI-generated audio and visual content. It is possible that this watermarking will be embedded in the version currently being tested.
If so, OpenAI would become the first major tech company to tag AI-generated images. While safety seems to be their priority, OpenAI’s latest image generation model, at the moment, seems far from safe.
Probably because the model is still in testing, safety features are not present in the current version, and images containing blood, gore and frontal nudity can be generated. Graphic pictures depicting extreme violence can appear even when the prompt does not ask for them. Furthermore, the model can generate copyrighted artworks and characters as well as accurate company logos.
Last year, DALL-E 2 came under scrutiny for creating inappropriate images. It was reported to have produced images that reinforced gender biases and racial stereotypes, as well as overly sexual images.
While the new model will require fine-tuning to bring in safety features, community responses to it have been highly promising, with users rating it higher than current image generation tools. It is estimated that the new model will arrive in December.