Google has been talking about Gemini for a while now, and people are growing increasingly impatient with the all talk and no show. Meanwhile, OpenAI sensed the lull and grabbed the opportunity by announcing plans to integrate DALL-E 3 with ChatGPT Plus and ChatGPT Enterprise.
This is surely a game-changing move by OpenAI, as it props up GPT-4 as the first functional multimodal model on the market, one that generates both text and images, similar to what Gemini promises.
To make up for the absence of Gemini, Google recently added extensions to Bard along with the ability to upload images with Lens and get Search images in responses. It was Google’s attempt to make Bard multimodal. However, only time will tell if it will be able to withstand the incoming competition from DALL-E-integrated ChatGPT Plus, scheduled to be launched in October.
That said, OpenAI has the potential not only to impact Google Bard and Gemini but also to put pressure on other text-to-image generation models like Midjourney and Stable Diffusion, as DALL-E 3 has shown promise by creating high-quality images.
Integrating DALL-E 3 with ChatGPT Plus gives OpenAI an edge over other image generation tools, as ChatGPT has the largest user base of any model out there in any segment.
At the moment, ChatGPT is one of the world’s most popular websites, attracting a staggering 1.4 billion visits globally in August. During the same month, Bard received 183.5 million visits. Midjourney, on the other hand, has over 15 million active users and saw 21 million visits in August. Stable Diffusion has more than 10 million daily active users across all channels, according to Stability AI chief Emad Mostaque.
From the users’ perspective, DALL-E 3 on ChatGPT gives them the freedom to generate both text and images on a single platform. And naturally, if users get easy results from one popular platform, they will prefer it over the others any day.
If we look at the numbers, ChatGPT boasts a huge user base that won’t shy away from paying $20 or a little more for newer versions of ChatGPT Plus. Midjourney, by contrast, sells monthly plans ranging widely from $10 to $120. It can be said that OpenAI is paving the way for a unified multimodal model capable of handling a wide range of tasks. Additionally, there have been user complaints about the interface of Midjourney, which is currently hosted on Discord.
Multimodal Market is Scattered
If we examine the currently available multimodal models, we find that they are quite scattered, for there isn’t a single model that can perform all tasks. Alongside closed-source models, there are also various open-source models claiming to be multimodal. It is, however, still not clear which model truly deserves the multimodal label.
For example, Hugging Face recently introduced a multimodal model named IDEFICS. It has the ability to process both text and image inputs and generate descriptions for the images. Similarly, Bard possesses the capability to accept image inputs. Also, Meta recently launched SeamlessM4T, a foundational speech/text translation and transcription model with an all-in-one system that performs multiple tasks such as speech-to-speech, speech-to-text, text-to-text translation, and speech recognition. OpenAI and Google have also developed their own speech-to-text models, namely Whisper and AudioPaLM-2, respectively.
If OpenAI also adds text-to-speech and speech-to-text features to ChatGPT Plus, it could race ahead of other models, making it challenging for others to catch up. Meanwhile, OpenAI doesn’t seem to have any plans to stop here. According to recent reports, it is also planning to integrate GPT-Vision into GPT-4, indicating that it is here to stay.