OpenAI Wins Again 

Puts Gemini and Midjourney in a tough spot

Google has been talking about Gemini for a while now, and people are growing increasingly impatient with its all-talk, no-show approach. Meanwhile, OpenAI sensed the lull and seized the opportunity by announcing plans to integrate DALL-E 3 with ChatGPT Plus and ChatGPT Enterprise.

This is surely a game-changing move by OpenAI, as it positions GPT-4 as the first functional multimodal model on the market that generates both text and images, similar to what Gemini promises.

To make up for the absence of Gemini, Google recently added extensions to Bard, along with the ability to upload images with Lens and get images from Search in responses. It was Google's attempt to make Bard multimodal. However, only time will tell whether it can withstand the incoming competition from the DALL-E-integrated ChatGPT Plus, scheduled to launch in October.

That said, OpenAI's move has the potential not only to affect Google Bard and Gemini but also to put pressure on other text-to-image generation models such as Midjourney and Stable Diffusion, as DALL-E 3 has shown promise in creating high-quality images.

User Base Advantage

Integrating DALL-E 3 with ChatGPT Plus gives OpenAI an edge over other image generation tools, as ChatGPT has the largest user base of any model in any segment.

At the moment, ChatGPT is one of the world's most popular websites, attracting a staggering 1.4 billion visits globally in August. During the same month, Bard received 183.5 million visits. Midjourney, on the other hand, has over 15 million active users and saw 21 million visits in August. Stable Diffusion has more than 10 million daily active users across all channels, according to Stability AI chief Emad Mostaque.

From the users' perspective, DALL-E 3 on ChatGPT gives them the freedom to generate both text and images on a single platform. And naturally, if users can get results easily from one popular platform, they will prefer it over the others.

If we look at the numbers, ChatGPT boasts a huge user base that won't shy away from using newer versions of ChatGPT Plus at a price of $20 or a little more. Midjourney, by contrast, has a much wider price range, with monthly plans running from $10 to $120. It can be said that OpenAI is paving the way for a unified multimodal model capable of handling a wide range of tasks. Additionally, users have complained about the interface of Midjourney, which is presently hosted on Discord.

Multimodal Market is Scattered

If we examine the currently available multimodal models, we find that they are quite scattered: there isn't a single model that can perform all tasks. Alongside closed-source models, there are also various open-source models claiming to be multimodal. It is, however, still not clear which model can claim to be truly multimodal.

For example, Hugging Face recently introduced a multimodal model named IDEFICS. It has the ability to process both text and image inputs and generate descriptions for the images. Similarly, Bard possesses the capability to accept image inputs. Also, Meta recently launched SeamlessM4T, a foundational speech/text translation and transcription model with an all-in-one system that performs multiple tasks such as speech-to-speech, speech-to-text, text-to-text translation, and speech recognition. OpenAI and Google have also developed their own speech-to-text models, namely Whisper and AudioPaLM-2, respectively. 

If OpenAI also adds text-to-speech and speech-to-text features to ChatGPT Plus, it could race ahead of other models, making it challenging for others to catch up. And OpenAI doesn't seem to have any plans to stop here. According to recent reports, it is also planning to integrate GPT-Vision into GPT-4, indicating that it is here to stay.

Siddharth Jindal
Siddharth is a media graduate who loves exploring tech through journalism and putting forward ideas worth pondering in the era of artificial intelligence.
