
OpenAI Wins Again 

Puts Gemini and Midjourney in a tough spot

Google has been talking about Gemini for a while now, and people are growing increasingly impatient with what looks like all talk and no show. Meanwhile, OpenAI sensed the lull and seized the opportunity by announcing plans to integrate DALL-E 3 with ChatGPT Plus and ChatGPT Enterprise.

This is surely a game-changing move by OpenAI, as it positions GPT-4 as the first functional multimodal model on the market that creates both text and images, similar to what Gemini promises.

To make up for Gemini’s absence, Google recently added extensions to Bard, along with the ability to upload images with Lens and get Search images in responses. It was Google’s attempt to make Bard multimodal. However, only time will tell whether it can withstand the incoming competition from the DALL-E-integrated ChatGPT Plus, scheduled to launch in October.

That said, OpenAI has the potential not only to impact Google Bard and Gemini but also to put pressure on other text-to-image generation models like Midjourney and Stable Diffusion, as DALL-E 3 has shown promise in creating high-quality images.

User Advantage

Integrating DALL-E 3 with ChatGPT Plus gives OpenAI an edge over other image generation tools, as ChatGPT has the largest user base of any model in any segment.

At the moment, ChatGPT is one of the world’s most popular websites, attracting a staggering 1.4 billion visits globally in August. During the same month, Bard received 183.5 million visits. Midjourney, on the other hand, has over 15 million active users and saw 21 million visits in August. Stable Diffusion has more than 10 million daily active users across all channels, according to Stability AI chief Emad Mostaque.

From the users’ perspective, DALL-E 3 on ChatGPT gives them the freedom to generate text as well as images on a single platform. And naturally, if users are getting easy results from one popular platform, they will prefer it over the others any day.

If we look at the numbers, ChatGPT boasts a huge user base that won’t shy away from using newer versions of ChatGPT Plus at a price of $20 or a little more. Midjourney, by contrast, sells monthly plans across a much wider price range, from $10 to $120. Additionally, users have complained about the interface of Midjourney, which is presently hosted on Discord. It can be said that OpenAI is paving the way for a unified multimodal model capable of handling a wide range of tasks.

Multimodal Market is Scattered

If we examine the currently available multimodal models, we find that they are quite scattered: no single model can perform all tasks. Alongside closed-source models, there are also various open-source models claiming to be multimodal. It is, however, still not clear which model deserves to call itself truly multimodal.

For example, Hugging Face recently introduced a multimodal model named IDEFICS, which can process both text and image inputs and generate descriptions for the images. Similarly, Bard can accept image inputs. Meta also recently launched SeamlessM4T, a foundational speech/text translation and transcription model: an all-in-one system that performs multiple tasks such as speech-to-speech, speech-to-text, and text-to-text translation, as well as speech recognition. OpenAI and Google have also developed their own speech-to-text models, namely Whisper and AudioPaLM, respectively.

If OpenAI also adds text-to-speech and speech-to-text features to ChatGPT Plus, it could race ahead of other models, making it challenging for others to catch up. Meanwhile, OpenAI doesn’t seem to have any plans to stop here. According to recent reports, it is also planning to integrate GPT-Vision into GPT-4, indicating that it is here to stay.

Siddharth Jindal

Siddharth is a media graduate who loves to explore tech through journalism and put forward ideas worth pondering in the era of artificial intelligence.