During Alphabet’s Q3 earnings call, CEO Sundar Pichai announced a slew of upcoming releases from the company. “We are just really laying the foundation of what I think of as the next-generation series of models we’ll be launching throughout 2024,” he said.
However, Pichai's conspicuous silence on an imminent Gemini launch has left its release timeline uncertain. Given Google's expectations for the product and the financial stakes, a release this year would likely have warranted a mention, as Duet AI and the new Pixel phone did during the call.
Gemini was expected this fall, yet October has come and gone with no sign of the product, which Google DeepMind CEO Demis Hassabis claimed "would be more capable than OpenAI's GPT-4."
This may well be because Google does not want to repeat the hurried release of Bard, which fell short of expectations. Google is likely taking extra precautions to ensure that Gemini lives up to the hype surrounding it.
What To Expect From Gemini
Sissie Hsiao, Google's VP and general manager of Bard and Google Assistant, also spoke highly of Gemini and offered a glimpse of its potential. "I've seen some pretty amazing things," she said. "Like, I'm trying to bake a cake, draw me 3 pictures of the steps on how to ice a three-layer cake, and Gemini will actually create those images."
“And these are completely novel pictures. These are not pictures from the internet,” she added. “It’s able to speak in imagery with humans now, not just text.”
According to Hassabis, engineers at DeepMind are using techniques from AlphaGo to build Gemini.
“At a high level you can think of Gemini as combining some of the strengths of AlphaGo-type systems with the amazing language capabilities of the large models,” Hassabis said. “We also have some new innovations that are going to be pretty interesting,” he added.
Meanwhile, a leak posted on a blog suggests that Gemini could be quite different from Google's previous offerings. According to the leak, Gemini is poised to replace PaLM 2 in Makersuite, pointing to multimodal capabilities.
The leak also reveals a feature called "Stubbs", a tool that lets users create functional app prototypes from a single prompt. Users can optionally include an image of the app they want to create or clone, and then deploy and share the resulting prototypes. A Stubbs Gallery lets users view and remix other people's Stubbs and publish their own.
Makersuite is also set to get an autosave feature, addressing the common problem of losing unsaved prompts to power outages or other disruptions.
A standout feature is Makersuite's user-friendly interface for multimodal prompt creation: it can accept images and, potentially, audio as input, and can generate multimodal content, including HTML.
Additionally, Makersuite will see improved translation support, removing previous restrictions on the ratio of English to non-English text in prompts.
There are, however, some limitations. While Text and Data prompts will support multimodality, Chat prompts will not. Stubbs will not generate full app code; instead, it deploys prototypes, akin to Figma prototypes generated entirely by AI. Image input in the Makersuite UI will also not support GIFs.
Neck and Neck with OpenAI
Google is expected to make a significant impact with the launch of Gemini. While OpenAI has curated an impressive dataset, Google's vast digital platforms, including visual data from YouTube, give it a unique advantage. Google is likely holding Gemini back until it can deliver a knockout blow to competitors such as Microsoft.
But will it be too little, too late?
OpenAI has also set a high bar for safety and performance, which Google is keen to clear. The company wants Gemini to outperform GPT-4 and give users compelling reasons to switch. Google is likely weighing how to position Gemini as a distinctive and valuable product, possibly by targeting the business-to-business (B2B) market or integrating it into its existing services.
Google is also cautious about pricing Gemini competitively, especially since Microsoft offers GPT-4 through Bing for free. However, any delay in bringing these new AI technologies to market may slow their discovery and utilisation by Google's intended audience.
But Shipment Delayed
While Google wants to bring a perfect product to market, the delay seems to have increased the pressure on the company, since timely delivery is crucial to avoid disappointing users. And the wait may get longer still.
Discussing the progress of Google's projects during the earnings call, Pichai remarked, "We are definitely investing, and the early results are very promising." Describing a model that has been in development for most of the year in terms of "early results" raises questions about Gemini's current status and performance.
The decision to present Gemini as a work in progress could also stem from OpenAI's recent updates, which made GPT-4 truly multimodal. Multimodality, which Google promised through its freely accessible Bard but has not delivered in many parts of the world, seems to be its main focus this time around.
“We are creating it from the ground up to be multimodal, highly efficient tool and API integrations and, more importantly, laying the platform to enable future innovations as well,” Pichai added.
In conclusion, the leaked features offer a promising outlook for content creators and developers, though they may change, given indications that Gemini's release could be pushed back further. But with OpenAI innovating rapidly on GPT-4, and chatter that the model's next iteration may be around the corner, is Google's striving for perfection, or for standing toe-to-toe with OpenAI, working to its detriment?