Text-to-Image is All the Rage. So Why Aren’t We Talking About Imagen?

OpenAI has also allowed full usage rights to commercialise the images that they create with DALL.E – including the right to reprint, sell and merchandise.

After DALL.E 2 gained massive popularity, not just within the tech community but also among artists, students, and other hobbyists, it was quite clear that text-to-image generators are the real deal. The positive response prompted others to develop their own versions of such tools, the best-known examples being Midjourney and DALL.E Mini (now called Craiyon). Google, one of the leading companies in AI research, also introduced its own text-to-image generation tool, Imagen. It received rave reviews when first unveiled; today, however, Imagen's popularity pales in comparison to that of the tools mentioned above.

When Imagen proved to be better

Imagen was introduced by Google as a text-to-image diffusion model ‘with an unprecedented degree of photorealism’ and a ‘deep level of language understanding.’ For this tool, Google’s team uses a generic language model, such as T5, that is pretrained on text-only corpora. This approach gave the team a tool that is effective at encoding text for image synthesis. Increasing the size of the language model in Imagen boosts both sample fidelity and image-text alignment more than increasing the size of the image diffusion model.

Credit: Google

Imagen demonstrated superior results. It achieved a state-of-the-art FID score of 7.27 on the COCO dataset without being trained on COCO. Google claims that human raters found the Imagen samples to be on par with the COCO data in image-text alignment. Google also announced DrawBench, a benchmark for text-to-image models. Using this newly introduced benchmark, the team compared Imagen with recent methods such as DALL.E 2, Latent Diffusion Models, and VQ-GAN+CLIP. Imagen outperformed the other models in both sample quality and image-text alignment.

Credit: Google

Other tools are accessible

DALL.E 2’s popularity seems unsurpassable. This tool from OpenAI definitely had the first-mover advantage. Its predecessor, DALL.E, was introduced in 2021, when text-to-image generation was a relatively untouched field. In July this year, OpenAI made DALL.E available in beta. With this, users can buy additional DALL.E credits at USD 15 for 460 images, over and above the monthly free credits. OpenAI has also granted users full usage rights to commercialise the images they create with DALL.E, including the right to reprint, sell and merchandise them.

Another popular text-to-image generation tool that has created a lot of positive buzz is Stable Diffusion. It is available for use via a web interface: a user just needs to log in and start generating images from text prompts. It is similar to DALL.E but has additional options to fine-tune the outcome. Stable Diffusion can be run locally on the user's system or in the cloud; it is expected to be released on GitHub in the coming days.

Another popular tool is Midjourney, created by a research company of the same name. The tool can be used on the company's Discord channel, but the number of free images is limited to 25. Once you pass this limit, you are required to pay USD 10 per month for 200 images or take a standard membership at USD 30 per month for unlimited use. Midjourney also allows corporate use of the generated images under a special enterprise membership.
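To put the paid tiers quoted above on a common footing, the arithmetic can be sketched as a price per image. This is only an illustration: the figures are the ones quoted in this article, the plan labels and the `cost_per_image` and `summarize` helpers are hypothetical names, and the free credits and Midjourney's unlimited tier are left out.

```python
# Prices and image counts as quoted in the article.
# The plan labels are illustrative, not official product names.
PLANS = {
    "DALL.E paid credits": (15.0, 460),      # USD 15 for 460 images
    "Midjourney USD 10 tier": (10.0, 200),   # USD 10 per month for 200 images
}


def cost_per_image(price_usd: float, images: int) -> float:
    """Return the average price of one image in USD."""
    if images <= 0:
        raise ValueError("images must be positive")
    return price_usd / images


def summarize(plans: dict) -> dict:
    """Map each plan label to its per-image cost, rounded to 4 decimals."""
    return {
        name: round(cost_per_image(price, count), 4)
        for name, (price, count) in plans.items()
    }


if __name__ == "__main__":
    for name, cost in summarize(PLANS).items():
        print(f"{name}: about USD {cost:.3f} per image")
```

On these numbers, DALL.E works out to roughly 3 US cents per image and Midjourney's entry tier to 5 cents, assuming a user consumes the full allotment.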

Google is being cautious

In the blog post announcing Imagen, Google devoted a section to the ethical challenges facing this tool. The team notes that open-sourcing the code and demos carries potential risks of misuse. “At this time we have decided not to release code or a public demo. In future work we will explore a framework for responsible externalisation that balances the value of external auditing with the risks of unrestricted open-access,” the blog noted.

The team also acknowledged that the model's large data requirements led them to rely on web-scraped datasets, which were mostly uncurated. This approach helps drive algorithmic advances, but such datasets also perpetuate social stereotypes and harmful associations, especially towards marginalised groups. For Imagen, Google used the LAION-400M dataset, an open and freely accessible dataset that contains large portions of uncurated data. In fact, the dataset's official website notes that it was developed for research purposes and is ‘not meant for real-world production or application.’

According to Google, for the reasons mentioned above, Imagen carries a risk of furthering harmful stereotypes and representations, which makes it unsuitable for public release until strong safeguards are in place.

Shraddha Goled

I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.