How did text-to-image tools become so commercialised

Interestingly, Google, like OpenAI, recently said that the company wouldn’t release its image generation tool Imagen to the public due to risks of misuse.

Seven years ago, in 2015, AI innovation was marked by an important development: automated image captioning. ML algorithms could label objects in images, and those labels could then be turned into natural language descriptions. The feature is typically aimed at people with visual impairments.

This research sparked curiosity in the research community. A group of scientists from the University of Toronto went a step further and decided to flip the process to answer the question: what if these natural language descriptions could be used to generate images instead?

The task was far more complex than producing text from images. The team’s model was trained on a large-scale dataset called Microsoft COCO and could also generalise beyond the training set to produce entirely novel images. The images were based on captions describing scenes that were highly unlikely to occur in real life and looked something like this.

Source: Research Paper

The images at the time may not have been of high quality, but the breakthrough pointed the way to a promising future. With OpenAI’s DALL.E and the release of its successor DALL.E 2 this year, that future is finally here.

In April this year, OpenAI chief Sam Altman announced the launch of DALL.E 2 and invited followers to give the most random, surreal prompts they could imagine. Altman then posted the results on Twitter, photogenic images that faithfully represented the instructions, to oohs and aahs.

A revolution in AI image generation 

DALL.E 2 was the starting point for what has now become a revolution in text-to-image generation within AI. In a report by Wired, a PhD candidate at Penn State, Vipul Gupta, who received early access to the tool, noted, “What people thought might take five to 10 years, we’re already in it. We are in the future.” 

Initially, OpenAI said in its blog that DALL.E 2 wasn’t yet ready for commercial use but could eventually be used in fields like art, marketing and education. The company acknowledged that DALL.E 2 could churn out images that were sexist, racist or otherwise hateful. It formed a ‘red team’ of external experts who began looking closely at the tool’s biases. DALL.E 2 was opened up to only 400 people, mainly OpenAI or Microsoft employees.

A large number of Twitter users expressed disappointment at the decision. Developers and designers were eager to get their hands on the tool. Some complained that OpenAI’s exclusivity created a sense of ‘eliteness’ in AI, and many others were simply impatient. To them, the company’s justification didn’t look good enough.



Competitive environment

It soon became obvious that the world couldn’t wait. On June 6, Hugging Face noticed that usage of its AI image generation tool, DALL.E Mini, had shot up to around 50,000 images a day. The app was developed by Boris Dayma, an independent ML consultant who replicated DALL.E at a hackathon organised by Hugging Face and Google in July last year. Dayma said that he became deeply interested in the tool after studying the DALL.E research paper.

The images that DALL.E Mini generated were of much lower quality than those from OpenAI’s original tool, but it was open source. That turned out to be enough to get people hooked. Regular people, including non-developers, started using DALL.E Mini to exercise their imagination. Where DALL.E 2 was essentially doing the work of an artist, the availability of DALL.E Mini had turned what was conceptually a similar tool into a meme generator. Everyone could now have a piece of the future. People began posting the images and ‘memes’ they had created using DALL.E Mini on Twitter and Reddit. Over time, the image quality improved. Ironically, DALL.E Mini became so popular that Dayma was recently asked to change the tool’s name (it is now called Craiyon).

In the span of a few months, text-to-image generation tools have become a dime a dozen. A few, like Midjourney, produce high-quality images; others not so much. But most are free to all. This is despite the fact that these tools produce images with biases similar to those of DALL.E 2. Interestingly, Google, like OpenAI, recently said that it wouldn’t release its image generation tool Imagen to the public due to risks of misuse. More recently still, Meta said that Make-a-Scene, its creative art-focused image generator, would be open exclusively to specific AI artists.

Fear of criticism from misuse

The difference is clear: prominent tech companies, including the Microsoft-backed OpenAI, were cautious in order to avoid the criticism that could arise from dangerous uses of these tools. DALL.E 2 images were good enough to be attached to, say, a fake news report. This isn’t to say that the same issues couldn’t arise with copycat tools, but less prominent companies did not have the weight of their reputation to carry.

However, the sudden competition among image generators appears to have forced OpenAI to move faster toward opening up DALL.E 2, lest it lose its position as the industry leader. The company announced in a blog post today that it would expand access to the tool through a beta release. OpenAI aims to speed up the waitlist process and add up to a million users within the next few weeks. The tool, which had been free up until now, will move to a credit-based fee. OpenAI will also offer subsidies to artists who might not be able to afford it.

“Expanding access is an important part of our deploying AI systems responsibly because it allows us to learn more about real-world use and continue to iterate on our safety systems,” OpenAI explained in the blog. Meanwhile, it has continued to work on the tool’s biases and introduced a technique that would make the images more inclusive in terms of race and gender. 

For better or worse, AI-generated art has become more or less democratised. One could argue that art (even if it isn’t as good) should be accessible to all, just like AI. But how great a decision this is, only time will tell.


Poulomi Chatterjee
Poulomi is a Technology Journalist with Analytics India Magazine. Her fascination with tech and eagerness to dive into new areas led her to the dynamic world of AI and data analytics.
