Now Microsoft wants a share of the ‘AI image generator’ pie

Compared to DALL-E, Imagen and Midjourney, NUWA-Infinity can generate high-resolution images with arbitrary sizes and support long-duration video generation, says Microsoft

Text-to-image generative models like OpenAI’s DALL-E 2 are attracting significant attention because of their ability to produce images from text prompts alone. While DALL-E 2 is the most popular, there are other budding AI image generators such as ‘Midjourney’, ‘Craiyon’ (formerly DALL-E mini), Meta’s ‘Make-A-Scene’ and Google’s ‘Imagen’.

Now, it seems that Microsoft also wants a share of the ‘AI image generator’ pie. Recently, Microsoft Research Asia introduced NUWA-Infinity, a multimodal generative model designed to generate high-quality images and videos from any given text, image or video input.


In its research paper, ‘NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis’, Microsoft said that it evaluated NUWA-Infinity on five high-resolution visual synthesis tasks:

  • Unconditional Image Generation 
  • Text-to-Image 
  • Text-to-Video 
  • Image Animation 
  • Image Outpainting

Compared to its predecessor ‘NUWA’, which also covers images and videos, NUWA-Infinity has superior visual synthesis capabilities in terms of resolution and variable-size generation.

Since NUWA-Infinity is focused on generating high-resolution images and long-duration videos, most existing datasets cannot be used for training or evaluation. Hence, the team built four new high-resolution datasets to train the model.

The team further revealed that they will pre-train the next version of NUWA-Infinity with more collected visual data and report its generalisation capabilities on open-domain inputs.

But the biggest draw is that NUWA-Infinity can generate videos from text. It can produce previously unseen videos from a simple prompt, and it can also generate videos from sketches. The resulting open-domain videos are temporally consistent.

Furthermore, it can predict the next frames in a video. Given a single input image, be it a landscape or a human face, NUWA-Infinity can predict the frames that would follow it.

Another striking aspect of NUWA-Infinity is that it can generate images with a resolution as high as 38912 × 2048. Higher resolution implies not only more detail, but also wider views.
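To put that figure in perspective, the short calculation below works out the pixel count and aspect ratio of a 38912 × 2048 image. The 256-pixel patch size is purely an illustrative assumption and is not taken from Microsoft's paper; it is only there to show how many square tiles such a canvas would contain.

```python
# Back-of-the-envelope arithmetic for a 38912 x 2048 image.
WIDTH, HEIGHT = 38912, 2048

# Hypothetical patch size, chosen for illustration only (not from the paper).
PATCH = 256

pixels = WIDTH * HEIGHT                          # total number of pixels
megapixels = pixels / 1_000_000                  # same figure in megapixels
aspect_ratio = WIDTH / HEIGHT                    # how many times wider than tall
patches = (WIDTH // PATCH) * (HEIGHT // PATCH)   # tiles of PATCH x PATCH pixels

print(f"{pixels} pixels (~{megapixels:.1f} MP)")
print(f"aspect ratio {aspect_ratio:.0f}:1")
print(f"{patches} patches of {PATCH}x{PATCH}")
```

In other words, the image is roughly 80 megapixels and 19 times wider than it is tall, which is why the higher resolution translates into a much wider view rather than just a sharper one.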

(Image source: Microsoft)

(Image source: Microsoft)

How does it fare against its competitors?

Firstly, what sets NUWA-Infinity apart from its competitors is that it is designed to generate not only high-quality images but also videos from a given text, image, or video, something that none of its competitors is capable of.

“Compared to DALL-E 2, Imagen and MidJourney, NUWA-Infinity can generate high-resolution images with arbitrary sizes and support long-duration video generation,” says Microsoft.

DALL-E 2 generates an image embedding from the input text using either an autoregressive or a diffusion model, and then uses a diffusion model to produce the output image. Google’s Imagen uses a frozen, large-scale pre-trained language model, ‘T5-XXL’, to encode each input text and then uses two diffusion models to generate high-resolution images from the text embeddings.

However, neither of these diffusion-based text-to-image methods can generate images of arbitrary size, as the size of the output image is fixed before training and inference.

NUWA-Infinity introduces an ‘autoregressive over autoregressive’ mechanism into the generation procedure, which enables it to generate images and videos of variable size, Microsoft explained.
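The idea of nesting one autoregressive loop inside another can be illustrated with a toy sketch. This is not Microsoft's implementation: the function names `generate_patch` and `generate_canvas` are hypothetical, and the "model" is just a seeded random number generator standing in for a learned network. It only shows why the output size becomes a run-time choice: an outer loop emits patches one after another, an inner loop emits the tokens of each patch, and nothing in the code fixes how many patches there are.

```python
import random


def generate_patch(previous_patches, tokens_per_patch, vocab_size, rng):
    """Inner autoregressive loop: emit the tokens of one patch, one at a time."""
    tokens = []
    for _ in range(tokens_per_patch):
        # Stand-in for a learned model (assumption): a real system would
        # predict the next token from the earlier patches and tokens. Here we
        # only mix the amount of history into a random draw.
        history = len(previous_patches) + len(tokens)
        tokens.append((rng.randrange(vocab_size) + history) % vocab_size)
    return tokens


def generate_canvas(rows, cols, tokens_per_patch=4, vocab_size=16, seed=0):
    """Outer autoregressive loop: emit patches in raster order.

    rows and cols are ordinary arguments, so the canvas size is chosen at
    generation time rather than fixed in advance.
    """
    rng = random.Random(seed)
    generated = []  # every patch produced so far, in generation order
    canvas = []
    for _ in range(rows):
        row = []
        for _ in range(cols):
            patch = generate_patch(generated, tokens_per_patch, vocab_size, rng)
            generated.append(patch)
            row.append(patch)
        canvas.append(row)
    return canvas


# The same code produces a small canvas or a very wide one.
small = generate_canvas(rows=2, cols=3)
wide = generate_canvas(rows=2, cols=19)
print(len(small), len(small[0]))
print(len(wide), len(wide[0]))
```

By contrast, a model whose output size is fixed before training would need to be retrained or combined with a separate upscaler to reach a different size.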

NUWA-Infinity can also extend an image into one with a larger size and resolution. Microsoft demonstrates this by stretching Vincent van Gogh’s painting ‘The Starry Night’. The AI model is able to stretch the image without compromising its quality.

Original vs NUWA-Infinity generated

(Original artwork: Vincent van Gogh)

(Stretched image: NUWA-Infinity)

Furthermore, NUWA-Infinity is capable of bringing static images to life with highly realistic results. It can turn a still image into a strikingly vivid video.

(Still image)

(Moving image generated by NUWA-infinity)

When it comes to availability, AI models like DALL-E 2 and Midjourney are offered to the public under different pricing plans. NUWA-Infinity, however, is not currently available to the public; it is accessible only to selected individuals and for research purposes.

Google has decided against releasing Imagen to the public due to risks of misuse. Similarly, Meta’s Make-A-Scene would be open exclusively to specific AI artists.

The internet loves AI image generators 

Recently, OpenAI, a company in which Microsoft has invested, announced that it would start selling DALL-E 2 to a million people on its waiting list. Even before this, users who had access to DALL-E 2 were using it to generate creative images from prompts and posting them on social media.

Most recently, a TikTok user fed the prompt ‘selfie at the end of the world’ to DALL-E 2 and posted the results, creating a buzz on social media. The results, however, could be unpleasant for some, as they have an apocalyptic feel.

Max Woolf, Data Scientist at BuzzFeed, also took to Twitter recently to show off his experiment with DALL-E 2. Woolf used the prompt ‘Darth Vader wearing a tuxedo with his prom date in awkward prom photos’ and the results were fascinating, to say the least.

(Image source: Max Woolf)

Microsoft hopes that NUWA-Infinity will help visual content creators save time, cut costs, and increase productivity and creativity.


Pritam Bordoloi
I have a keen interest in creative writing and artificial intelligence. As a journalist, I deep dive into the world of technology and analyse how it’s restructuring business models and reshaping society.
