MITB Banner

Stability AI Releases Stable Diffusion 2.0

The company claims that the second version of Stable Diffusion is the foundation to create new applications, leading to an endless explosion of the creative potential in AI.
Share
Listen to this story

Since 2022 was the year of generative AI, users were able to generate from text to anything. The year isn’t even over yet, and Stability AI has announced the open-source release of Stable Diffusion 2.0 on Thursday. 

Recently, NVIDIA entered the generative arena with its text-to-image model called ‘eDiffi’ or ‘ensemble diffusion for images’ in competition with Google’s Imagen, Meta’s ‘Make a Scene’ and others.

eDiffi offers an unprecedented text-to-image synthesis with intuitive painting along with word capabilities and instant style transfer—compared to the open-source text-to-image DALL.E 2 and Stable Diffusion—generating results with better synthesis quality.

Moreover, Amazon Web Services (AWS) is set to begin offering access to generative algorithms Bloom and Stable Diffusion in Sagemaker Jumpstart—the company’s service for open-source, and deployment-ready algorithms—which is well known in the generative AI space. 

Read: Meet the Hot Cousin of Stable Diffusion, ‘Unstable Diffusion’

What’s new?

Stable Diffusion 2.0 delivers better features and improvements, compared to the original V1 release. The new release includes robust text-to-image models that are trained on a new encoder called OpenCLIP—developed by LAION and aid from Stability AI—improving quality of the generated images. The models in the release are capable of generating images with default resolutions of 512×512 pixels and 768×768 pixels. 

Check out the release notes of Stable Diffusion 2.0 on GitHub

Moreover, the models are trained on the subset of the LAION-5b dataset—which was created by the DeepFloyd team—to further filter adult content using the dataset’s NSFW filter. 

Images produced using Stable Diffusion 2.0 (768×768 image resolution)

Unveiling Upscaler Diffusion Models  

Other than the generative model, the 2.0 version includes an Upscaler Diffusion model which enhances image resolution or quality by a factor of 4. For example, below is an image where a low-resolution generated image (128×128) has been upscaled to a higher resolution (512×512). 

The company said that by combining this model with their text-to-image models, Stable Diffusion 2.0 will be able to generate images with resolutions of 2048×2048, or even higher.

Source: (Left) Low-resolution image (128×128); (Right) High resolution image (512×512) produced by Upscaler Diffusion Model

Moreover, the depth-to-image model—depth2img—extends all prior features of image-to-image option from the V1 for creative applications of the model. 

With the help of the existing model, this feature would infer the depth of an input image to generate new images using text and depth information. 

Input image used to produce several new images

With the help of a depth-to-image model, users can apply the feature in new creative applications, delivering results that may look different from the original—still preserving the depth and coherence of the original image.

Depth-to-Image preserves depth and coherence of the original image

With features such as upscaling capabilities to higher resolution and depth2img, Stability AI believes that the 2.0 version would be the foundation to create new applications—enabling an endless explosion of the creative potential in AI. 

Why it matters

The original CompVis-led Stable Diffusion V1 had changed the nature of open-source AI generative models, spawning over hundreds of other innovations around the world. With one of the fastest growing climbs to 10,000 GitHub stars in any software, the V1 rocketed through 33K stars in less than two months.

Source: A16z and Github

The original Stable Diffusion V1 release was led by the dynamic team of Stability AI’s Robin Rombach, Patrick Esser from Runway ML and CompVis Group of LMU Munich’s Professor Dr. Björn Ommer. The older version was built on their prior work along with Latent Diffusion Models—receiving support from Eleuther AI and LAION. The blog read that Rombach is now leading the advancements with Katherine Crowson to create the next generation of media models.

PS: The story was written using a keyboard.
Share
Picture of Bhuvana Kamath

Bhuvana Kamath

I am fascinated by technology and AI’s implementation in today’s dynamic world. Being a technophile, I am keen on exploring the ever-evolving trends around applied science and innovation.
Related Posts

CORPORATE TRAINING PROGRAMS ON GENERATIVE AI

Generative AI Skilling for Enterprises

Our customized corporate training program on Generative AI provides a unique opportunity to empower, retain, and advance your talent.

Upcoming Large format Conference

May 30 and 31, 2024 | 📍 Bangalore, India

Download the easiest way to
stay informed

Subscribe to The Belamy: Our Weekly Newsletter

Biggest AI stories, delivered to your inbox every week.

AI Courses & Careers

Become a Certified Generative AI Engineer

AI Forum for India

Our Discord Community for AI Ecosystem, In collaboration with NVIDIA. 

Flagship Events

Rising 2024 | DE&I in Tech Summit

April 4 and 5, 2024 | 📍 Hilton Convention Center, Manyata Tech Park, Bangalore

MachineCon GCC Summit 2024

June 28 2024 | 📍Bangalore, India

MachineCon USA 2024

26 July 2024 | 583 Park Avenue, New York

Cypher India 2024

September 25-27, 2024 | 📍Bangalore, India

Cypher USA 2024

Nov 21-22 2024 | 📍Santa Clara Convention Center, California, USA

Data Engineering Summit 2024

May 30 and 31, 2024 | 📍 Bangalore, India