Users can now generate 3D models from text using 2D diffusion

The new method, called ‘DreamFusion’, uses 2D diffusion to generate diverse 3D models, bringing notable advancements to text-to-3D synthesis.

A group of researchers from Google has developed a new approach to 3D synthesis: users can now generate 3D models from text prompts. The new method, called ‘DreamFusion’, uses 2D diffusion and is set to bring notable advancements to text-to-3D synthesis.

Typically, advancements in AI generative systems are driven by diffusion models trained on billions of image-text pairs. The researchers note that adapting this recipe to 3D synthesis would require large-scale datasets of labelled 3D assets and efficient architectures for denoising 3D data, neither of which currently exists. Instead, the team circumvents these limitations by using a pre-trained 2D text-to-image diffusion model to perform text-to-3D synthesis.

The researchers optimise a randomly initialised 3D model, a Neural Radiance Field (NeRF), via gradient descent so that its 2D renderings from random viewpoints achieve a low loss.
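In outline, each optimisation step renders the 3D representation from a random camera, scores the 2D render with a loss, and updates the 3D parameters by gradient descent. A minimal NumPy sketch of this render-and-descend loop, with a stand-in quadratic loss in place of the diffusion prior and a random linear projection in place of a real differentiable renderer (all names here are illustrative, not DreamFusion's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=8)             # stand-in for the NeRF parameters
target = np.zeros(8)                   # what the (toy) prior prefers to see

def render(theta, camera):
    # toy "differentiable renderer": a linear projection per camera
    return camera @ theta

def loss_and_grad(theta, camera):
    # quadratic stand-in for the diffusion-prior loss on the 2D render
    residual = render(theta, camera) - camera @ target
    loss = 0.5 * np.sum(residual ** 2)
    grad = camera.T @ residual         # chain rule back to the 3D parameters
    return loss, grad

lr = 0.01
for step in range(1000):
    camera = rng.normal(size=(4, 8))   # a random viewpoint each step
    loss, grad = loss_and_grad(theta, camera)
    theta -= lr * grad                 # gradient descent on the 3D parameters
```

Because each random camera only sees a projection of the scene, many viewpoints are needed before the 3D parameters are constrained from all directions.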



An excerpt from the blog says, “The resulting 3D model of the given text can be viewed from any angle, relit by arbitrary illumination, or composited into any 3D environment. Our approach requires no 3D training data and no modifications to the image diffusion model, demonstrating the effectiveness of pretrained image diffusion models as priors.”

How does it work?


A text-to-image generative model called ‘Imagen’ is used to optimise a 3D scene. The research also proposes Score Distillation Sampling (SDS), a way to generate samples from a diffusion model by optimising a loss function, which allows samples to be optimised in an arbitrary parameter space, such as a 3D space.

A 3D scene parameterisation, similar to Neural Radiance Fields (NeRFs), is used to define the differentiable mapping from parameters to images. While SDS alone produces a reasonable scene appearance, DreamFusion adds regularisers and optimisation strategies to improve geometry. The resulting trained NeRFs are coherent, with good surface geometry and high-quality normals.
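Concretely, an SDS update perturbs the current render with the forward diffusion process, asks the denoiser to predict the injected noise, and uses the weighted prediction error as a gradient on the 3D parameters (omitting the expensive denoiser Jacobian). A toy NumPy sketch under strong simplifications: the "renderer" is the identity, and the "denoiser" is the exact optimal denoiser for a prior concentrated at the origin, so the loop simply pulls the parameters toward zero. All names and the weighting are illustrative, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = rng.normal(size=8)             # stand-in for the 3D parameters

def render(theta):
    return theta.copy()                # identity "renderer" for illustration

def toy_denoiser(x_t, t):
    # stand-in for the noise prediction eps_hat(x_t; y, t); this toy
    # version is the optimal denoiser for a prior whose mean image is zero
    return x_t / np.sqrt(t)

def sds_grad(theta):
    x = render(theta)
    t = rng.uniform(0.2, 0.8)          # random noise level
    eps = rng.normal(size=x.shape)
    x_t = np.sqrt(1.0 - t) * x + np.sqrt(t) * eps   # forward diffusion
    eps_hat = toy_denoiser(x_t, t)
    w = 1.0                            # weighting w(t), constant here
    # SDS: treat w(t) * (eps_hat - eps) as the gradient on the render,
    # back-propagated through the (identity) renderer
    return w * (eps_hat - eps)

for _ in range(500):
    theta -= 0.05 * sds_grad(theta)    # each step nudges the scene toward
                                       # what the diffusion prior prefers
```

Because the denoiser's Jacobian is dropped, each update costs only one forward pass of the denoiser, which is what makes the optimisation tractable.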

Bhuvana Kamath
