Users can now generate 3D models from text using 2D diffusion

The new method, called ‘DreamFusion’, uses 2D diffusion to generate diverse 3D models, bringing notable advancements to text-to-3D synthesis.

A group of researchers from Google has developed a new approach to 3D synthesis that lets users generate 3D models from text prompts. The new method, called ‘DreamFusion’, uses 2D diffusion and is set to bring notable advancements to text-to-3D synthesis.

Typically, advancements in AI generative systems are driven by diffusion models trained on billions of image-text pairs. The researchers note that adapting this recipe to 3D synthesis would require large-scale datasets of labelled 3D assets and efficient architectures for denoising 3D data, neither of which currently exists. Instead, the team circumvents these limitations by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis.

The researchers optimise a randomly initialised 3D model, a Neural Radiance Field (NeRF), via gradient descent so that its 2D renderings from random camera angles achieve a low loss.
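
To make this concrete, here is a minimal sketch of that outer optimisation loop. It is not DreamFusion's actual code: `render` and `sds_loss` below are toy stand-ins for the real differentiable volumetric renderer and the diffusion-based loss (sketched further below), and the tiny MLP merely stands in for a full NeRF.

```python
# A minimal sketch of DreamFusion's outer loop, with toy stand-ins
# for the real NeRF renderer and the Imagen-based loss.
import torch

# Toy "NeRF": a small MLP whose parameters define the 3D scene.
nerf = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.ReLU(), torch.nn.Linear(64, 4)
)
opt = torch.optim.Adam(nerf.parameters(), lr=1e-3)

def render(nerf, camera_pose, hw=64):
    # Placeholder differentiable renderer: queries the MLP on random points.
    # A real implementation would do volumetric ray marching.
    points = torch.randn(hw * hw, 3) + camera_pose
    rgb_sigma = nerf(points)
    return rgb_sigma[:, :3].reshape(hw, hw, 3)

def sds_loss(image):
    # Placeholder for the Score Distillation Sampling loss against a frozen
    # text-conditioned diffusion model (see the SDS sketch below).
    return image.pow(2).mean()

for step in range(1000):
    camera_pose = torch.randn(3)         # random viewpoint each step
    image = render(nerf, camera_pose)    # differentiable 2D rendering
    loss = sds_loss(image)               # low loss => matches the prompt
    opt.zero_grad()
    loss.backward()
    opt.step()
```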

An excerpt from the blog says, “The resulting 3D model of the given text can be viewed from any angle, relit by arbitrary illumination, or composited into any 3D environment. Our approach requires no 3D training data and no modifications to the image diffusion model, demonstrating the effectiveness of pretrained image diffusion models as priors.”

How does it work?

A text-to-image generative model called ‘Imagen’ is used to optimise the 3D scene. The research also proposes Score Distillation Sampling (SDS), a way to generate samples from a diffusion model by optimising a loss function, which allows samples to be optimised in an arbitrary parameter space, such as a 3D scene.
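
The core of SDS is its gradient: noise a rendered image, ask the frozen denoiser to predict that noise, and push the rendering in the direction that shrinks the prediction error. Below is a hedged sketch of that update, where `diffusion_eps` is a placeholder for the frozen text-conditioned denoiser (the real Imagen weights are not public) and the shapes and weighting are illustrative assumptions.

```python
# A sketch of Score Distillation Sampling (SDS): the gradient
# w(t) * (eps_hat(x_t; y, t) - eps) is backpropagated into the rendering,
# skipping differentiation through the denoiser itself.
import torch

def diffusion_eps(x_t, t, text_embedding):
    # Assumption: stand-in for the frozen denoiser's noise prediction.
    return x_t * 0.1

def sds_grad(image, text_embedding, alphas):
    t = torch.randint(1, len(alphas), ())            # random timestep
    alpha_t = alphas[t]
    eps = torch.randn_like(image)                    # forward-process noise
    x_t = alpha_t.sqrt() * image + (1 - alpha_t).sqrt() * eps
    with torch.no_grad():                            # no grad through denoiser
        eps_hat = diffusion_eps(x_t, t, text_embedding)
    w_t = 1.0 - alpha_t                              # common weighting choice
    return w_t * (eps_hat - eps)                     # dL/d(image)

# Usage: feed the SDS gradient straight into the rendered view g(theta).
alphas = torch.linspace(0.999, 0.01, 1000)
image = torch.randn(64, 64, 3, requires_grad=True)   # rendered view
image.backward(gradient=sds_grad(image, None, alphas))
```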

A 3D scene parameterisation similar to Neural Radiance Fields (NeRFs) is used to define the differentiable mapping from parameters to images. While SDS alone produces reasonable scene appearance, DreamFusion introduces additional regularisers and optimisation strategies to improve geometry. The resulting trained NeRFs are coherent, with consistent surface geometry and high-quality normals.
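
One such regulariser in the paper is an orientation loss (borrowed from Ref-NeRF) that penalises surface normals pointing away from the camera along a ray. A minimal sketch, with tensor shapes assumed for illustration:

```python
# Orientation regulariser sketch: penalise back-facing normals.
import torch

def orientation_loss(normals, view_dirs, weights):
    # dot > 0 means the normal points away from the camera along the ray;
    # penalising this discourages geometry that "cheats" the renderer.
    dots = (normals * view_dirs).sum(dim=-1)
    return (weights * torch.clamp(dots, min=0.0) ** 2).sum()

# Toy usage with 1024 ray samples (shapes are assumptions).
n = torch.nn.functional.normalize(torch.randn(1024, 3), dim=-1)
d = torch.nn.functional.normalize(torch.randn(1024, 3), dim=-1)
w = torch.rand(1024)
print(orientation_loss(n, d, w))
```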
