Adobe Unveils Continuous 3D Words for Text-to-Image Control

The framework gives users of text-to-image models fine-grained control over various attributes of an image.

Adobe Research and the University of Oxford have published a new paper introducing Continuous 3D Words, a method that gives users of text-to-image models fine-grained control over various attributes of an image.

By engineering special sets of input tokens, these attributes can be transformed continuously, letting users manipulate sliders for control alongside text prompts. The approach is demonstrated to provide continuous user control over 3D-aware attributes such as illumination, bird wing orientation, the dolly zoom effect, and object poses.

Why is this Development Important?

Current controls for image generation in diffusion models cannot capture abstract, continuous attributes such as illumination direction or non-rigid shape changes.

The paper emphasises that while photography offers detailed control over composition and aesthetics, text prompts in text-to-image diffusion models are limited to high-level descriptions. 3D rendering engines, on the other hand, allow precise control but are labour-intensive and require expertise. 

This work aims to combine the advantages of both by expanding the vocabulary of text-to-image models with samples generated from rendering engines, creating Continuous 3D Words to enable fine-grained control during image generation.

Training Method

The core of the approach involves learning a continuous vocabulary, facilitating easier association between different attribute values and allowing interpolation during inference. Two training strategies are proposed to prevent degenerate solutions and enable generalisation to new objects. 

The first strategy is a two-stage training process that prevents the model from encoding each attribute value as a new object. The second strategy employs ControlNet with conditioning images to prevent overfitting to artificial backgrounds. The entire training process is carried out in a lightweight manner for efficiency.
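The interpolation idea behind the continuous vocabulary can be illustrated with a toy sketch (hypothetical names and values, not the paper's code): token embeddings learned at a few discrete attribute values are blended linearly at inference time, so any value in between yields a valid "continuous 3D word".

```python
def interpolate_token(value, anchors):
    """Blend token embeddings learned at discrete attribute values,
    yielding an embedding for any value in the trained range.

    anchors: sorted list of (attribute_value, embedding) pairs, where
    each embedding is a list of floats (a learned token embedding).
    """
    # Clamp to the trained attribute range
    if value <= anchors[0][0]:
        return anchors[0][1]
    if value >= anchors[-1][0]:
        return anchors[-1][1]
    # Find the two surrounding anchors and blend their embeddings
    for (v0, e0), (v1, e1) in zip(anchors, anchors[1:]):
        if v0 <= value <= v1:
            t = (value - v0) / (v1 - v0)
            return [(1 - t) * a + t * b for a, b in zip(e0, e1)]

# Toy 2-dimensional "embeddings" for illumination angles 0.0, 0.5, 1.0
anchors = [(0.0, [0.0, 1.0]), (0.5, [1.0, 0.0]), (1.0, [0.0, -1.0])]
print(interpolate_token(0.25, anchors))  # [0.5, 0.5]
```

Moving a slider simply sweeps `value`, so the prompt embedding changes smoothly rather than jumping between discrete learned tokens.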


Shritama Saha
Shritama (she/her) is a technology journalist at AIM who is passionate about exploring generative AI, with a special focus on big tech, databases, healthcare, DE&I, hiring in tech and more.