Google has introduced a new method called StyleDrop, which enables the synthesis of images in a specific style using the Muse text-to-image model. The technique captures the details of a custom style, including color schemes, shading, design patterns, and local and global effects, requiring only a single image as input.
StyleDrop learns the new style rapidly by fine-tuning a small number of trainable network parameters, then improves output quality through iterative training with either human or automatic feedback. The process is efficient: because iterative training needs only a handful of images, it takes less than three minutes even with human feedback in the loop.
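Google has not released StyleDrop's code, but the two ingredients described above — fine-tuning a small set of adapter parameters on top of a frozen backbone, then iteratively retraining on feedback-filtered samples — can be sketched in a few lines. Everything below is a hypothetical stand-in: the toy backbone, the adapter shape, and the score_sample() feedback function are assumptions for illustration, not Muse's actual architecture or StyleDrop's training code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Small bottleneck module; only these weights are trained."""
    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        # Residual update: the frozen backbone's output is nudged,
        # not replaced, which is what keeps tuning lightweight.
        return x + self.up(F.relu(self.down(x)))

dim = 64
# Hypothetical frozen "base model": a toy stand-in for a large
# text-to-image backbone whose weights stay fixed during tuning.
base = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
for p in base.parameters():
    p.requires_grad = False

adapter = Adapter(dim)
opt = torch.optim.Adam(adapter.parameters(), lr=1e-3)

def generate(z):
    """Frozen backbone plus the trainable adapter on its features."""
    return adapter(base(z))

def score_sample(sample, target):
    """Hypothetical automatic feedback. StyleDrop scores style and text
    fidelity (e.g. with CLIP) or asks humans; this is a toy proxy."""
    return -F.mse_loss(sample, target).item()

style_target = torch.randn(1, dim)  # stand-in for the single style image
train_set = [style_target]

for round_idx in range(2):  # iterative training rounds
    for _ in range(100):    # fine-tune the adapter on the current set
        opt.zero_grad()
        batch = torch.cat(train_set)
        loss = F.mse_loss(generate(batch), style_target.expand_as(batch))
        loss.backward()
        opt.step()
    # Generate candidates, keep only the best-scoring ones, and feed
    # them back in as extra training data for the next round.
    candidates = [generate(torch.randn(1, dim)).detach() for _ in range(8)]
    candidates.sort(key=lambda s: score_sample(s, style_target), reverse=True)
    train_set = [style_target] + candidates[:2]
    print(f"round {round_idx}: loss = {loss.item():.4f}")
```

The efficiency claim in the article follows from this structure: only the adapter's few parameters receive gradients, so each feedback round is cheap compared with retraining the full model.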
According to Google, StyleDrop outperforms other methods for style transfer from text-to-image models, including DreamBooth, LoRA, and Textual Inversion applied to Imagen and Stable Diffusion.
The team at Google also combines StyleDrop with DreamBooth to learn and create new objects in different styles; built on Muse, the combination can generate custom objects in custom styles.
Google envisions StyleDrop as a versatile tool that lets designers and companies train it on their own brand assets and rapidly prototype new ideas in their desired style. The project page for StyleDrop provides further information on its capabilities and applications.
Google had previously introduced Imagen, a text-to-image diffusion model. Unlike many other AI text-to-image generators, Imagen launched with deliberate restrictions: users could generate buildings in different themes or style animated creatures. The model was released through the AI Test Kitchen app, where Google trials AI projects before wider public release, and utilises the LAION-400M dataset for training.
Google’s cautious release of Imagen and its focus on photorealistic outputs distinguished it from other models like DALL-E or Midjourney. Imagen’s unique features include City Dreamer, which allows users to construct buildings based on text descriptions, and Wobble, which generates animated creatures.
While generative AI tools like these offer creative possibilities, concerns remain about provenance and copyright in AI-assisted content creation. The use of internet-posted text to train AI text-generation tools has drawn less debate than the scraping of images, and AI tools for music production are likely to raise similar questions.