We live in a time of abundant text-to-image AI tools. Now, with the introduction of Phraser, billed as the world's first application to use machine learning to help users write prompts for neural networks, the job gets even easier.
Denis Shilo, CEO of Facel, developed Phraser with the goal of promoting smart search. Building a prompt in Phraser comes down to a few simple steps: choosing a style, selecting the content type, picking the colour quality, adjusting the camera settings, and so on.
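The step-by-step selections described above can be imagined as a small prompt builder. The sketch below is purely illustrative; the function and option names are assumptions, not Phraser's actual interface.

```python
# Hypothetical sketch of a Phraser-style prompt builder that assembles
# a text-to-image prompt from a few user selections. The option names
# are illustrative stand-ins, not Phraser's real API.

def build_prompt(subject, style=None, content_type=None,
                 colour=None, camera=None):
    """Join the chosen options into a single comma-separated prompt."""
    parts = [subject]
    if content_type:
        parts.append(content_type)            # e.g. "digital painting"
    if style:
        parts.append(f"in the style of {style}")
    if colour:
        parts.append(colour)                  # e.g. "vivid colours"
    if camera:
        parts.append(camera)                  # e.g. "wide-angle shot"
    return ", ".join(parts)

prompt = build_prompt("a lighthouse at dusk",
                      style="Studio Ghibli",
                      content_type="digital painting",
                      colour="vivid colours",
                      camera="wide-angle shot")
print(prompt)
# a lighthouse at dusk, digital painting, in the style of Studio Ghibli, vivid colours, wide-angle shot
```

Each menu choice simply contributes one more fragment to the final prompt string handed to the model.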
What makes this smart search feature exciting is its effortlessness: users can search directly through prompts, without the fuss of keywords and other procedures. It operates on a database of a million images previously generated with the text-to-image models Midjourney, DALL-E 2 and Stable Diffusion. The developers pitch the tool as economical and time-saving, since users can instantly check in the prompt editor how different keywords, functions and styles affect the output.
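Searching directly through stored prompts, as described above, can be sketched with a toy ranking function. The word-overlap metric and sample data below are assumptions for illustration; a system like Phraser would use learned embeddings rather than this toy similarity.

```python
import re

# Hypothetical sketch of prompt-based search over a store of
# previously generated prompts, ranked by simple word overlap
# (Jaccard similarity). The database entries are invented examples.

def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Fraction of shared words between two prompts."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

database = [
    "a red fox in a snowy forest, digital painting",
    "portrait of an astronaut, studio lighting",
    "a snowy mountain at sunrise, oil painting",
]

def search(query, db, top_k=2):
    """Return the top_k stored prompts most similar to the query."""
    return sorted(db, key=lambda p: jaccard(query, p), reverse=True)[:top_k]

print(search("snowy forest painting", database)[0])
# a red fox in a snowy forest, digital painting
```

The point is that the query is itself a prompt, not a set of keywords: whichever stored prompt overlaps it most comes back first.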
How did neural networks (Stable Diffusion) work before Phraser?
Image synthesis models (ISMs) use a technique known as latent diffusion. In essence, the model learns to identify familiar shapes amid pure noise and gradually brings those elements into focus when they align with the words in the prompt.
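The core diffusion idea, mixing an image with noise and learning to undo the mixing, can be shown numerically. This is a toy sketch of the forward noising formula and its exact inverse, not Stable Diffusion's real latent-space implementation; the 4x4 "image" is a stand-in.

```python
import numpy as np

# Toy illustration of diffusion: an image is mixed with Gaussian noise
# (forward process); the trained network's job is to predict that
# noise so the process can be reversed step by step.

rng = np.random.default_rng(0)
x0 = rng.uniform(size=(4, 4))            # stand-in "image"

def noisy_sample(x0, alpha_bar, rng):
    """Closed-form forward step: x_t = sqrt(ab)*x0 + sqrt(1-ab)*eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1 - alpha_bar) * eps, eps

x_t, eps = noisy_sample(x0, alpha_bar=0.5, rng=rng)

# If a network predicted eps perfectly, inverting the same formula
# recovers the original image exactly:
x0_hat = (x_t - np.sqrt(1 - 0.5) * eps) / np.sqrt(0.5)
print(np.allclose(x0, x0_hat))  # True
```

In practice the network's noise prediction is imperfect, so generation removes a little noise at a time over many steps, steered by the prompt.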
To begin this process, a person or group training the model assembles images together with their metadata (including captions and tags from across the web), forming an extensive database. In the case of Stable Diffusion, Stability AI drew on the LAION-5B image set, which is based on a scrape of 5 billion publicly available images from the web. According to recent research, a significant portion of these images come from sites such as Pinterest, Getty Images and DeviantArt, which is why Stable Diffusion can adopt the styles of many living artists.
The next step is training the model on this image dataset using hundreds of high-end GPUs such as the Nvidia A100. According to Emad Mostaque, founder of Stability AI, training Stable Diffusion cost around $660,000. During training, the model learns to correlate words with images with the help of a technique known as CLIP (Contrastive Language–Image Pre-training), created by OpenAI last year.
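CLIP's word–image correlation can be pictured as placing captions and images in one shared vector space, where matching pairs sit close together. The sketch below uses random stand-in vectors, not real CLIP embeddings, purely to show the cosine-similarity comparison at the heart of the idea.

```python
import numpy as np

# Toy sketch of the CLIP idea: text and images are mapped into a
# shared embedding space, and matching pairs score a high cosine
# similarity. These vectors are random stand-ins, not CLIP outputs.

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(1)
dim = 8
text_emb = rng.standard_normal(dim)                     # a caption
img_match = text_emb + 0.1 * rng.standard_normal(dim)   # matching image
img_other = rng.standard_normal(dim)                    # unrelated image

# The matching image lies far closer to the caption in this space.
print(cosine(text_emb, img_match))
print(cosine(text_emb, img_other))
```

During contrastive training, real CLIP pushes matching pairs toward similarity 1 and mismatched pairs apart, which is what lets a diffusion model be steered by text.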
Out of the box, Stable Diffusion will happily give a person four arms, six heads or seven fingers; getting good results depends on skill at crafting text prompts, a craft AI artists have come to call prompt engineering. You may need to generate lots of images and cherry-pick the good ones. Remember that the closer a prompt matches the captions of familiar images in the dataset, the more impressive the results will be. Phraser eases the use of all such neural networks by simplifying the writing of prompts.
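The "generate lots and cherry-pick" workflow amounts to a best-of-n loop. In this sketch, `generate_image` and `score_image` are hypothetical stand-ins for a real model call and a quality metric (a real pipeline might score each candidate by its CLIP similarity to the prompt).

```python
import random

# Sketch of the cherry-picking workflow: render many candidates from
# different seeds, score each, keep the best. Both helpers below are
# invented placeholders, not a real image-generation API.

def generate_image(prompt, seed):
    rng = random.Random(seed)            # deterministic per-seed output
    return {"seed": seed, "pixels": [rng.random() for _ in range(4)]}

def score_image(image, prompt):
    return sum(image["pixels"])          # stand-in quality score

def best_of_n(prompt, n=8):
    """Generate n candidates and keep the highest-scoring one."""
    candidates = [generate_image(prompt, s) for s in range(n)]
    return max(candidates, key=lambda img: score_image(img, prompt))

best = best_of_n("a castle in the clouds, oil painting")
print(best["seed"])
```

Artists do the same thing by hand: fix the prompt, vary the seed, and keep only the renders worth showing.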
With Phraser, you simply push the Stable Diffusion button on the first screen, and Phraser does the rest. The creators have also removed the language barrier, allowing prompt search in five languages.
Scenario after Phraser
Phraser is expected to enhance the existing strengths of these text-to-image networks, enriching Midjourney's artistic flair and DALL-E 2's ability to create realistic images from prompts.