In a surprising turn of events, OpenAI has decided to open-source one of its technologies. The company released its paper on Consistency Models last month and has now made the technology open source.
Check out the GitHub repository here.
The paper by OpenAI describes ‘Consistency Models’, a technique that might be the next big step in the AI art generation race and could help DALL-E stand apart from the crowd.
The fact that OpenAI has decided to open-source its technology is definitely surprising. The company, which has been getting a lot of criticism for its closed-door policy, is now making attempts to go the open-source route. Recently, it also announced a bug bounty program that rewards researchers for detecting and reporting bugs in its systems.
Interestingly, ChatGPT still runs on open-source code. But the last open-source contribution by OpenAI was seven years ago. It seemed that the company only wanted to take advantage of the open-source community and not contribute anything back.
OpenAI is definitely making big steps to move beyond the competition. Just like the introduction of GPT-4 in chatbots, consistency models might be the next step for AI image generation. Plus, this would be one of the first steps for OpenAI to become open again.
Consistency models are a type of generative model designed to enable one-step and few-step generation. The paper reports that models trained with the consistency distillation method outperform existing distillation techniques for diffusion models in one-step and few-step sampling on various benchmarks.
Similar to diffusion models, consistency models allow zero-shot image editing applications such as colorization, inpainting, denoising, interpolation, and stroke-guided generation. Diffusion models are certainly good at making several variants of a single image, but that requires a lot of computation power from GPUs. Consistency models could enable image generation on a single device with quick results.
The authors of the paper are Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Sutskever is the mind behind ChatGPT, and Dhariwal is the researcher who co-wrote the 2021 paper ‘Diffusion Models Beat GANs on Image Synthesis’.
Even though diffusion models have beaten GANs in image, audio, and video generation, their iterative process of removing noise step by step slows down sampling and caps their potential for real-time applications. Consistency models overcome this: they achieve high sample quality without the need for adversarial training and allow fast one-step generation, or few-step generation when a little extra compute is traded for quality.
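To make the speed difference concrete, here is a minimal sketch rather than OpenAI's implementation. It assumes toy one-dimensional "data" drawn from a Gaussian, a case in which both the diffusion-style denoising direction and the one-step map can be written in closed form. The names `SIGMA_DATA`, `EPS`, `T`, `iterative_sample` and `one_step_sample` are illustrative, and a real consistency model would replace `consistency_fn` with a trained neural network.

```python
import math

# Assumptions for this toy: data ~ N(0, SIGMA_DATA^2), noise levels run from EPS to T.
SIGMA_DATA = 0.5
EPS, T = 0.002, 80.0


def score(x, t):
    """Exact score of the noised toy distribution N(0, SIGMA_DATA^2 + t^2)."""
    return -x / (SIGMA_DATA**2 + t**2)


def iterative_sample(x_T, n_steps):
    """Diffusion-style sampling: many small Euler steps on dx/dt = -t * score(x, t)."""
    ratio = (EPS / T) ** (1.0 / n_steps)  # geometric spacing of noise levels
    x, t = x_T, T
    for _ in range(n_steps):
        t_next = t * ratio
        x += (t_next - t) * (-t * score(x, t))
        t = t_next
    return x, n_steps  # result and number of score evaluations


def consistency_fn(x, t):
    """Closed-form map from a noisy point at level t to the start of its trajectory.

    Only available because the toy data is Gaussian; in practice this is learned.
    """
    return x * math.sqrt((SIGMA_DATA**2 + EPS**2) / (SIGMA_DATA**2 + t**2))


def one_step_sample(x_T):
    """Consistency-style sampling: a single function evaluation."""
    return consistency_fn(x_T, T), 1


x_T = 100.0  # stand-in for a pure-noise starting point
x_iter, evals_iter = iterative_sample(x_T, n_steps=1000)
x_one, evals_one = one_step_sample(x_T)
print(f"iterative: {x_iter:.4f} using {evals_iter} evaluations")
print(f"one-step:  {x_one:.4f} using {evals_one} evaluation")
```

In this toy setting the two samplers land on nearly the same value, but the iterative one pays for a thousand evaluations while the one-step sampler pays for one.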
This is achieved by training the model much like a diffusion model, by observing how an image is progressively noised. The difference is that a consistency model can pick up the image at any level of obscuration and generate a clean image in a single step, even when it has been fed missing information. So even if the image is very noisy, consistency models go straight to the final result. Quite brilliant!
Arguably, the image quality is not as good as that of diffusion models. But the speed and minimal compute with which the model can generate images, in a single step instead of hundreds or thousands, is definitely an upside.
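The "any level of obscuration" idea can be illustrated with a small sketch, again under the assumption of toy one-dimensional Gaussian data, where a noising trajectory is known in closed form. The names `point_on_trajectory` and `consistency_fn` and the constants are hypothetical. The point is only that every noisy version of the same starting value maps back to that value in one call, which is the property a trained consistency model is meant to approximate.

```python
import math

# Assumptions for this toy: data ~ N(0, SIGMA_DATA^2); EPS is the smallest noise level.
SIGMA_DATA = 0.5
EPS = 0.002


def point_on_trajectory(x_eps, t):
    """Noisier version, at level t, of the nearly clean value x_eps on one trajectory."""
    return x_eps * math.sqrt((SIGMA_DATA**2 + t**2) / (SIGMA_DATA**2 + EPS**2))


def consistency_fn(x_t, t):
    """Map a point at any noise level t straight back to the start of its trajectory."""
    return x_t * math.sqrt((SIGMA_DATA**2 + EPS**2) / (SIGMA_DATA**2 + t**2))


clean = 0.3
for t in (EPS, 0.5, 5.0, 80.0):
    noisy = point_on_trajectory(clean, t)
    restored = consistency_fn(noisy, t)
    print(f"noise level {t:>6}: noisy value {noisy:9.4f} -> restored {restored:.4f}")
```

At the lowest noise level the function leaves its input unchanged, and at every other level it returns the same restored value, so no intermediate denoising steps are needed.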