Diffusion models were introduced to AI in 2015, when Jascha Sohl-Dickstein, drawing inspiration from non-equilibrium thermodynamics, developed the technique for generative modelling. Later refinements went on to outperform GANs in image quality, giving rise to successful text-to-image models like DALL·E 2 and Stable Diffusion.
Sander Dieleman, Research Scientist at DeepMind, recently published a blog post exploring whether diffusion models can also be used for language tasks. He describes how we approach language in real life: we start with a basic concept of the topic in mind, and the choice of words, phrasing, and structure comes later. This process, interestingly, resembles how diffusion models work, and the approach could fill gaps left by current autoregressive models, though those models have already set a tough baseline to beat.
Large language models like GPT-3 and PaLM generate text autoregressively. These models predict future values from past values: they produce a sequence of outputs, token by token, each conditioned on the input and on the outputs generated so far. This is very similar to how we produce and consume language, word by word.
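To make the token-by-token loop concrete, here is a minimal sketch in Python. The probability table, the `next_token_distribution` helper and the tiny vocabulary are all hypothetical and hand-written for illustration. A real model would use a neural network conditioned on the whole prefix, whereas this toy only looks at the last token.

```python
import random

# Hypothetical toy "language model": a hand-written table of next-token
# probabilities (an assumption for illustration, not a trained model).
NEXT_TOKEN_PROBS = {
    "<s>": {"diffusion": 0.6, "language": 0.4},
    "diffusion": {"models": 1.0},
    "language": {"models": 1.0},
    "models": {"generate": 0.7, "</s>": 0.3},
    "generate": {"text": 0.5, "images": 0.5},
    "text": {"</s>": 1.0},
    "images": {"</s>": 1.0},
}


def next_token_distribution(prefix):
    """Return P(next token | prefix); this toy only uses the last token."""
    return NEXT_TOKEN_PROBS[prefix[-1]]


def generate(max_tokens=10, seed=0):
    """Sample a sequence one token at a time, left to right."""
    rng = random.Random(seed)
    tokens = ["<s>"]
    for _ in range(max_tokens):
        dist = next_token_distribution(tokens)
        candidates = list(dist)
        weights = [dist[t] for t in candidates]
        token = rng.choices(candidates, weights=weights, k=1)[0]
        if token == "</s>":
            break
        # The new token is appended and becomes context for the next step.
        tokens.append(token)
    return tokens[1:]  # drop the start marker


print(generate(seed=0))
```

The point of the sketch is the loop: each step depends on everything generated before it, so the steps cannot run in parallel.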
In 2022, a few research papers applied the diffusion approach to language models. AIM got in touch with Shansan Gong, one of the researchers behind DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models, to understand the progress of this field. Published in October, the paper reports that the DiffuSeq approach competes with existing autoregressive and large pre-trained models in terms of quality and diversity. The paper has also been submitted to ICLR 2023.
Gong believes that researching the use of diffusion models for text generation is definitely worth it. The researchers also open-sourced the model in December, and the code is available on GitHub.
Gong said that the non-autoregressive method makes self-editing more flexible (much like the inpainting feature in image diffusion models). The approach also allows generation to be parallelised rather than run token by token. “DiffuSeq addresses the much needed aspect of diversity that is desired in many Seq2Seq tasks,” Gong added.
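The contrast with token-by-token generation can be sketched in a few lines of Python. This is not DiffuSeq's method. The one-dimensional `EMBEDDINGS` table and the `denoise_step` function are hypothetical stand-ins for learned embeddings and a learned denoising network, and the toy denoiser treats each position independently, whereas a real one looks at the whole sequence. The sketch only shows the overall shape: every position is updated in the same step, and positions listed in `fixed` are held in place, in the style of inpainting.

```python
import random

# Hypothetical 1-D "embeddings" for a tiny vocabulary (assumption for
# illustration; real models use learned high-dimensional vectors).
EMBEDDINGS = {"the": 0.0, "cat": 1.0, "sat": 2.0, "down": 3.0}


def nearest_token(value):
    """Round a continuous value to the closest vocabulary token."""
    return min(EMBEDDINGS, key=lambda tok: abs(EMBEDDINGS[tok] - value))


def denoise_step(canvas, strength=0.5):
    """Stand-in for a learned denoiser: nudge every position toward its
    nearest embedding. All positions are updated in the same step."""
    return [v + strength * (EMBEDDINGS[nearest_token(v)] - v) for v in canvas]


def refine(length, fixed=None, steps=8, seed=0):
    """Start from noise and refine the whole sequence jointly.

    `fixed` maps position -> token that must be kept (inpainting-style
    editing); every other position is free to change.
    """
    fixed = fixed or {}
    rng = random.Random(seed)
    canvas = [rng.uniform(-0.5, 3.5) for _ in range(length)]
    for _ in range(steps):
        canvas = denoise_step(canvas)
        # Re-impose the known tokens after every step.
        for pos, tok in fixed.items():
            canvas[pos] = EMBEDDINGS[tok]
    return [nearest_token(v) for v in canvas]


print(refine(4, fixed={0: "the", 3: "down"}))
```

Because no position waits on another within a step, the work inside `denoise_step` could in principle run in parallel. The cost is that several steps are needed.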
Applying diffusion models to text inputs is not new, though. In April 2021, researchers from Seoul National University introduced Diff-TTS, which uses a denoising diffusion model for text-to-speech. Riffusion, a recent model built on Stable Diffusion that generates music from text prompts, took a very similar approach.
Researchers from Stanford University were the first to explore diffusion for text generation. They published the paper, Diffusion-LM Improves Controllable Text Generation, in May and open-sourced it in June, laying the groundwork for further research. The paper proposed a non-autoregressive language model based on continuous diffusion, which Gong and team built on in October with DiffuSeq.
The researchers of Diffusion-LM concluded that the method succeeded on six fine-grained control tasks and doubled the control success rate of prior methods, without the additional training that fine-tuning methods require.
In October, another paper, SSD-LM, proposed a semi-autoregressive language model for text generation and modular control. This was followed by DiffusionBERT, which combined pre-trained language models with discrete diffusion for text to improve how diffusion models are trained on discrete data.
This was recently explored further by researchers at Cornell University. Their paper, Latent Diffusion for Language Generation, demonstrates that continuous diffusion models can be learned in a latent space. The model is trained to denoise continuous latent representations iteratively.
Why Bother with Diffusion?
Diffusion models became the dominant paradigm for generative models in 2022. Unlike autoregressive models, which require restricted connectivity patterns to ensure causality, diffusion models are unconstrained and thus allow more creative freedom. But given the efficiency of current autoregressive models (ChatGPT, for example), is researching diffusion models for language even worth it?
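One common form this restricted connectivity takes in Transformer-based autoregressive models is a causal attention mask. The sketch below is illustrative only, with hypothetical function names, and builds such a mask alongside the unrestricted one that a diffusion-style denoiser is free to use.

```python
def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j.
    Position i only sees positions 0..i, never the future."""
    return [[j <= i for j in range(n)] for i in range(n)]


def full_mask(n):
    """Unrestricted connectivity: every position sees every position."""
    return [[True] * n for _ in range(n)]


def show(mask):
    """Render a mask as text: 'x' = allowed, '.' = blocked."""
    return "\n".join(" ".join("x" if ok else "." for ok in row) for row in mask)


print(show(causal_mask(4)))
print()
print(show(full_mask(4)))
```

The lower-triangular pattern is what keeps an autoregressive model from looking at tokens it has not generated yet. Without that constraint, every position can use context from both directions.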
Dieleman published a paper, Continuous diffusion for categorical data, proposing a framework for modelling categorical data with diffusion that is continuous in both time and input space, and demonstrating its capabilities on language tasks. He also said that his models produce reasonable samples and are comparatively easy to scale due to their similarity to existing language models; however, they fall behind autoregressive models in efficiency.
Dieleman believes that though research in the field is increasing, “It is still too early to consider diffusion as a serious alternative to autoregression for generative language modelling at scale”. At the same time, he is convinced that progress can come from iterative refinement: applying the model repeatedly to refine a canvas, rather than producing the output in a single pass, which gives a deeper computation graph. This is similar to the approach taken by the Cornell University researchers.
The field is underexplored, but the same was true when diffusion was first tried for image generation. “In the era of large language models, diffusion models can be part of it and work for many downstream tasks, and are definitely worth a shot,” said Gong.