Over the weekend, news emerged that a class action lawsuit had been filed against generative AI companies Stability AI and Midjourney, and image hosting platform DeviantArt. The lawsuit, filed by three artists, seeks compensation for damages caused by these companies along with an injunction to prevent ‘further harms’. It is just the latest in a long line of protests by artists against the use of their uncredited art to train generative AI algorithms.
Notably, the legal team behind the lawsuit is the same one that filed a class action lawsuit over GitHub Copilot along the same lines. Led by lawyer Matthew Butterick, the lawyers are on a mission to prevent the misuse of people’s hard work as training data for AI. The primary point of contention is the use of the LAION-5B dataset, which contains billions of uncredited, copyrighted images.
On the other hand, a hopeful trend of companies beginning to compensate people for using their images in AI datasets has started to emerge. Shutterstock, the stock image marketplace platform, has taken a stance of supporting artists, with plans to pay artists for their contribution in training machine learning models.
In a world where a company like Shutterstock has proved that it is possible to compensate artists adequately for using their art in an AI dataset, there is no reason for Stability AI, Midjourney, and DeviantArt not to do so. Let’s delve deeper into the intricacies of AI copyright laws.
Breaking down the lawsuit
To understand why this lawsuit was filed in the first place, we must first look at how modern-day generative AI creates its images. All the parties targeted in the lawsuit use the diffusion method to generate artwork from noise. Diffusion is also how these models are trained: noise is gradually added to source images from a large dataset such as LAION-5B, and the model learns to reverse that process, which is how it absorbs the insights contained in the dataset. Butterick said in his blog,
“These resulting images may or may not outwardly resemble the training images. Nevertheless, they are derived from copies of the training images, and compete with them in the marketplace.”
Simply put, generative AI uses diffusion to turn its training images into a ‘noise cloud’ of data, and then uses what it learned in the process to create new images out of another ‘noise cloud’. The lawsuit compares this to other methods of storing compressed data, such as MP3 or JPEG files, and on that basis characterises generative algorithms as a ‘collage system’ built from their training dataset.
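To make the ‘noise cloud’ idea more concrete, here is a minimal sketch of the noising half of a diffusion process, written in the style of the standard textbook formulation. It is an illustration only, not code from Stable Diffusion or Midjourney: the function names, the four-pixel ‘image’ and the schedule values are all assumptions chosen for readability. The half that matters commercially, in which a neural network is trained to undo this noise, is not shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances rising linearly (illustrative values)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_signal(betas):
    """Fraction of the original image's signal variance left after each step."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, alpha_bar, rng):
    """Blend an image with Gaussian noise; lower alpha_bar means more noise."""
    keep = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [keep * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]


rng = random.Random(0)
image = [0.9, -0.3, 0.5, 0.1]  # hypothetical 4-pixel image scaled to [-1, 1]
alpha_bars = cumulative_signal(linear_beta_schedule(1000))

slightly_noisy = add_noise(image, alpha_bars[9], rng)      # early step
almost_pure_noise = add_noise(image, alpha_bars[-1], rng)  # final step
```

After ten steps almost all of the original image survives; after a thousand, essentially none of it does. Whether a model trained to reverse this process ends up storing copies of its training images or learning general concepts from them is the question the two sides disagree on.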
Even Stable Diffusion’s latent space visualisation techniques aren’t enough to save its generations from being derivative works, as the lawsuit states that latent space is merely a more complex way of interpolating between source images without adding anything to them. Moreover, the lawyers also argue that conditioning the process on text is simply a ‘layer of magical misdirection’ that makes it harder for users to generate obvious copies of the training data.
However, many have dismissed this argument as reductionist, as seen in a post by Twitter user Daniel, who goes by ‘KeyTryer’ on the platform. He carefully breaks down how diffusion models ‘learn’ the concepts associated with images instead of storing a representation of these images in latent space and using it to interpolate between different images.
While the specifics of how diffusion models work are a bit different from what Butterick describes in his blog post, the broader argument still stands. Whether a company pays organisations like LAION to create datasets, as Stability AI does, or scrapes artwork hosted on its own website, as DeviantArt does, the fact is that artists should be compensated when their art is used to train AI.
Compensating artists for their work
Even as Stability AI, Midjourney, and DeviantArt are developing products based on the work of millions of artists, Shutterstock has found a way to balance the scales for both parties. While other companies like Getty Images have banned the sale of AI art on their platforms, Shutterstock is looking to incentivise the creation of such services with due compensation.
When they partnered with Meta, Shutterstock announced that they would develop a system to compensate artists whose work has been used to train an AI algorithm. In their words,
“Shutterstock is one of the first companies to pay artists for their contributions to training machine learning models, and it has proven to be a trusted partner to those entering the space by ensuring the responsible creation and licensing of content with a transparent IP transfer.”
With the rise of computer vision and generative AI, Shutterstock saw a business opportunity to provide datasets to companies. This product, known as ‘data deals’, aims to provide high-quality labelled data to companies building CV models. In July 2021, they announced the launch of Shutterstock.AI, a website focused solely on providing these datasets to AI researchers and companies.
An opt-out feature followed closely behind the launch, allowing contributors to exclude their content from the datasets if they prefer. They also established the Shutterstock Contributor Fund, which identified users whose content was included in the dataset and compensated them adequately.
Shutterstock’s responsible approach towards embracing and supporting the rise of AI is something other companies could take inspiration from. While the lawsuit currently stands on shaky ground from a scientific perspective, the precedent set by it will prove important to future generations of artists. If the artists win this lawsuit, it will establish an unequivocal legal precedent that pushes generative AI companies to adopt a measured approach towards the art, including unlicensed and uncredited art, used in their datasets.