Lessons from YouTube for Gen AI Copyright Mess

During the recent Google I/O 2023 conference, Google introduced watermarking and metadata to all images generated by AI to promote transparency

The recent Senate discussion around AI regulation has turned the spotlight on the pressing issues of copyright and artist incentivisation. OpenAI chief executive Sam Altman testified before the US Congress about the potential dangers of AI and suggested the creation of a new government agency to license large AI models, revoke permits for non-compliance and set safety protocols. The major impetus behind this plea for regulation is concern over copyright infringement.

This feels like déjà vu. In its early stages, YouTube faced legal challenges as content owners initiated disputes over copyright violations. However, Google intervened and handled the situation effectively. Likewise, AI-generated content today faces opposition from Adobe, Shutterstock and Getty, as well as from artists who see it as a threat to their intellectual property.

How was YouTube Born?

The brainchild of PayPal mafia members Chad Hurley, Steve Chen, and Jawed Karim, the platform gained significant popularity soon after its launch in 2005 and has since grown to over 2 billion monthly active users. Google recognised its enormous potential and acquired YouTube in 2006 for a remarkable $1.65 billion, making it one of the company's largest acquisitions at that time. This move, however, faced considerable backlash, as Google’s own platform, Google Video, was struggling to keep up with YouTube’s rapid growth.

Entrepreneur Mark Cuban called Google’s move “crazy”, considering the legal risks associated with YouTube. At the time, YouTube held the majority of online video traffic with a substantial 46% share, while MySpace and Google Video trailed behind with 23% and 10%, respectively. Former Google CEO Eric Schmidt later admitted that YouTube’s actual value did not align with the hefty price tag that it came with at the time of the acquisition.

Fast forward almost 20 years, and YouTube is one of Google's biggest revenue generators. Google also plans to use AI to enhance advertisements, one of its major revenue sources, and to help YouTube content creators.

How Google Saved the Day

In YouTube’s early days, there was no system to detect copyrighted content, allowing people to upload such content freely. This led to widespread copyright infringement and legal conflicts. Viacom sued YouTube, accusing the platform of copyright violations and claiming that YouTube made significant advertising revenue off its content.

YouTube settled a lawsuit for $50 million, establishing liability for copyright infringement. Google then introduced Content ID to block or monetise copyrighted material. While not flawless, it helped protect YouTube from lawsuits and allowed it to form partnerships for monetising copyrighted content.

When Art Meets AI 

A similar intersection of technology and copyright law can now be seen in AI-generated content where various legal issues are cropping up, including infringement, rights of use, ownership of AI-generated works, unlicensed content in training data, and the unauthorised utilisation of copyrighted and trademarked works.

Getty Images sued Stability AI, the company behind Stable Diffusion, for improper use of photos, copyright infringement, and trademark violations. Separately, artists have sued Stability AI, Midjourney, and DeviantArt, claiming their artwork was unlawfully used to train models and create new images. The definition of a “derivative work” under intellectual property laws requires clarification by the legal system, with interpretations varying by jurisdiction. The fair use doctrine, which allows specific uses of copyrighted material without explicit permission, will likely influence the outcomes.

At the recent Google I/O 2023 conference, Google introduced watermarking and metadata for all images generated by AI to promote transparency, allowing users to distinguish AI-generated images from other images and make informed decisions when interacting with AI-generated content. Google also has its own text-to-image diffusion model, Imagen, which is not yet available to the public.

Getty Images chief Craig Peters has criticised the rush to commercialise AI art generators, citing the lack of consideration for legal and ethical issues. While acknowledging AI’s creative potential, Getty Images partnered with Bria to offer ethical AI-powered image editing tools, focusing on editing rather than generation. 

Midjourney CEO David Holz argues that if the content is generated solely based on text input, it does not qualify for copyright protection. Another stock image platform, Shutterstock, which had earlier banned AI-generated images, eventually collaborated with OpenAI, supplying data to train its models and bringing its own generative AI tools to users with DALL-E 2.

Datasets to the Rescue

Ownership of the datasets enables control over their usage and protects against lawsuits. Selling licensed datasets therefore shields the companies involved from legal action by artists and photographers who claim their work has been copied. Moreover, it contributes to enhancing the quality of AI-generated images, making them more appealing to users. However, selling datasets is not foolproof, since AI models can still produce images similar to existing works.

Shutterstock has also ventured into selling datasets of its images and videos for AI models. These datasets are categorised by theme or subject and encompass images, videos, and 3D models. Each dataset contains metadata, including keywords, descriptions, geo-location, and categories.

The motive behind Shutterstock’s dataset sales is to aid companies in building more advanced models for tasks like facial recognition, object detection, and image classification, as such models demand a substantial amount of training data. 

The sale of datasets is not exclusive to Shutterstock, as other companies such as Adobe, Getty Images, and iStockphoto are also involved in this practice. This trend reflects the tech industry's growing interest in monetising content through dataset sales.

Getty, which has mostly been opposed to AI-generated content, is collaborating with NVIDIA to develop two generative AI models with NVIDIA Picasso. These models will be trained on Getty Images’ vast bank of fully licensed material. Moreover, the income generated through these models will be used to pay royalties to the content creators whose work is involved.

In contrast to Shutterstock, Adobe takes a different approach by refraining from direct sales of datasets. Instead, it has developed Adobe Experience Platform, which enables enterprises to collect and analyse data and can be used to build custom datasets.

Although the sale of datasets raises concerns regarding privacy and copyright, Shutterstock has implemented measures to address these issues. For instance, the company only sells anonymised datasets and requires purchasers to adhere to terms of service agreements that safeguard the privacy of contributors.

Dataset sales represent an emerging trend in the tech industry, which is expected to continue expanding in the future. As a prominent player in this field, Shutterstock is well-positioned to capitalise on this trend and its associated benefits.


Shritama Saha
Shritama is a technology journalist who is keen to learn about AI and analytics. A graduate in mass communication, she is passionate about exploring the influence of data science on fashion, drug development, films, and art.
