OpenAI is well known in the tech ecosystem for facing copyright infringement lawsuits over the training of its models like GPT-4 and DALL·E 3. At its first-ever DevDay, the AI research lab launched the Copyright Shield program, which aims to provide financial support and legal defence to enterprise-level users of ChatGPT against such claims.
While unveiling the program, Sam Altman emphasised their efforts to ensure copyright compliance within their AI systems, which are trained on a combination of licensed and publicly available data sources.
With this initiative, OpenAI aligns itself with tech giants like Microsoft, Amazon, and Google, all of which offer legal aid to their users facing similar issues. Adobe and Shutterstock, known for their stock images and generative AI tools, have also pledged to offer comparable protections.
Following the Lead
In recent months, major tech companies have proactively tackled the copyright issues associated with generative AI tools. Back in September, Microsoft introduced the Copilot Copyright Commitment program to cover legal costs for customers of its AI services, including Microsoft 365 Copilot and GitHub Copilot, provided they adhere to guidelines like using content filters.
Adobe, too, has set up a safeguard for its AI art tool, Firefly, offering support against copyright claims and ensuring that the images are either licensed or public domain.
Google has stepped up by offering to defend Google Cloud and Workspace users against IP infringement claims related to both the training materials and the AI-generated content, though this does not cover misuse. These initiatives by the tech giants aim to navigate the legal complexities of AI content creation and offer some security to users.
Amazon has opted for a different approach with its Kindle Direct Publishing platform, requiring authors to disclose if their content is AI-generated—though this does not extend to AI-assisted edits. This policy seeks to ensure transparency about the content’s origins rather than offering legal protection.
OpenAI’s History of Lawsuits
Prominent authors, including George R.R. Martin, the creator of Game of Thrones, and Pulitzer Prize winner Michael Chabon, have filed lawsuits against OpenAI, alleging unauthorised use of their works in training AI programs like ChatGPT. The ChatGPT maker contends that its methods are covered by fair use—a claim the authors reject, leading to ongoing legal battles.
Further, OpenAI has faced additional lawsuits for allegedly using private data without permission and for systematic copyright infringement, as claimed by the Authors Guild.
Copyright for Big Tech is Fluff
Elon Musk recently unveiled xAI’s chatbot, Grok. The most interesting, and concerning, aspect is that the 33-billion-parameter model was developed by a small team of 16 members in less than four months—in contrast to Google’s Bard, which took about two years, and OpenAI’s ChatGPT, which took several years—raising questions about the authenticity and IP rights of its training data.
One of Grok’s distinct advantages is its access to real-time data from the exclusive X platform. This is significant, particularly after Musk restricted free API access to X to prevent data scraping for training competing models, highlighting the increasing value and protectiveness of proprietary data in AI.
Yet, the scarcity of clean, licensed data for AI training is the root cause that no big tech company wants to address. While coding platform Replit is one of the very few companies to openly state that it used 1 trillion licensed code tokens from the Stack dataset and StackExchange to train its AI chatbot Replit AI, other companies’ transparency levels vary, raising questions about the provenance of their data sources.
Further complicating the landscape are the ever-growing jailbreakers, adept at bypassing the constraints placed on AI tools to access or repurpose their capabilities, often pushing the boundaries of usage policies. Their actions can intensify the data-sourcing problem, as they might use methods that compromise the integrity or legality of the data used to train other models.
Meanwhile, several users took to X to criticise measures such as OpenAI’s Copyright Shield, which, according to the company’s official website, paradoxically restricts the use of its output to train other models. This has sparked debates over the ethics of copyright in the AI domain, as it may improperly attribute rights to those who did not contribute to the original creation.
The current race for AI dominance is now not only about technological prowess but also about securing exclusive data, with Google utilising YouTube, Poe tapping Quora, and OpenAI drawing from web data only up until January 2022. The struggle extends to maintaining control over data in a rapidly evolving field where jailbreakers and AI companies vie for the upper hand.
Read more: Now Everyone is an App Developer, Thanks to OpenAI