
Why Did OpenAI Launch Copyright Shield?

Copyright Shield aims to provide financial support and legal defense to the enterprise-level users of ChatGPT against copyright issues. But for how long?

OpenAI is well known in the tech ecosystem for the copyright infringement lawsuits it faces over the training of models such as GPT-4 and DALL·E 3. At its first-ever DevDay, the AI research lab launched the Copyright Shield program, which aims to provide financial support and legal defense to enterprise-level users of ChatGPT against such claims.

While unveiling the program, Sam Altman emphasised OpenAI’s efforts to ensure copyright compliance within its AI systems, which are trained on a combination of licensed and publicly available data sources.

With this initiative, OpenAI aligns itself with tech giants like Microsoft, Amazon, and Google, all of which offer legal aid to their users facing similar issues. Adobe and Shutterstock, known for their stock images and generative AI tools, have also pledged to offer comparable protections.

Following the Lead

In recent months, major tech companies have proactively tackled the copyright issues associated with generative AI tools. Back in September, Microsoft introduced the Copilot Copyright Commitment program to cover legal costs for customers of its AI services, including Microsoft 365 Copilot and GitHub Copilot, provided they adhere to guidelines like using content filters.

Adobe, too, has set up a safeguard for its AI art tool, Firefly, offering support against copyright claims and ensuring that the images used to train it are either licensed or in the public domain.

Google has stepped up by offering to defend Google Cloud and Workspace users against IP infringement claims related to both the training materials and the AI-generated content, though this does not cover misuse. These initiatives by the tech giants aim to navigate the legal complexities of AI content creation and offer some security to users.

Amazon has opted for a different approach with its Kindle Direct Publishing platform, requiring authors to disclose if their content is AI-generated—though this does not extend to AI-assisted edits. This policy seeks to ensure transparency about the content’s origins rather than offering legal protection.

OpenAI’s History of Lawsuits

Prominent authors, including George R.R. Martin, the creator of Game of Thrones, and Pulitzer Prize winner Michael Chabon, have filed lawsuits against OpenAI, alleging unauthorised use of their works in training AI programs like ChatGPT. The ChatGPT maker contends that its methods are covered by fair use, a claim the authors reject, leading to ongoing legal battles.

Further, OpenAI has faced additional lawsuits for allegedly using private data without permission and for systematic copyright infringement, as claimed by the Authors Guild.

Elon Musk’s recent unveiling of xAI’s chatbot, Grok, adds to these concerns. What is both interesting and concerning is that the 33 billion parameter model was developed by a small team of 16 members in under four months, in contrast to Google’s Bard, which took about two years, and OpenAI’s ChatGPT, which took several years. This raises questions about the authenticity and IP rights of the training data.

One of Grok’s distinct advantages is its access to real-time data from the exclusive X platform. This is significant, particularly after Musk restricted free API access to X to prevent data scraping for training competing models, highlighting the increasing value and protectiveness of proprietary data in AI.

Yet the scarcity of clean, licensed data for AI training is the root problem that no big tech company wants to address. Coding platform Replit is one of the very few companies to openly state that it used 1 trillion licensed code tokens from the Stack dataset and StackExchange to train its AI chatbot, Replit AI. Other companies’ transparency levels vary, raising questions about the purity of their data sources.

Further complicating the landscape is the ever-growing number of jailbreakers, adept at bypassing the constraints placed on AI tools to access or repurpose their capabilities, often pushing the boundaries of usage policies. Their actions can intensify the data-sourcing problem, as they might use methods that compromise the integrity or legality of the data used to train other models.

Meanwhile, several users took to X to criticise measures such as OpenAI’s Copyright Shield, pointing out that the company paradoxically restricts the use of its output to train other models, according to its official website. This has sparked debates over the ethics of copyright in the AI domain, as it may improperly attribute rights to those who did not contribute to the original creation.

The race for AI dominance is no longer only about technological prowess but also about securing exclusive data, with Google utilising YouTube, Poe tapping Quora, and OpenAI drawing from web data only up until January 2022. The struggle extends to maintaining control over data in a rapidly evolving field where jailbreakers and AI companies vie for the upper hand.

Shritama Saha

Shritama (she/her) is a technology journalist at AIM who is passionate about exploring the influence of AI on different domains, including fashion, healthcare and banking.