
NYT, Doomed to Fail Like Its Predecessors?

Similar copyright cases have been brought against OpenAI in the past without resolution. Could the NYT case be different?


Illustration by Nikhil Kumar

The New York Times is pushing harder on its copyright lawsuit against OpenAI. Since the case was announced, the publisher has accumulated more evidence against OpenAI, and this could well be the first copyright infringement claim of its kind to hold up in court. The outcome, however, remains to be seen.

The lawsuit, filed in Manhattan federal court on Wednesday, 27 December, claims that millions of the Times’ copyrighted articles were used without permission to train the models behind ChatGPT.

NYT alleges that this practice not only infringes its copyright but also positions these AI models as direct competitors to its own journalistic offerings. While such claims were previously hard to substantiate, the visual evidence offered by the plaintiff is damning.

[Exhibit from the lawsuit: side-by-side snapshots of text generated by ChatGPT and the corresponding New York Times articles]

Jason Kint, the CEO of Digital Content Next, posted a thread on X explaining the case: “I find this exhibit to be an incredibly powerful illustration for a lawsuit that will go before a jury of Americans. Again, it’s impossible to argue with this,” he said, referring to the above snapshots of text generated by ChatGPT that closely matches articles on the news site.

The Times had previously approached Microsoft and OpenAI in April, raising these concerns in the hope of an ‘amicable’ resolution: a commercial agreement along with technological guardrails around its content. Those talks did not reach a conclusion.

Following this, The New York Times changed its policy in August, right before announcing its intention to sue OpenAI, updating its Terms of Service on August 3 to restrict the use of its content for AI training.

The restriction covers all forms of content and explicitly bans automated data collection tools without written permission. OpenAI and Microsoft, for their part, have introduced measures that let sites block their crawlers from scraping, reflecting a broader industry trend towards more controlled use of web-sourced data for AI development.
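In practice, a publisher that wants to opt out of OpenAI’s crawler typically does so through its robots.txt file. A minimal sketch of what such an entry might look like (GPTBot is OpenAI’s published crawler user agent; the exact rules any given publisher applies are its own choice):

    User-agent: GPTBot
    Disallow: /

A site that wants to go further can list other crawler user agents in the same way while still allowing regular search indexing.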

A stronger case this time

The major issue with filing copyright cases against AI companies has been the plaintiffs’ inability to prove that their original works were actually copied. Models like ChatGPT learn from vast datasets, making it hard to trace the origin of every piece of generated content back to a specific source. This creates a legal gray area: if an AI produces content similar to a copyrighted work, is it infringement or just a coincidence?

This question has been raised multiple times, often by drawing comparisons with human inspiration, which can come from any work of art.

In this case, NYT alleges that OpenAI engaged in four types of unauthorized copying of its articles: the copies made in the training datasets, the copies the LLMs encode in their parameters, the output of memorized articles in response to queries, and the output of articles retrieved using the browsing plugin.

This case, given the evidence and the profile of the plaintiff, might not only differ from its predecessors but could also set a significant precedent for future legal challenges in this domain, largely because of how LLMs work. They generate content not by retrieving stored data but by mimicking patterns from a vast training corpus, a process sometimes termed “approximate retrieval”. This means the output is not a direct copy, but it can closely mirror the style and structure of texts in the training set.

This is interesting because the lawsuit gives no information on what prompts were used to get these outputs, and, as one user pointed out, it would be hard to recreate a similar response in court.

An alternative and perhaps more collaborative approach could also emerge from these challenges, like the licensing deals OpenAI has already struck with the Associated Press and Axel Springer.

Such partnerships could involve licensing agreements, where content creators are compensated for their contributions to AI training datasets. This approach not only addresses copyright concerns but also fosters a symbiotic relationship where both parties benefit – content creators receive compensation and recognition, while AI developers gain access to high-quality, legitimate datasets.

The outcome of the NYT lawsuit against OpenAI could influence the future interactions between AI companies and content creators. Whether this leads to more legal battles or paves the way for collaborative partnerships will depend significantly on how both industries navigate this complex and evolving landscape. Either way, the implications for the future of content creation, particularly in journalism, are profound and worth watching closely.


K L Krithika

K L Krithika is a tech journalist at AIM. Apart from writing tech news, she enjoys reading sci-fi and pondering impossible technologies, trying not to confuse them with reality.