
GitHub Copilot: The Latest in the List of AI Generative Models Facing Copyright Allegations

GitHub Copilot, the text-to-code AI tool, has been facing accusations of stealing people’s code. So, what’s next for AI-generative models?


GitHub Copilot, the text-to-code AI tool, has been—for the most part—revolutionary in changing how people code. Twitter has been erupting with people expressing how the new AI tool has benefitted them, with organisation heads and developers alike hailing it for saving much of their time. 

However, the latest discussion surrounding it suggests that things are murky. 

Tim Davis, Professor of Computer Science at Texas A&M University, took to Twitter to express his resentment over Copilot reproducing his copyrighted code for a particular prompt. 

Chris Rackauckas, lead developer of SciML, also shared a thread by Armin Ronacher from July 2021, adding, “Github Copilot spits out the Quake source code. It just repeats its training data often, even without OSS licenses”. 

But beyond this, the latest news making the rounds concerns Matthew Butterick, a writer, programmer, and lawyer, who announced on October 17 that he would be teaming up with the Joseph Saveri Law Firm to investigate a potential lawsuit against GitHub Copilot on the grounds of violating open-source licences. Writing on the issue of copyright violation in June 2022, Butterick cautioned organisations building software products against using Copilot, as they could be appropriating someone else’s intellectual property, albeit unintentionally. 

GitHub Copilot is trained on billions of lines of public code. But there is no certainty that using this training data qualifies as fair use under copyright law. In presenting the case, Butterick writes that Microsoft characterises Copilot’s output as merely a series of “suggestions” and claims no rights over it. He also cites a passage from GitHub’s website showing how Microsoft plays it safe by shifting the responsibility onto the end user:  

“You are responsible for ensuring the security and quality of your code. We recommend you take the same precautions when using code generated by GitHub Copilot that you would when using any code you didn’t write yourself. These precautions include rigorous testing, IP [(= intellectual property)] scanning, and tracking for security vulnerabilities.”

In a recent statement, OpenAI claimed that training material from public repositories is not meant to be included in Copilot’s output. Additionally, its analysis has shown that the vast majority of the output (>90%) does not match the training data.  

Opinion is divided (a grey area, if you will) over which of the two parties stands right legally. GitHub has made it clear that users need to check that the code they use is free of copyright infringement, but at the same time, open-source communities see the claim that “AI training is fair use” as a disregard for their rights over their copyrighted code. See, for example, this statement by Butterick: “By claiming that AI training is fair use, Microsoft is constructing a justification for training on public code anywhere on the internet, not just GitHub.”   

Hence, there is little clarity over who is to be held accountable: is it Copilot, or the end users employing the AI-generated code in their products? 

GitHub’s claim that AI training comes under fair use needs closer inspection. This is not the first time questions of copyright have arisen around AI applications; it has been a persistent issue throughout the recent surge in AI generative models. 

In a 2017 interview with IPW, Ben Sobel explained the problem as a “fair use dilemma”. His argument goes like this: 

(i) If machine learning does not come under fair use, then organisations would have to pay remedies to the millions of people whose work forms the training data on which machines learn. This would hinder any progress in the field. 

(ii) But if it does come under fair use, organisations are likely to take liberties with people’s intellectual labour for their own profit.    

Therefore, it is no stretch to say that the legal aspect of AI use is on difficult terrain. If Butterick has a case to take the makers of Copilot to court, the outcome of the lawsuit will have a huge impact on the future of open-source communities and AI generative models. 


Ayush Jain

Ayush is interested in knowing how technology shapes and defines our culture, and our understanding of the world. He believes in exploring reality at the intersections of technology and art, science, and politics.