Why Are People Criticising GitHub Copilot?

Last week, Microsoft and OpenAI released the technical preview of GitHub Copilot, an AI-based assistant that helps programmers write better code. Not to be confused with simple autocomplete, the assistant takes context from the code being worked on to suggest successive lines and whole functions. Copilot is based on OpenAI Codex, an AI system trained on public source code, and works well with languages such as Python, TypeScript, JavaScript, Ruby, and Go.
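To illustrate the kind of suggestion described above (a hypothetical sketch, not actual Copilot output): given only a function signature and a docstring as context, the tool can propose an entire body along these lines.

```python
from datetime import date

# A developer types only the signature and the docstring as context...
def days_between(start: str, end: str) -> int:
    """Return the number of days between two ISO-formatted dates."""
    # ...and the assistant suggests a complete body such as this:
    return abs((date.fromisoformat(end) - date.fromisoformat(start)).days)

print(days_between("2021-06-29", "2021-07-06"))  # prints 7
```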

The announcement of GitHub Copilot kicked up a storm on social media. The initial reaction has been largely positive, with many calling the coding assistant a game-changer.


However, a few questions have been raised. Since the tool is trained on publicly available code repositories, most of which are licensed and under copyright protection, what happens when the tool reproduces these code snippets? Is it legal? Can the parent organisations (Microsoft, OpenAI and GitHub) monetise the tool even though it is trained on free and open-source code?



GitHub has said there is a 0.1 percent chance of Copilot replicating a learned snippet of code verbatim. As one Twitter user pointed out, this could be a potential case of ‘code laundering’ for commercial use, which covers not only copying the content as is but also derivative works.

GitHub allows users to choose a license to protect their work; the most common include MIT, Apache, and GPL. Of these, GPL is a free, copyleft license that protects against verbatim copying of the work and requires derivative works to be distributed under the same or equivalent terms.

A section of the Twitterati has asked whether Copilot complies with fair use terms for code.

To that end, GitHub CEO Nat Friedman wrote in a discussion thread on HackerNews: “In general: (1) training ML systems on public data is fair use (2) the output belongs to the operator, just like with a compiler. On the training question specifically, you can find OpenAI’s position, as submitted to the USPTO here: https://www.uspto.gov/sites/default/files/documents/OpenAI_R… We expect that IP and AI will be an interesting policy discussion around the world in the coming years, and we’re eager to participate!”

Support for Copilot

Neil Brown, a legal expert in the digital space, analysed Copilot from an English law perspective. In his blog, Brown examined passage D.4 of GitHub’s Terms of Service. Under this passage, GitHub may copy a user’s content to its database, make backups, show it to other users, parse it into a search index, and analyse it on its servers. Brown writes: “The license is broadly worded, and I’m confident that there is scope for argument, but if it turns out that Github does not require a license for its activities then, in respect of the code hosted on Github, I suspect it could make a reasonable case that the mandatory license grant in its terms covers this as against the uploader.”

That said, GitHub also notes in the same passage that this license does not grant it permission to sell the content, distribute it, or use it outside the scope of the service.

Further, Julia Reda, a researcher and former Member of the European Parliament, wrote a blog post (leaning more towards the EU perspective) titled “GitHub Copilot is not infringing your copyright”. She makes her argument on two grounds:

Data and text mining: Merely scraping code without the author’s consent, although worthy of criticism, is not a copyright-relevant act that requires permission.

Derivative works and machine-generated code: Putting machine-generated code under the purview of derivative works is ‘dangerous’. First, this assumption suggests that even the smallest excerpt could constitute copyright infringement. Second, the very premise that machines are capable of authoring works is wrong and counterproductive.

According to the GitHub blog, snippets of code appear verbatim when the developer has not provided sufficient context or when there is a universal solution to the problem. The GitHub team is also building an origin tracker to detect such instances of code duplication.
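The article does not describe how the origin tracker works. As a rough sketch of what detecting verbatim duplication could look like (the function and variable names here are hypothetical, not GitHub's implementation), one can hash overlapping windows of normalised lines and check generated code against fingerprints of a licensed corpus:

```python
import hashlib

def snippet_fingerprints(code: str, window: int = 2) -> set:
    """Hash overlapping windows of normalised lines -- a crude stand-in
    for the kind of duplication check an origin tracker might perform."""
    lines = [ln.strip() for ln in code.splitlines() if ln.strip()]
    if len(lines) < window:
        return {hashlib.sha256("\n".join(lines).encode()).hexdigest()}
    return {
        hashlib.sha256("\n".join(lines[i:i + window]).encode()).hexdigest()
        for i in range(len(lines) - window + 1)
    }

# Toy stand-ins for a licensed repository file and a model suggestion:
licensed_source = """
def add(a, b):
    return a + b
print(add(1, 2))
"""

suggested = """
def add(a, b):
    return a + b
"""

overlap = snippet_fingerprints(licensed_source) & snippet_fingerprints(suggested)
print(bool(overlap))  # prints True: the suggestion matches the corpus verbatim
```

A real system would need to normalise identifiers and whitespace far more aggressively, and to index fingerprints at scale, but the matching principle is the same.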

Wrapping up

GitHub clearly states in its blog that Copilot should be seen strictly as an AI pair programmer that assists in writing code. Coders who got access to the tool echoed similar sentiments, saying that while Copilot is impressive, it cannot be equated with a human programmer. According to blogger Colin Eberhardt, Copilot has the “wow” factor needed to make it into the standard toolset of enterprises; however, he thinks it will take some time for the coding assistant to deliver a genuine productivity boost.


Shraddha Goled
I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.
