The announcement of GitHub Copilot kicked up a storm on social media. The initial reaction has been largely positive, with many people calling the coding assistant a game-changer.
However, a few questions have been raised. The tool is trained on publicly available code repositories, most of which are licensed and under copyright protection, so what happens when it reproduces these code snippets? Is that legal? Can the organisations behind the tool (Microsoft, OpenAI and GitHub) monetise it even though it is trained on free and open-source code?
GitHub has said there is a 0.1 percent chance of Copilot reproducing a snippet of its training code verbatim. As one Twitter user pointed out, this could amount to 'code laundering' for commercial use, a concern that covers not only verbatim copies of the code but also derivative works.
GitHub allows users to choose a license to protect their work. The most common licenses include MIT, Apache and GPL. Of these, GPL is a free, copyleft license: it allows the work to be copied on condition that its copyright and license notices are kept intact, and it requires derivative works to be distributed under the same or equivalent terms.
Some Twitter users have asked whether Copilot's use of this licensed code qualifies as fair use.
In response, GitHub CEO Nat Friedman wrote in a discussion thread on Hacker News: “In general: (1) training ML systems on public data is fair use (2) the output belongs to the operator, just like with a compiler. On the training question specifically, you can find OpenAI’s position, as submitted to the USPTO here: https://www.uspto.gov/sites/default/files/documents/OpenAI_R… We expect that IP and AI will be an interesting policy discussion around the world in the coming years, and we’re eager to participate!”
Support for Copilot
Neil Brown, a legal expert in the digital space, discussed Copilot from an English law perspective. In a blog post, Brown examined section D.4 of GitHub’s Terms of Service. Under this section, GitHub may copy a user’s content to its database, make backups, show it to other users, parse it into a search index and analyse it on its servers. Brown writes: “The license is broadly worded, and I’m confident that there is scope for argument, but if it turns out that Github does not require a license for its activities then, in respect of the code hosted on Github, I suspect it could make a reasonable case that the mandatory license grant in its terms covers this as against the uploader.”
That said, GitHub also notes in the same section that this license does not give it the right to sell users’ content, to distribute it, or to use it outside the provision of the service.
Julia Reda, a researcher and former Member of the European Parliament, wrote a blog post, largely from an EU perspective, titled “GitHub Copilot is not infringing your copyright”. She bases her argument on two grounds:
Data and text mining: Merely scraping code without the author’s consent, although worthy of criticism, is not a copyright-relevant act that requires permission.
Derivative works and machine-generated code: Treating machine-generated code as a derivative work is ‘dangerous’. Firstly, this assumption implies that even the smallest excerpt could constitute copyright infringement. Secondly, the very premise that machines can produce copyrightable works is wrong and counterproductive.
According to the GitHub blog, code snippets appear verbatim when the developer has not provided sufficient context or when there is a universal solution to the problem. The GitHub team is also building an origin tracker to detect such instances of duplication.
In its blog, GitHub states clearly that Copilot should be seen strictly as an AI pair programmer that assists in writing code. Developers who got early access to the tool echoed this, saying that while Copilot is impressive, it cannot be equated with a human programmer. According to blogger Colin Eberhardt, Copilot has the “wow” factor to become part of the standard enterprise toolset. However, he thinks it will take some time before the coding assistant delivers a genuine productivity boost.