GitHub Copilot can easily be counted among the most significant innovations of 2021. However, while appreciated as a breakthrough, this AI-based code assistant, developed by GitHub in collaboration with OpenAI, has also drawn its share of criticism.
A study has now revealed that code generated by Copilot can include bugs or design flaws that an attacker could potentially exploit. In a paper titled ‘An Empirical Cybersecurity Evaluation of GitHub Copilot’s Code Contributions’, authors Hammond Pearce, Benjamin Tan, Baleegh Ahmad, Ramesh Karri, and Brendan Dolan-Gavitt created 89 scenarios for Copilot to complete, yielding 1,692 programs. About 40 per cent of these programs contained bugs that could pose security risks.
Insecure Code
Currently, Copilot is available in private beta as an extension to Microsoft’s Visual Studio Code. Copilot generates code corresponding to a description given by the human developer. It can also predict the developer’s next line of code from hints such as variable and function names. It is not to be confused with autocompletion; it works more by interpretation.
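To picture the difference, consider a hypothetical prompt in which the developer writes only a function name and docstring, and a Copilot-style assistant infers the body from those hints. The function name and the completion below are illustrative, not actual Copilot output.

```python
from datetime import date

# Hypothetical prompt: only the signature and docstring are written by the developer.
def parse_iso_date(date_string):
    """Parse an ISO-8601 date string such as '2021-08-25' into a date object."""
    # A plausible assistant-generated completion, inferred from the name and docstring:
    year, month, day = (int(part) for part in date_string.split("-"))
    return date(year, month, day)

print(parse_iso_date("2021-08-25"))  # 2021-08-25
```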
That said, in its official blog, GitHub mentioned that since Copilot is trained on public code, which may include code with insecure coding patterns, bugs or references to outdated APIs, the tool can also synthesise code containing similar patterns.
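As a generic illustration of the kind of insecure pattern that caveat refers to (this example is illustrative, not drawn from Copilot output or from the paper), compare a SQL query built by string formatting, which is open to SQL injection, with a parameterised one:

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE users (name TEXT, role TEXT)")
connection.execute("INSERT INTO users VALUES ('alice', 'admin')")

def find_role_insecure(name):
    # Pattern still common in older public code: building SQL by string formatting.
    # If `name` is attacker-controlled, this allows SQL injection (CWE-89).
    query = "SELECT role FROM users WHERE name = '%s'" % name
    return connection.execute(query).fetchall()

def find_role_secure(name):
    # Safer pattern: a parameterised query lets the driver handle escaping.
    return connection.execute(
        "SELECT role FROM users WHERE name = ?", (name,)
    ).fetchall()

print(find_role_insecure("alice"))  # [('admin',)]
print(find_role_secure("alice"))    # [('admin',)]
```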
In this study, the researchers tried to gain insight into how often Copilot’s suggestions are insecure and which aspects of the context yield generated code that is more or less secure. To this end, the team designed scenarios for Copilot to complete and then analysed the produced code for security weaknesses.
The team checked Copilot completions against a subset of MITRE’s “2021 CWE Top 25 Most Dangerous Software Weaknesses”, a list of the most dangerous software weaknesses that is updated every year. Copilot’s behaviour was studied along three dimensions:
- Its tendency to generate code that is susceptible to weaknesses in the CWE Top 25 list, given a scenario where such a vulnerability is possible (see the sketch after this list); called the diversity of weakness.
- The response to the context of a scenario; called the diversity of prompt.
- The response to the domain (programming language/paradigm); called the diversity of domain.
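To make the first dimension concrete, here is a sketch of what such a scenario might look like; it is an illustration in the spirit of the paper, not one of its 89 scenarios. The prompt asks for a file-serving helper, and a model could plausibly complete it in a way that is vulnerable to CWE-22 (path traversal) or in a safer way:

```python
import os

BASE_DIR = "/srv/app/public"

def read_public_file_vulnerable(requested_path):
    # Vulnerable completion (CWE-22): a request like "../../etc/passwd" escapes BASE_DIR.
    return open(os.path.join(BASE_DIR, requested_path)).read()

def read_public_file_safer(requested_path):
    # Safer completion: resolve the path and reject anything outside BASE_DIR.
    full_path = os.path.realpath(os.path.join(BASE_DIR, requested_path))
    if not full_path.startswith(os.path.realpath(BASE_DIR) + os.sep):
        raise ValueError("requested path escapes the public directory")
    return open(full_path).read()

try:
    read_public_file_safer("../../etc/passwd")
except ValueError as error:
    print(error)  # requested path escapes the public directory
```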
This research attempts to characterise Copilot’s tendency to produce insecure code. This matters because it indicates how much scrutiny a developer might need to apply to Copilot’s suggestions for potential security issues. As Copilot is trained on open-source code available on GitHub, the team theorised that the variable security quality stems from the nature of community-provided code: Copilot will more often reproduce bugs that appear more frequently in open-source repositories.
The effect of time is another crucial factor affecting the security quality of Copilot-generated code. Out-of-date practices can persist in the training set and be reflected in the generated code, sometimes rendering it useless or even vulnerable to attacks. “What is ‘best practice’ at the time of writing may slowly become ‘bad practice’ as the cybersecurity landscape evolves,” the authors observed.
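Password storage is a stock example of that drift (a generic illustration, not one taken from the study): unsalted MD5 hashing was once widespread in tutorials and public repositories that may still appear in training data, whereas current guidance calls for salted, deliberately slow key derivation.

```python
import hashlib
import os

password = b"correct horse battery staple"

# Outdated practice that persists in older public code: fast, unsalted MD5.
outdated_hash = hashlib.md5(password).hexdigest()

# A current standard-library approach: salted PBKDF2 with a high iteration count.
salt = os.urandom(16)
modern_hash = hashlib.pbkdf2_hmac("sha256", password, salt, 600_000)

print(outdated_hash)
print(salt.hex() + ":" + modern_hash.hex())
```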
Interestingly, OpenAI recently released a paper titled ‘Evaluating Large Language Models Trained on Code’, which examines the extent to which users can trust deep learning in programming. The study noted that none of the various versions of GPT-3 could solve any of the coding problems used to evaluate Codex. It also exposed Codex’s limited understanding of program structure, observing that “(Codex) can recommend syntactically incorrect or undefined code, and can invoke functions, variables, and attributes that are undefined or outside the scope of the codebase.”
Other Challenges with Copilot
Shortly after the hullabaloo around Copilot’s release died down, a few users started raising concerns over the legality of using public code for training. Since the tool is trained on code repositories that may be licensed and under copyright protection, they asked what happens when Copilot reproduces those code snippets (GitHub itself has mentioned that there is about a 0.1 per cent chance of Copilot reproducing code verbatim). A Twitter user also alleged that this could be a case of code laundering for commercial use, in which copyrighted content is copied and passed off as derivative work.
github copilot has, by their own admission, been trained on mountains of gpl code, so i'm unclear on how it's not a form of laundering open source code into commercial works. the handwave of "it usually doesn't reproduce exact chunks" is not very satisfying pic.twitter.com/IzqtK2kGGo
— eevee (@eevee) June 30, 2021