Up to 40 Percent Of GitHub Copilot-Generated Code May Be Insecure

A study has revealed that code generated by Copilot could include bugs or design flaws that an attacker can potentially exploit.

GitHub Copilot can easily be counted among the most significant innovations of 2021. However, while it has been hailed as a breakthrough, this AI-based code assistant, developed by Microsoft-owned GitHub together with OpenAI, has also drawn its share of criticism.

The study, detailed in a paper titled ‘An Empirical Cybersecurity Evaluation of GitHub Copilot’s Code Contributions’, finds that Copilot’s output can contain exploitable weaknesses. The authors, Hammond Pearce, Benjamin Tan, Baleegh Ahmad, Ramesh Karri and Brendan Dolan-Gavitt, created 89 scenarios for Copilot to complete, which resulted in 1,692 programs. About 40 per cent of these programs contained bugs that could pose security risks.

Insecure Code

At the time of writing, Copilot is available as a private beta and as an extension to Microsoft’s Visual Studio Code. It generates code from descriptions written by human developers, and it can also predict a developer’s next line of code from hints such as variable and function names. It should not be confused with simple autocompletion: rather than just finishing a word, it interprets the surrounding context to suggest whole lines or functions.


That said, GitHub acknowledges on its official blog that, because Copilot is trained on public code, which may include insecure coding patterns, bugs or references to outdated APIs, the tool can also synthesise code containing similar patterns.

In this study, the researchers set out to gauge how often Copilot’s suggestions are insecure and which aspects of the context lead it to generate more or less secure code. To this end, the team designed scenarios for Copilot to complete and then analysed the resulting code for security weaknesses.


The team checked Copilot’s completions against a subset of MITRE’s “2021 CWE Top 25 Most Dangerous Software Weaknesses”, a list of the most dangerous software weaknesses that is updated every year. Copilot’s behaviour was studied along three dimensions:

  • Its tendency to generate code susceptible to weaknesses in the CWE Top 25 list, given a scenario where such a vulnerability is possible; this is called the diversity of weakness.
  • How it responds to the context of a scenario; this is called the diversity of prompt.
  • How it responds to the domain (programming language/paradigm); this is called the diversity of domain.
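To make the idea of a CWE-prone scenario concrete, here is a hypothetical sketch of one kind of weakness the study looked for: SQL injection (CWE-89), which appears on MITRE’s 2021 Top 25 list. It is written for this article and is not a scenario or a Copilot output taken from the paper; the table, data and function names are assumptions. The first function builds the query with string formatting, the sort of pattern an assistant could reproduce from its training data, while the second uses a parameterised query.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def get_email_insecure(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable (CWE-89): the user-supplied name is pasted into the SQL text,
    # so a crafted value can change the meaning of the query.
    query = f"SELECT email FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def get_email_secure(conn: sqlite3.Connection, name: str) -> list:
    # Safer: the "?" placeholder makes the driver bind the value as data,
    # so it is never interpreted as SQL.
    return conn.execute(
        "SELECT email FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "x' OR '1'='1"
    print(get_email_insecure(conn, payload))  # returns every row in the table
    print(get_email_secure(conn, payload))    # no user has that literal name
```

Both functions compile and run; the difference only shows up when the input is hostile, which is why such flaws are easy to accept from a suggestion without noticing.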

This research attempts to characterise Copilot’s tendency to produce insecure code. This matters because it helps determine how much scrutiny developers need to apply to catch potential security issues. As Copilot is trained on open-source code available on GitHub, the team theorised that its variable security quality stems from the nature of community-provided code: bugs that appear more often in open-source repositories are more likely to be reproduced by Copilot.

Time is another crucial factor affecting the security of Copilot-generated code. Out-of-date practices can persist in the training set and be reflected in the generated code, sometimes rendering it obsolete or even vulnerable to attacks. “What is ‘best practice’ at the time of writing may slowly become ‘bad practice’ as the cybersecurity landscape evolves,” the authors observed.
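As a hypothetical illustration of how ‘best practice’ can age (this example is not from the paper), storing passwords as a fast, unsalted MD5 digest was once common in public code but is now considered weak; current guidance favours a salted, deliberately slow key-derivation function. The sketch below contrasts the two using only Python’s standard library. The function names, salt size and iteration count are illustrative assumptions, not recommendations from the study.

```python
import hashlib
import hmac
import os
from typing import Tuple


def hash_password_legacy(password: str) -> str:
    # Outdated pattern: one fast, unsalted MD5 digest. Identical passwords
    # give identical hashes, and MD5 is cheap to brute-force.
    return hashlib.md5(password.encode()).hexdigest()


def hash_password_modern(password: str, iterations: int = 600_000) -> Tuple[bytes, bytes]:
    # Current practice: a random per-user salt plus a slow key-derivation
    # function (PBKDF2-HMAC-SHA256 from the standard library).
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest


def verify_password_modern(password: str, salt: bytes, digest: bytes,
                           iterations: int = 600_000) -> bool:
    # Recompute with the stored salt and compare in constant time.
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(candidate, digest)


if __name__ == "__main__":
    print(hash_password_legacy("password"))  # same value for every user with this password
    salt, digest = hash_password_modern("password")
    print(verify_password_modern("password", salt, digest))  # True
```

Both versions are syntactically fine, which is the authors’ point: a model trained on older repositories has no built-in signal that the first one has fallen out of favour.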

Interestingly, OpenAI recently released a paper titled ‘Evaluating Large Language Models Trained on Code’, which examines how far users can trust deep learning models for programming. The paper noted that none of the various versions of GPT-3 could solve any of the coding problems used to evaluate Codex, the model that powers Copilot. It also exposed Codex’s limited understanding of program structure, noting that “(Codex) can recommend syntactically incorrect or undefined code, and can invoke functions, variables, and attributes that are undefined or outside the scope of the codebase.”

Other Challenges with Copilot

Shortly after the initial excitement around Copilot’s release died down, some users began raising concerns about the legality of using public code for training. Because the tool is trained on repositories that may be licensed or under copyright protection, they asked what happens when Copilot reproduces snippets from them, given that GitHub has said there is a 0.1 per cent chance of Copilot reproducing code verbatim. A Twitter user also alleged that this could amount to ‘code laundering’ for commercial use, covering both copied content and derivative work.


Shraddha Goled
I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.
