Tech Behind GitHub Copilot: The Coding Assistant From Microsoft & OpenAI

OpenAI and Microsoft have come together to release a technical preview of GitHub Copilot, an AI-based tool that helps programmers write better code. Copilot takes context from the code being worked on and suggests whole lines and functions.

Microsoft acquired GitHub, a popular code-repository service used by many developers and large companies, in 2018. In 2019, Microsoft invested $1 billion in OpenAI to build artificial general intelligence and jointly develop new Azure AI supercomputing technologies. Microsoft also holds an exclusive license to OpenAI’s GPT-3 language model.

Copilot is based on OpenAI Codex, an AI system trained on a dataset made up of a sizable chunk of public source code. Copilot works with a broad set of frameworks and languages, and the technical preview works especially well with Python, JavaScript, TypeScript, Go, and Ruby.

GitHub Copilot is an AI pair programmer that also works with new frameworks and libraries. The programmer can describe a function in plain English in a comment, and Copilot will convert it to actual code. The tool is already acquainted with specific functions and features. It helps the programmer quickly discover alternative ways to solve problems, write tests, and explore new APIs. The team claims Copilot is far more advanced than existing code assistants.
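To make the comment-to-code workflow concrete, here is a small hand-written illustration. It is not actual Copilot output: the function name `days_between` and its behaviour are assumptions chosen for the example. The comment and signature stand in for what the developer types, and the body is the kind of completion an assistant might propose.

```python
from datetime import date


# What the developer writes: a plain-English description and a signature.
# Return the number of days between two ISO-format dates (YYYY-MM-DD).
def days_between(start: str, end: str) -> int:
    # What an assistant might suggest (hypothetical, written by hand here):
    # parse both dates and return the absolute difference in days.
    delta = date.fromisoformat(end) - date.fromisoformat(start)
    return abs(delta.days)


print(days_between("2021-06-29", "2021-07-09"))  # prints 10
```

As with any suggestion, a body like this would still need to be reviewed and tested before use.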

Copilot works best when the code is split into small functions, uses meaningful names for function parameters, and includes good docstrings and comments. It was recently benchmarked against a set of Python functions with good test coverage in open source repos. The team blanked out the function bodies and asked Copilot to fill them in. The tool was correct 43 percent of the time on the first try and 57 percent of the time within ten attempts.
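The benchmark described above can be pictured with a rough sketch. This is not GitHub's evaluation harness: the helper names `first_passing_attempt` and `solved_within`, the toy problems, and the use of `exec` on candidate source are all assumptions made for illustration. The idea is simply to record which attempt first passes the tests for each function, then report the share of functions solved within one attempt and within ten.

```python
from typing import Callable, List, Optional


def first_passing_attempt(
    candidates: List[str],
    func_name: str,
    check: Callable[[Callable], bool],
) -> Optional[int]:
    """Return the 1-based index of the first candidate that passes, else None."""
    for attempt, source in enumerate(candidates, start=1):
        namespace: dict = {}
        try:
            exec(source, namespace)  # define the candidate function
            if check(namespace[func_name]):
                return attempt
        except Exception:
            # Syntax errors, missing names and crashing checks count as a miss.
            continue
    return None


def solved_within(first_passes: List[Optional[int]], k: int) -> float:
    """Fraction of problems whose first passing attempt is at most k."""
    solved = sum(1 for a in first_passes if a is not None and a <= k)
    return solved / len(first_passes)


# Toy stand-ins for "functions with good test coverage".
def check_add(f: Callable) -> bool:
    return f(2, 3) == 5


def check_is_even(f: Callable) -> bool:
    return f(4) is True and f(7) is False


add_candidates = ["def add(a, b):\n    return a + b\n"]
is_even_candidates = [
    "def is_even(n):\n    return n % 2 == 1\n",  # wrong first suggestion
    "def is_even(n):\n    return n % 2 == 0\n",  # correct second suggestion
]

results = [
    first_passing_attempt(add_candidates, "add", check_add),
    first_passing_attempt(is_even_candidates, "is_even", check_is_even),
]
print(solved_within(results, 1), solved_within(results, 10))  # prints 0.5 1.0
```

In this toy run one of two functions is solved on the first try and both within ten, which mirrors how the 43 percent and 57 percent figures relate to each other.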

However, according to the team, Copilot is not a substitute for human programmers. The team explained: “GitHub Copilot tries to understand your intent and to generate the best code it can, but the code it suggests may not always work or even make sense. While we are working hard to make GitHub Copilot better, code suggested by GitHub Copilot should be carefully tested, reviewed, and vetted, like any other code. As the developer, you are always in charge.”

Copilot does not test the code it suggests, so the code may not even compile or run. Since Copilot holds limited context, even a single source file longer than 100 lines is clipped, and the tool looks at just the immediately preceding context. “You can use the code anywhere, but you do so at your own risk,” the team said.
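The article does not say how the clipping is implemented, so the following is only a sketch of the general idea under stated assumptions: a hypothetical helper `clip_context` that works on whole lines and keeps a fixed window of lines immediately before the cursor. The real system may well count tokens rather than lines.

```python
def clip_context(source: str, cursor_line: int, max_lines: int = 100) -> str:
    """Keep at most max_lines lines immediately preceding the cursor.

    cursor_line is the number of lines that sit above the cursor.
    Both the line-based window and the default of 100 are assumptions.
    """
    if max_lines <= 0:
        return ""
    preceding = source.splitlines()[:cursor_line]
    return "\n".join(preceding[-max_lines:])


# A 250-line file with the cursor at the very end.
long_file = "\n".join(f"line {i}" for i in range(250))
window = clip_context(long_file, cursor_line=250)
print(len(window.splitlines()), window.splitlines()[0])  # prints 100 line 150
```

Anything defined above the window, such as an import or helper at the top of a long file, would be invisible to the model in this scheme, which is one reason small, self-describing functions tend to get better suggestions.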

Stochastic parrot?

Since GitHub Copilot is trained on billions of lines of publicly available code, there might be a direct relationship between the code it suggests and the code it learned from. Notably, Timnit Gebru and her co-authors coined the term ‘stochastic parrots’ for AI systems that directly reproduce what they learned during training.

However, the team said fitting Copilot into the same category of AI systems would be an oversimplification. The tool is more like a crow that builds novel tools from small blocks, rather than parroting the existing corpus of publicly available code. And as an engineer at GitHub puts it, these systems can feel like “a toddler with a photographic memory”.

GitHub Copilot is a code synthesiser and not a search engine, meaning that the vast majority of the code it suggests is unique and has not been used before. However, code duplication can’t be entirely ruled out. The team found that about 0.1 percent of the time, suggestions may contain verbatim snippets of code from the training set. This generally happens when the developer has not provided sufficient context or when there is a universal solution to the problem. Meanwhile, a few users have pointed out that since Copilot is trained on public code, it could be considered a form of ‘open-source code laundering’. The team is now working on building an origin tracker to detect instances of code duplication.
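How the planned origin tracker will work has not been described. One simple way to think about verbatim duplication, offered here only as an assumed sketch, is to compare runs of consecutive tokens in a suggestion against a reference corpus. The helper names, the whitespace tokenisation, and the window size below are all illustrative choices.

```python
from typing import Iterable, Set, Tuple


def token_windows(code: str, n: int) -> Set[Tuple[str, ...]]:
    """All runs of n consecutive whitespace-separated tokens in code."""
    tokens = code.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def has_verbatim_overlap(suggestion: str, corpus: Iterable[str], n: int = 8) -> bool:
    """True if any n-token run of the suggestion also appears in the corpus."""
    suggestion_windows = token_windows(suggestion, n)
    return any(suggestion_windows & token_windows(doc, n) for doc in corpus)


# Tiny stand-in for a training corpus.
corpus = ["def add(a, b):\n    return a + b\n"]
copied = "x = 1\ndef add(a, b):\n    return a + b\n"
novel = "def mul(a, b):\n    return a * b\n"

print(has_verbatim_overlap(copied, corpus, n=6))  # prints True
print(has_verbatim_overlap(novel, corpus, n=6))   # prints False
```

A check like this would flag very common idioms as well as genuine copies, which fits the team's observation that duplication tends to appear where a problem has a universal solution.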

Shraddha Goled

I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.