On Tuesday, OpenAI announced the release of OpenAI Codex through an API in private beta. Codex is the company’s AI system that translates natural language into code. It can interpret and execute commands given in plain English, which makes it possible to build a natural language interface for existing apps. Codex is designed to assist and speed up programming work for both professional and amateur programmers. OpenAI’s demo video, which featured co-founders Ilya Sutskever and Greg Brockman, showed how Codex can build simple websites and rudimentary games. For instance, users can type an English command such as ‘create a webpage with a menu on the side and title at the top’ into the software, and Codex translates it into code.
About Codex
Codex is the backbone of GitHub Copilot, an AI-based tool that helps programmers write code. Copilot takes context from the code being worked on and suggests whole lines and functions. OpenAI claims that the latest version of Codex is more advanced and can both generate and complete blocks of code.
The training dataset for the software was collected from 54 million public software repositories hosted on GitHub and contains 159 GB of unique Python files, each under 1 MB. Codex also surpasses GPT-3 at turning natural language into working code, which means a user can issue commands in English to any software with an API. “OpenAI Codex is a general-purpose programming model, meaning that it can be applied to essentially any programming task,” OpenAI stated in a blog post. “We’ve successfully used it for transpilation, explaining code, and refactoring code.”
The researchers tested their model on HumanEval, an evaluation set that measures functional correctness for synthesizing programs from docstrings. Codex solved 28.8% of the problems with a single sample per problem, while GPT-3 and GPT-J solved 0% and 11.4%, respectively. According to the team, repeatedly sampling from the model proved an effective strategy for producing working solutions to difficult prompts. With 100 samples per problem, this method solved 70.2% of the problems.
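The article does not spell out how a figure such as "70.2% with 100 samples" is computed. The Codex paper reports these numbers as pass@k: the probability that at least one of k samples passes the unit tests, estimated per problem from n generated samples of which c passed. The following is a minimal sketch of that estimator. The function names and the per-problem counts are illustrative assumptions, not data from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: total samples generated for the problem
    c: number of those samples that passed the unit tests
    k: sample budget being evaluated (k <= n)
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every draw of k contains a pass.
        return 1.0
    # 1 minus the chance that all k drawn samples come from the failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(counts, k: int) -> float:
    """Average pass@k over a list of (n, c) pairs, one pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in counts) / len(counts)


# Hypothetical (n, c) counts for three problems; not real benchmark data.
example_counts = [(100, 0), (100, 3), (100, 40)]

if __name__ == "__main__":
    for k in (1, 10, 100):
        print(f"pass@{k}: {mean_pass_at_k(example_counts, k):.3f}")
```

A problem that rarely yields a passing sample contributes little to pass@1 but counts fully toward pass@100 once at least one of its 100 samples passes, which is why the score climbs as the sample budget grows.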
The research also showed that performance improves as the model grows. For instance, Codex solved 13.2% of the evaluation tasks at 300 million parameters and 28.8% at 12 billion parameters.
The paper released by OpenAI also brought to light some significant limitations of Codex, including bias and sample inefficiency. Codex can recommend syntactically incorrect or undefined code, and it can invoke functions and attributes that are undefined or outside the scope of the codebase. The software also has difficulty synthesizing long or higher-level specifications and can suggest solutions that appear correct superficially but do not perform the given task.
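These failure modes are what a functional-correctness check is meant to catch: a candidate counts as solved only if it runs and passes unit tests, however plausible it looks. The sketch below mimics the HumanEval setup, where the prompt is a function signature plus docstring and the completion is the body. The task, the three candidate bodies and the tests are hand-written, hypothetical examples rather than Codex output, and the bare `exec` call is an assumption made for brevity. Untrusted generated code should be run in a sandbox.

```python
# Prompt in the HumanEval style: a signature and a docstring, no body.
PROMPT = (
    "def is_palindrome(text):\n"
    '    """Return True if text reads the same backwards, ignoring case."""\n'
)

# Hypothetical completions (function bodies), written by hand for illustration.
CANDIDATES = {
    # Looks right at a glance but ignores the "ignoring case" requirement.
    "ignores_case_rule": "    return text == text[::-1]\n",
    # Calls a helper that is not defined anywhere in scope.
    "undefined_helper": "    return normalize(text) == normalize(text)[::-1]\n",
    # Satisfies the docstring.
    "correct": "    lowered = text.lower()\n    return lowered == lowered[::-1]\n",
}

# Unit tests as (arguments, expected result) pairs.
TESTS = [(("Level",), True), (("noon",), True), (("abc",), False)]


def passes_tests(completion, prompt=PROMPT, entry_point="is_palindrome", tests=TESTS):
    """Return True only if prompt + completion runs and passes every test."""
    namespace = {}
    try:
        # WARNING: exec runs arbitrary code; use a sandbox for real model output.
        exec(prompt + completion, namespace)
        func = namespace[entry_point]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        # Syntax errors, undefined names and runtime errors all count as failures.
        return False


results = {name: passes_tests(body) for name, body in CANDIDATES.items()}

if __name__ == "__main__":
    for name, ok in results.items():
        print(f"{name}: {'pass' if ok else 'fail'}")
```

Only the third candidate passes. The first fails on the mixed-case input and the second raises a `NameError`, so neither would count toward the solved percentage.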
Codex generates its responses based on training data drawn from the internet, and so inherits human bias in its solutions. The research found that Codex can be prompted to generate racist, denigrating, and otherwise harmful outputs in code comments. The generated code can also have a structure that reflects gender, race, and class stereotypes. For instance, when given prompts like def gender(x): or def race(x):, the software generated code that assumed a binary gender or a limited set of mutually exclusive race categories.
While Codex could be misused for cybercrime, the paper concludes that, at this stage, the models do not materially lower the barrier to entry for malware development. Overall, these limitations call for robust monitoring and continued research to maintain situational awareness of how models like Codex are being used and misused. OpenAI has made Codex available in private beta through its API, initially for free, as it scales up access to the software.