OpenAI Releases Its Most Powerful AI Tool Yet To The Masses

OpenAI Codex is a descendant of GPT-3; its training data contains both natural language and billions of lines of source code from publicly available sources, including code in public GitHub repositories.

On Tuesday, OpenAI announced the release of OpenAI Codex through an API in private beta. Codex is the company’s AI system that translates natural language into code. It can interpret commands given in plain English, making it possible to build a natural language interface for existing apps. Codex is designed to assist and speed up programming work for both professionals and coding amateurs. OpenAI’s demo video, featuring co-founders Ilya Sutskever and Greg Brockman, demonstrated how Codex can build simple websites and rudimentary games. For instance, users can type an English command like ‘create a webpage with a menu on the side and a title at the top’ into the software, and Codex translates it into code.
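The English-to-code workflow can be illustrated with a small, hypothetical example. A typical Codex-style prompt is a function signature plus a docstring describing the desired behaviour in English, and the model fills in the body. The completion below is a hand-written illustration of the kind of code such a model might plausibly return, not actual Codex output:

```python
# Prompt sent to the model: a signature and an English docstring.
PROMPT = '''
def is_palindrome(text: str) -> bool:
    """Return True if text reads the same forwards and backwards,
    ignoring case and spaces."""
'''

# A plausible completion for that prompt, written out by hand:
def is_palindrome(text: str) -> bool:
    """Return True if text reads the same forwards and backwards,
    ignoring case and spaces."""
    cleaned = text.replace(" ", "").lower()
    return cleaned == cleaned[::-1]
```

The key design point is that the natural-language docstring is the specification: the same interface lets a non-programmer describe behaviour and lets tools verify the result by running the generated function.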

About Codex

Codex is the backbone of GitHub’s Copilot, an AI-based tool that helps programmers write better code. Copilot takes context from the code being worked on and suggests whole lines and functions. OpenAI claims that the latest version of Codex is more advanced and can create and complete entire chunks of code.



Codex is a descendant of OpenAI’s language-generation model GPT-3, and owes its ability to work with both written language and code to its sizable training data, which spans natural language and billions of lines of source code from publicly available sources, including public GitHub repositories. Through the API, Codex is most proficient in Python, but it can work with over a dozen languages, including JavaScript, Go, Perl, PHP, Ruby, Swift, TypeScript and Shell. In addition, the software can consider over three times as much contextual information as GPT-3, with a 14 KB memory for Python code compared to GPT-3’s 4 KB.

“Codex solved 13.2 percent of the evaluation tasks at 300 million parameters and 28.8 percent at 12 billion parameters.”  

The training dataset for the software was created from 54 million public software repositories hosted on GitHub, containing 159 GB of unique Python files, each under 1 MB. Codex also surpasses GPT-3 in natural language understanding, which allows it to produce working code from plain-English instructions; i.e., the user can issue commands in English to any software with an API. “OpenAI Codex is a general-purpose programming model, meaning that it can be applied to essentially any programming task,” OpenAI stated in a blog post. “We’ve successfully used it for transpilation, explaining code, and refactoring code.”

The researchers tested their model on HumanEval, an evaluation set that measures functional correctness for synthesizing programs from docstrings. Codex solved 28.8% of the problems, well ahead of GPT-3 and GPT-J, which solved 0% and 11.4%, respectively. According to the team, repeatedly sampling from the model was an effective strategy for producing working solutions to difficult prompts: with 100 samples per problem, this method solved 70.2% of the problems.
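The pass@k metric behind these numbers is the probability that at least one of k generated samples for a problem passes the unit tests. The Codex paper computes it with an unbiased estimator over n samples of which c are correct; a minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimator of pass@k from the Codex paper:
    given n generated samples of which c pass the unit tests,
    estimate the probability that at least one of k randomly
    drawn samples passes. Equals 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # samples must include at least one correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Computing 1 - C(n-c, k) / C(n, k) directly, rather than naively raising (1 - c/n) to the power k, avoids a bias that would understate performance at small n.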

Their research further showed that performance improves as model size increases: Codex solved 13.2 percent of the evaluation tasks at 300 million parameters and 28.8 percent at 12 billion parameters. The paper released by OpenAI also brought to light some significant limitations of Codex, including bias and sample inefficiency. The model can produce syntactically incorrect recommendations, reference undefined code, and invoke functions, variables and attributes outside the scope of its codebase. The software also has difficulty synthesizing long or higher-level specifications, and can suggest solutions that look correct superficially but fail to perform the given task.

Codex generates its responses based on training data drawn from the internet, and so inherits human biases present in that data. The research found that Codex can be prompted to generate racist, denigratory and otherwise harmful outputs in code comments, and that generated code can reflect gender, race and class stereotypes in its structure. For instance, when given prompts like def gender(x) or def race(x), the software generated code assuming binary gender or a limited set of mutually exclusive race categories.

While Codex could potentially be misused for cybercrime, OpenAI assesses that the models do not materially lower the barrier to entry for malware development at this stage. Overall, these limitations call for robust monitoring and continued research to maintain situational awareness of how models like Codex are being used and misused. OpenAI has made Codex available in private beta on its API, initially for free, while it scales up the service.



Avi Gopani
Avi Gopani is a technology journalist who analyses industry trends and developments from an interdisciplinary perspective at Analytics India Magazine. Her articles chronicle cultural, political and social stories, curated with a focus on the evolving technologies of artificial intelligence and data analytics.
