Earlier this month, San Francisco-based AI research lab OpenAI made its AI programming technology, Codex, available through an application programming interface (API), as announced on its blog. Co-founder Wojciech Zaremba, CTO Greg Brockman, and Chief Scientist Ilya Sutskever took the opportunity to demonstrate the technology's capabilities to the world.
In simple terms, Codex is an AI system that translates natural language into code. It powers GitHub Copilot, a coding assistant from Microsoft and OpenAI that has created a storm on social media ever since its announcement. GitHub Copilot takes context from the code a developer is working on and suggests complete lines and functions.
OpenAI, however, claims that the latest version of Codex is far more advanced: it can create and complete entire chunks of code. But what led to the foundation of Codex?
Last year, OpenAI’s much-talked-about GPT-3 was made available through a commercial API to allow developers and researchers across the globe to explore the language model’s possibilities and capabilities. Interestingly, for a model that was not designed for coding at all, GPT-3’s programming applications attracted the most attention.
Building on this, Codex has been launched as an advanced version of GPT-3, fine-tuned for programming tasks.
Deep Learning Model with Limitations
Codex is a deep learning model that takes a natural language prompt as input and generates code. The goal of Codex is to automate the mundane bits of programming:
- Launching web servers
- Rendering web pages
- Sending emails
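As a rough illustration of the first task above, a prompt such as "launch a web server that serves the current directory" maps to a few lines of standard-library Python. This is a minimal sketch of the kind of boilerplate Codex targets, not actual Codex output:

```python
# Hypothetical example of the boilerplate Codex aims to automate,
# written with only the Python standard library.
from http.server import HTTPServer, SimpleHTTPRequestHandler

def launch_server(port=8000):
    """Serve files from the current directory over HTTP until interrupted."""
    server = HTTPServer(("", port), SimpleHTTPRequestHandler)
    print(f"Serving on port {server.server_port}")
    server.serve_forever()
```

The point is not that this code is hard to write, but that it is the kind of routine scaffolding a developer would rather describe in a sentence than type out.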
During the final demo, viewers were asked to leave their email addresses in a web form. Codex was then used to create a Python script that checked the current Bitcoin price and emailed it to every viewer who had shared an address; in all, 1,472 recipients received the email.
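A script of the kind shown in the demo might look as follows. This is a hedged sketch, not OpenAI's actual output: the price endpoint, SMTP host, and addresses are illustrative placeholders.

```python
# Sketch of a "fetch Bitcoin price and email it" script, similar in spirit
# to the demo. Endpoint and SMTP settings are assumptions, not demo details.
import json
import smtplib
import urllib.request
from email.message import EmailMessage

# Assumed price endpoint; any JSON price API would do.
PRICE_URL = "https://api.coindesk.com/v1/bpi/currentprice/USD.json"

def fetch_bitcoin_price():
    """Fetch the current BTC price in USD from the assumed endpoint."""
    with urllib.request.urlopen(PRICE_URL) as resp:
        data = json.load(resp)
    return data["bpi"]["USD"]["rate_float"]

def build_message(price, sender, recipient):
    """Compose the price-notification email for one recipient."""
    msg = EmailMessage()
    msg["Subject"] = f"Current Bitcoin price: ${price:,.2f}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"Bitcoin is currently trading at ${price:,.2f}.")
    return msg

def notify(recipients, sender, smtp_host="localhost"):
    """Send the current price to every address collected from the web form."""
    price = fetch_bitcoin_price()
    with smtplib.SMTP(smtp_host) as server:
        for recipient in recipients:
            server.send_message(build_message(price, sender, recipient))
```

Even in this small sketch, the glue work (HTTP request, JSON parsing, email composition, SMTP session) is exactly the routine plumbing the demo aimed to generate from a spoken or typed description.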
Finally, the demonstrators used an iPad running Microsoft Word with a custom Codex plugin, triggered by key presses, that fed speech-recognition output into Codex, translated it into code, and ran it. Codex was made to perform mundane tasks such as making every fifth line bold and removing initial spaces, proving useful for creating code on the fly.
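The actual plugin drove Word's own formatting API, but the two edits reduce to simple text transformations. The helpers below are my own text-level illustration, not the demo code:

```python
# Text-level equivalents of the two Word-demo commands (illustrative only).

def strip_leading_spaces(lines):
    """'Remove initial spaces': drop leading whitespace from each line."""
    return [line.lstrip() for line in lines]

def fifth_line_indices(lines):
    """'Make every fifth line bold': zero-based indices of every fifth line.
    The real bolding step would go through Word's formatting API."""
    return list(range(4, len(lines), 5))
```

For example, `fifth_line_indices` on a twelve-line document selects lines 5 and 10 (indices 4 and 9), which the plugin would then format as bold.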
Codex seems to show some of GPT-3’s zero-shot learning capabilities (the ability to perform a task it was not trained for). However, the demo does not provide a complete picture of the model's capabilities and limitations. Machine learning models are usually more accurate when they perform narrowly specified tasks; as the problem space widens, performance drops.
For instance, during the demo, when Brockman asked the model to print ‘Hello world’ five times, it printed the messages side by side on a single line rather than one per line. Brockman had to rephrase his command to make the model print the message on five separate lines. The model has many such gaps that the co-founder and CTO avoided touching upon (deliberately?) during the demo, which drew criticism from the tech community.
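The ambiguity is easy to reproduce: "print 'Hello world' five times" admits at least two readings. A sketch of the two interpretations (my reconstruction, not the demo's literal output):

```python
# Two plausible readings of the prompt "print 'Hello world' five times".

def hello_one_line(n=5):
    """The model's first reading: messages concatenated on a single line."""
    return "Hello world" * n

def hello_separate_lines(n=5):
    """The rephrased request: one message per line."""
    return "\n".join("Hello world" for _ in range(n))
```

Both functions satisfy the literal prompt; only the rephrased, more explicit command pins down the intended output, which is exactly the kind of specification gap natural-language programming has to contend with.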
According to OpenAI, GPT-3 had zero per cent accuracy on coding tasks, the previous Codex model was 27 per cent accurate, and the next-generation Codex model reaches 37 per cent accuracy.
Some questions remain unanswered: how Codex will work with arbitrary APIs, and how it will perform outside the world of demos staged by the team. While the capabilities of Codex seem immense, OpenAI is still probing its powers. As a result, it is too early to say whether the adoption of Codex will affect the software engineering job market. However, the first look at Codex seems promising.