
PolyCoder vs OpenAI Codex: A comparison of these code generation tools

PolyCoder delivered superior performance in comparison to similarly sized GPT-Neo 2.7B in C, JavaScript, Rust, Scala and TypeScript.


The intersection of code generation tools and large language models (LLMs) is pushing the frontiers of artificial intelligence. Though tech giants have come up with cutting-edge models like BERT and Codex, access to such models has been limited. Recently, Carnegie Mellon University researchers developed PolyCoder, a 2.7-billion-parameter model based on OpenAI's GPT-2 architecture and trained on 249GB of code across 12 programming languages. Unlike most of its competitors, PolyCoder's code and trained weights are openly released. But how does PolyCoder stack up against large language models like Codex and GPT-NeoX-20B?
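To get a feel for how such a model is used, here is a minimal generation sketch built on the Hugging Face transformers API. The checkpoint identifier below is an assumption (a hub-hosted copy of the released weights); the official release lives in the authors' GitHub repository.

# Minimal sketch: sampling a completion from a GPT-2-style code model.
# The checkpoint id "NinedayWang/PolyCoder-2.7B" is an assumption; point
# it at wherever you obtained the released PolyCoder weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "NinedayWang/PolyCoder-2.7B"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prompt = "def binary_search(arr, target):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64,
                         do_sample=True, temperature=0.2)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))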

PolyCoder vs Codex: open-source vs proprietary

PolyCoder was tested against various language models, including masked language models, encoder-decoder models and left-to-right autoregressive models. While some models are pretrained exclusively on GitHub code, others are trained on 'The Pile', a large repository that amalgamates natural language text, code from various languages and software documentation.

Figure: Parameter comparison of PolyCoder and the other evaluated models (source: arxiv.org)

The models were tested on a set of extrinsic and intrinsic evaluations.

Extrinsic evaluation: One of the most common ways to test a model is to have it generate code from natural language prompts. All models are evaluated on the HumanEval dataset, which consists of 164 prompts described in the form of code, comments, etc. For each prompt, 100 completions were sampled from each model and checked against unit tests, from which the standard pass@k metric can be estimated (see the sketch below).
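For context, pass@k is the probability that at least one of k sampled completions passes all unit tests. Below is a minimal sketch of the standard unbiased estimator popularised by the Codex paper, where n completions are sampled per problem and c of them are correct; the function name and the example numbers are illustrative, not taken from the PolyCoder codebase.

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k given n samples with c correct.

    Computes 1 - C(n-c, k) / C(n, k) in a numerically stable way:
    the probability that a random draw of k of the n samples
    contains at least one correct completion.
    """
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 100 samples per problem, 7 of which passed the unit tests.
print(pass_at_k(n=100, c=7, k=10))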

Figure: HumanEval performance comparison across models (source: arxiv.org)

Intrinsic evaluation: Each language model's perplexity is compared on a set of unseen GitHub repositories to evaluate its intrinsic performance. These repositories are held out from training to prevent leakage from the training set into the test set. A sample of 100 random files is used for each of the 12 programming languages in the evaluation dataset. Because each model uses its own tokenisation method, perplexities are made comparable by normalising each model's log-likelihood sum by a common token count computed with Pygments (a sketch of this normalisation follows).
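To make the normalisation concrete, here is a minimal sketch, assuming the model's total log-likelihood for a file has already been computed elsewhere; the helper name normalized_perplexity is illustrative, and Pygments is used only to obtain a model-agnostic token count.

import math
from pygments import lex
from pygments.lexers import get_lexer_by_name

def normalized_perplexity(total_log_likelihood: float,
                          source: str, language: str) -> float:
    """Perplexity normalised by a shared, model-agnostic token count.

    total_log_likelihood: natural-log likelihood the model assigns to
    `source` under its own tokeniser (assumed computed elsewhere).
    Dividing by the Pygments token count puts models with different
    tokenisers on the same scale.
    """
    lexer = get_lexer_by_name(language)  # e.g. "c", "javascript", "rust"
    num_tokens = sum(1 for _kind, text in lex(source, lexer) if text.strip())
    return math.exp(-total_log_likelihood / num_tokens)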

Figure: Perplexity of each model across the 12 evaluation languages (source: arxiv.org)

Compared to GPT-Neo (2.7B), PolyCoder was trained on fewer Python tokens but more code tokens in the other programming languages. This makes PolyCoder a better candidate for transferring knowledge from other languages to Python, meaning that, in the future, natural language as well as code from different languages could be used as a prompt for development. In the intrinsic evaluation, PolyCoder outperformed Codex and all other models in the C language. It also delivered superior performance in comparison to the similarly sized GPT-Neo 2.7B in C, JavaScript, Rust, Scala and TypeScript.

Codex

Last year, OpenAI released an improved version of Codex, an AI system that translates natural language to code. Codex powers AI pair programmer GitHub Copilot and is proficient in more than a dozen programming languages. The AI system can interpret simple commands in natural language and execute them on the user’s behalf.

Future of PolyCoder

DeepMind recently launched AlphaCode, a 41.4-billion-parameter model that is among the first AI-based engines able to generate code at a competitive level. AlphaCode demonstrated its capabilities in programming contests hosted by Codeforces, ranking within the top 54.3 per cent against human programmers. However, AlphaCode is not open-sourced. The researchers at Carnegie Mellon University hope their efforts with PolyCoder will encourage the giants to follow suit and act as a catalyst for AI research and the democratisation of LLMs.

The performance of LLMs generally depends on training time and model size. The results showed that training on natural language as well as code improves performance, which helps explain GPT-Neo's edge over PolyCoder on HumanEval. With respect to the C programming language, however, PolyCoder achieved lower perplexity than all other models, including Codex.


Kartik Wali

A writer by passion, Kartik strives to gain a deep understanding of AI and data analytics and their implementation in all walks of life. As a Senior Technology Journalist, Kartik looks forward to writing about the latest technological trends that transform the way we live!