Why Indian IT Prefers CodeNet over GitHub Copilot

Since IBM CodeNet is open source, Indian IT companies may be building their own NLP models on CodeNet datasets to achieve the dream of natural language coding (NLC), a dream that is far-fetched but not impossible.

Microsoft’s GitHub Copilot is gaining traction and now writes 46% of developers’ code, up from 27% in June 2022. The tool is also available to companies through ‘Copilot for Business’. In India, however, GitHub may face competition in IT automation from legacy tech company IBM. According to AIM’s recent findings, one of India’s largest IT companies has been working with IBM CodeNet, possibly to develop its own AI model.

IBM CodeNet is a large-scale benchmark dataset of approximately 14 million code samples collected from two online judge websites, AIZU Online Judge and AtCoder. DeepMind’s AlphaCode was also trained on data that includes IBM CodeNet.

IBM CodeNet is widely seen as a major step toward Natural Language Coding (NLC), the same goal Microsoft’s GitHub Copilot is pursuing. Following CodeNet, IBM launched ‘Project Wisdom’, which, according to IBM, aims for a “computer to program a computer”.

Indian IT’s interest in IBM may also come down to what IBM offers it: automation. Indian IT has long claimed that it struggles with an unskilled workforce in India, a complaint that surfaces from time to time. One example is Infosys firing more than 600 freshers after they failed the company’s internal assessment.

Citing open-source developer communities such as Ansible, IBM claims that IT automation platforms like the Red Hat Ansible Automation Platform can help such companies automate their work.

According to IBM, Project Wisdom automatically generates code for the Red Hat Ansible platform through a natural language interface. Developers type commands as plain English sentences, such as “Deploy Web Application Stack” or “Install Nodejs dependencies”.

Once the command is entered, Wisdom parses the sentence and creates the requested automation workflow, which is delivered as an ‘Ansible Playbook’. The developer can either accept the playbook as is or customise it as per their needs.
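To make the input and output concrete, here is a toy Python sketch. It is not how Project Wisdom works (IBM describes a trained model); it is a hypothetical rule-based lookup that turns one of the example requests into the structure of an Ansible Playbook. The template table, the function `request_to_playbook` and the application path `/opt/app` are all made up for illustration; only the module names `ansible.builtin.package` and `community.general.npm` are real Ansible modules.

```python
import json

# Hypothetical rule-based stand-in for a natural-language-to-playbook tool.
# Project Wisdom uses a trained model; this only shows the input/output shape.
TASK_TEMPLATES = {
    "install nodejs dependencies": [
        {
            "name": "Install Node.js and npm",
            "ansible.builtin.package": {"name": ["nodejs", "npm"], "state": "present"},
        },
        {
            # With no package name, the npm module installs from package.json in `path`.
            "name": "Install project dependencies from package.json",
            "community.general.npm": {"path": "/opt/app", "state": "present"},  # hypothetical path
        },
    ],
}


def request_to_playbook(request: str, hosts: str = "all") -> list:
    """Map a plain-English request to a playbook (a list of plays)."""
    key = " ".join(request.lower().split())  # normalise case and spacing
    tasks = TASK_TEMPLATES.get(key)
    if tasks is None:
        raise ValueError(f"no template for request: {request!r}")
    # Each play names its target hosts and the tasks to run on them.
    return [{"name": request, "hosts": hosts, "become": True, "tasks": tasks}]


if __name__ == "__main__":
    # JSON is valid YAML, so this output could be saved as a playbook file.
    print(json.dumps(request_to_playbook("Install Nodejs dependencies"), indent=2))
```

As in the workflow described above, the generated playbook is only a starting point that a developer would review, accept or edit before running it.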

Wisdom better than Copilot? 

IBM’s Wisdom model reportedly has 350 million parameters, minuscule compared with the 12 billion parameters of the model behind GitHub Copilot. However, IBM claims that its model’s accuracy far exceeds that of its GitHub counterpart.

Source: IBM

IBM’s alternative also matters to Indian IT because of widespread concern about using Copilot, which has allegedly been trained on IP-protected code. When asked, GitHub told AIM that its internal research shows that about 1% of the time, a suggestion may contain code snippets longer than ~150 characters that match the training set.

“To prevent that from happening, we built a filter to help detect and suppress GitHub Copilot suggestions which contain code that matches public code on GitHub,” says Shuyin Zhao, Sr. Director, Product Management, GitHub. 

Zhao says that with the filter enabled, GitHub Copilot checks each suggestion, together with about 150 characters of surrounding code, against public code on GitHub for matches or near matches, ignoring whitespace. If there is a match, the suggestion is not shown to the user.
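GitHub has not published how its filter is implemented. The following is a hypothetical Python sketch of the idea Zhao describes: strip whitespace, slide a fixed-size window over the suggestion plus a little of the code before it, and suppress the suggestion if any window also occurs in an index built from public code. The 150-character window is an assumption taken from the figure quoted, and the function names are made up. The sketch catches only exact matches after whitespace removal; the “near matches” GitHub mentions would need fuzzier techniques.

```python
import hashlib
from typing import Iterable, Set

WINDOW = 150  # assumed window size, based on the ~150 characters GitHub cites


def normalize(code: str) -> str:
    # Remove all whitespace so formatting differences cannot hide a match.
    return "".join(code.split())


def window_hashes(code: str, size: int = WINDOW) -> Set[str]:
    # Hash every `size`-character window of the whitespace-free text.
    text = normalize(code)
    return {
        hashlib.sha256(text[i:i + size].encode()).hexdigest()
        for i in range(len(text) - size + 1)
    }


def build_index(public_files: Iterable[str], size: int = WINDOW) -> Set[str]:
    # Index of window hashes over a (hypothetical) corpus of public code.
    index: Set[str] = set()
    for source in public_files:
        index |= window_hashes(source, size)
    return index


def should_suppress(suggestion: str, context: str, index: Set[str],
                    size: int = WINDOW) -> bool:
    # Keep only the last size-1 characters of context, so every window checked
    # overlaps the suggestion by at least one character.
    tail = normalize(context)[-(size - 1):] if size > 1 else ""
    candidate = tail + normalize(suggestion)
    return not window_hashes(candidate, size).isdisjoint(index)
```

Hashing the windows keeps the index compact. Trimming the surrounding code means a user’s own copy of public code does not, by itself, block every suggestion that follows it.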

Still, the mere possibility of IP-protected code in GitHub Copilot’s training data makes Indian IT uneasy. And since IBM is known for its deep research funding, collaborating with it to pursue NLC looks like the better option for Indian IT.

Indian IT and Automation

We have previously discussed how Indian IT companies have been optimistic about LLM-based tools like GitHub Copilot, and pointed out that this approach might cost jobs. Now, with news of a major IT company working with IBM CodeNet, that possibility is higher than ever.

Because IBM CodeNet is open source, Indian IT companies may be building their own NLP models on CodeNet datasets to achieve NLC, a dream that is far-fetched but not impossible.

If that dream does come true, it is worth asking what becomes of TCS CHRO Milind Lakkad’s widely circulated quote. Would AI not contribute to job losses then?


Lokesh Choudhary
Tech-savvy storyteller with a knack for uncovering AI's hidden gems and dodging its potential pitfalls. 'Navigating the world of tech', one story at a time.
