
Why Indian IT Prefers CodeNet over GitHub Copilot

Since IBM CodeNet is open source, Indian IT companies may be working on their own NLP models with the help of CodeNet datasets to achieve the dream of NLC, a dream that is far-fetched but not impossible.

Microsoft’s GitHub Copilot is gaining traction day by day, now writing 46% of the code for developers, a significant increase from 27% in June 2022. The platform is also now available to enterprises, with the launch of ‘Copilot for Business’. In India, however, GitHub might face competition in IT automation from a legacy tech company: IBM. As per AIM’s recent findings, one of the largest IT giants in India has been working with ‘IBM CodeNet’, possibly to develop its own AI model.

IBM CodeNet is essentially a large-scale standard dataset with approximately 14 million code samples downloaded from two online judge websites, AIZU Online Judge and AtCoder. DeepMind’s AlphaCode also draws on IBM CodeNet: the CodeContests dataset used to fine-tune it includes CodeNet data.

It is believed that IBM CodeNet can be of immense help in achieving the dream of Natural Language Coding (NLC), something Microsoft’s GitHub Copilot is also aiming for. Following up on CodeNet, IBM launched ‘Project Wisdom’, which, according to IBM, aims for a “computer to program a computer”.

In addition, Indian IT’s interest in IBM may stem from what the company offers the sector: automation. Indian IT firms have long claimed that they struggle with an under-skilled workforce in India, a problem that has been highlighted sporadically. It is further exemplified when companies like Infosys fire more than 600 freshers after they fail the company’s internal assessment.

Citing open-source developer communities such as Ansible, IBM claims that IT automation platforms like the Red Hat Ansible Automation Platform can help such companies automate their work.

According to IBM, Project Wisdom automatically generates code for the Red Hat Ansible platform by giving developers a natural language interface. The tool lets developers enter commands as simple English sentences, such as “Deploy Web Application Stack” or “Install Nodejs dependencies”.

Once the command is entered, Wisdom parses the sentence and creates the requested automation workflow, which is delivered as an ‘Ansible Playbook’. The developer can either accept the playbook as is or customise it as per their needs.
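To make the idea of a playbook concrete, here is a minimal Python sketch of the kind of structure such a tool would hand back. It is not Wisdom’s actual output or API: the function `sketch_playbook` and the lookup table `TEMPLATE_TASKS` are hypothetical, the tasks are hand-written rather than model-generated, and the result is printed as JSON, which Ansible can read because JSON is a subset of YAML.

```python
import json

# Hypothetical, hand-written mapping from a prompt to Ansible tasks.
# A real system would generate these with a trained model.
TEMPLATE_TASKS = {
    "Install Nodejs dependencies": [
        {
            "name": "Ensure Node.js is installed",
            "ansible.builtin.package": {"name": "nodejs", "state": "present"},
        },
    ],
}


def sketch_playbook(prompt: str) -> list:
    """Return a one-play playbook whose play name is the English prompt."""
    tasks = TEMPLATE_TASKS[prompt]  # raises KeyError for unknown prompts
    return [
        {
            "name": prompt,
            "hosts": "all",   # assumption: target every host in the inventory
            "become": True,   # assumption: package installs need privileges
            "tasks": tasks,
        }
    ]


playbook = sketch_playbook("Install Nodejs dependencies")
print(json.dumps(playbook, indent=2))
```

A developer would then review the generated tasks and edit them, which is the accept-or-customise step described above.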

Wisdom better than Copilot? 

IBM’s Wisdom model is claimed to have 350 million parameters, which is minuscule compared to the 12 billion parameters of the model behind GitHub Copilot. However, IBM claims that its model’s accuracy far exceeds that of its GitHub counterpart.

Source: IBM

IBM’s alternative also matters to Indian IT because there is widespread fear about corporations using Copilot, which has allegedly been trained on IP-protected code. When asked, GitHub told AIM that its internal research shows that about 1% of the time, a suggestion may contain code snippets longer than ~150 characters that match the training set.

“To prevent that from happening, we built a filter to help detect and suppress GitHub Copilot suggestions which contain code that matches public code on GitHub,” says Shuyin Zhao, Sr. Director, Product Management, GitHub. 

Zhao says that with the filter enabled, GitHub Copilot checks each code suggestion, together with its surrounding code of about 150 characters, for matches or near matches against public code on GitHub, ignoring whitespace. If there is a match, the suggestion is not shown to the user.
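The general idea of such a filter can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not GitHub’s implementation: the function names, the in-memory `public_corpus` list, and the exact 150-character window are hypothetical, and the sketch handles exact matches only, not the near matches mentioned above.

```python
import re

MIN_MATCH_CHARS = 150  # assumption: the approximate threshold quoted above


def strip_whitespace(code: str) -> str:
    """Remove all whitespace so formatting differences do not hide a match."""
    return re.sub(r"\s+", "", code)


def should_suppress(suggestion, surrounding, public_corpus,
                    min_chars=MIN_MATCH_CHARS):
    """Return True if the suggestion, joined with the code before it,
    reproduces min_chars consecutive non-whitespace characters of any
    document in public_corpus."""
    context = strip_whitespace(surrounding)
    candidate = context + strip_whitespace(suggestion)
    if len(candidate) < min_chars:
        return False  # too short to trigger the filter
    corpus = [strip_whitespace(doc) for doc in public_corpus]
    # Only test windows that include at least one character of the suggestion.
    first_start = max(0, len(context) - min_chars + 1)
    for start in range(first_start, len(candidate) - min_chars + 1):
        window = candidate[start:start + min_chars]
        if any(window in doc for doc in corpus):
            return True
    return False
```

A production filter would need an index over a very large corpus rather than a linear scan, plus some notion of near matching; the sketch only shows the whitespace-insensitive comparison.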

However, the mere possibility of IP-protected code in GitHub Copilot’s training data makes Indian IT uneasy. Additionally, since IBM has a reputation for abundant research funding, collaborating with the company to move forward with NLC looks like the better option for Indian IT.

Indian IT and Automation

We have previously reported that Indian IT companies are optimistic about LLM-based tools like GitHub Copilot, and emphasised that this approach might result in a loss of employment opportunities. Now, with the news of a major IT company working with IBM CodeNet, that possibility is higher than ever.

Since IBM CodeNet is open source, Indian IT companies may be working on their own NLP models with the help of CodeNet datasets to achieve the dream of NLC, a dream that is far-fetched but not impossible.

If this dream does come true, it is worth considering what will become of the widely circulated quote from TCS CHRO Milind Lakkad. Would AI not aid in job losses then?

PS: The story was written using a keyboard.

Lokesh Choudhary

Tech-savvy storyteller with a knack for uncovering AI's hidden gems and dodging its potential pitfalls. 'Navigating the world of tech', one story at a time. You can reach me at: lokesh.choudhary@analyticsindiamag.com.