ChatGPT’s Code Interpreter May Make Data Scientists Obsolete

ChatGPT’s new code interpreter plugin is taking over data scientist jobs.

In March this year, OpenAI announced that they would be adding plugins to ChatGPT, while teasing the launch of a code interpreter and web browser plugin. Last week, the company started rolling out the code interpreter plugin, which has already caused concern among data scientists with just a sneak peek.

The plugin replaces many of the common workflows of a data scientist, including visualisation, trend analysis, and even data transformation. When looking at the code interpreter in tandem with the other advancements in the data science field, the question remains — will data scientists become obsolete?

Data scientist on steroids?

Put simply, the code interpreter is a plugin for ChatGPT that provides a sandboxed and firewalled execution environment for Python code. For security reasons, the interpreter only runs for the duration of the chat session, and is also hosted on ephemeral disk space, meaning the data is cleared after the conversation is closed. 

The interpreter also supports uploading certain file types to the plugin, and outputs from the bot are available to download. In the blog post announcing its launch, OpenAI compared the code interpreter to a “very eager junior programmer working at the speed of your fingertips”, further stating that it is good at solving mathematical problems, converting files between different formats, and conducting data analysis and visualisation. The interpreter also has access to a variety of Python libraries, including an OCR library and Matplotlib.

People across the Internet have put ChatGPT to the test, asking it to analyse a variety of datasets, from Netflix’s shows to crime data in San Francisco. In these applications, the plugin was able to clean the data, identify trends, and even generate insights.

The chatbot was also able to generate visualisations for the derived insights, presenting the information in an easy-to-understand format. In one example, it produced a visualisation of every lighthouse in the United States from a simple CSV file of lighthouse locations.

Instead of wrestling with spreadsheets and complex visualisation software, anyone can simply prompt the code interpreter to give them the result they want. 
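To give a rough sense of the kind of code the interpreter might write behind the scenes for such a prompt, here is a minimal sketch in plain Python. It is not taken from ChatGPT's output: the sample data, the column names and the helper functions `clean_rows` and `titles_per_year` are all hypothetical, chosen only to illustrate a clean-then-summarise workflow.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data standing in for an uploaded CSV file.
SAMPLE_CSV = """title,release_year
Show A,2019
Show B,2020
Show C,
Show D,2020
Show E,not available
Show F,2021
"""


def clean_rows(text):
    """Parse CSV text and drop rows whose release_year is missing or not a whole number."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        year = (row.get("release_year") or "").strip()
        if year.isdigit():
            rows.append({"title": row["title"].strip(), "release_year": int(year)})
    return rows


def titles_per_year(rows):
    """Count how many titles fall in each year, sorted by year."""
    counts = Counter(row["release_year"] for row in rows)
    return dict(sorted(counts.items()))


rows = clean_rows(SAMPLE_CSV)
trend = titles_per_year(rows)

# Print a crude text bar chart of titles per year.
for year, count in trend.items():
    print(year, "#" * count)
```

In a real session the interpreter would more likely reach for a charting library such as Matplotlib for the final step; the text chart here simply keeps the sketch free of dependencies.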

Cleaning data, spotting trends and producing charts closely match the job description of an average data scientist, except that ChatGPT does it much faster. So, what is the value proposition for a data scientist? For many, it might just be about trusting the data.

Not human after all

A data scientist’s responsibilities go beyond wrangling data and visualising it. An expert data scientist recognises the importance of storytelling through data and the value of the human touch in finding hidden nuggets of insight. ChatGPT’s code interpreter is not capable of this, due to its lack of logical thinking, and the plugin comes with another problem: hallucinations.

While the bot may be able to fulfil some of the roles of a data scientist, it is still based on an LLM, which is prone to hallucinations. Users on Hacker News had this to say about some of the visualisations created by the chatbot. 

“Current chatbot AIs have impressive capabilities but are also prone to getting important details wrong. There are also plenty of “obvious” glitches in the graphics simulations, but those concern me less – precisely because they’re obvious.”

It seems that hallucinations follow ChatGPT wherever it goes, and the code interpreter is no different. However, these hallucinations appear to be largely restricted to the visualisations created by the code interpreter. In addition to this, there is also the problem of data contamination in ChatGPT’s training data.

Common visualisations, like plotting a graph from a CSV, are relatively easy for the LLM to carry out. This is likely because these kinds of projects are well-documented all over the Internet, making it more likely for ChatGPT to know about them. However, an actual data scientist in a big organisation is likely to face visualisation problems that go beyond simple graphs or map plots, which the code interpreter cannot handle reliably.

Horace He showed an example of this contamination on Twitter. Using Codeforces problems as a test, he found that GPT-4 was able to solve 10 out of 10 problems posted before 2021, but failed to solve any of the problems posted after that date.
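The comparison described above, solve rate before a cutoff versus after it, can be sketched in a few lines of Python. This is only an illustration of the idea: the records below are made-up placeholders rather than the actual Codeforces results, and the cutoff date, the tuple layout and the function names are assumptions.

```python
from datetime import date

# Assumption: stands in for the "before 2021" boundary mentioned in the text.
CUTOFF = date(2021, 1, 1)

# Hypothetical placeholder records, NOT real benchmark data:
# (problem id, date posted, whether the model solved it)
RESULTS = [
    ("old-1", date(2019, 5, 1), True),
    ("old-2", date(2020, 3, 12), True),
    ("new-1", date(2022, 8, 30), False),
    ("new-2", date(2023, 1, 15), False),
]


def split_by_cutoff(results, cutoff):
    """Separate results into problems posted before the cutoff and on or after it."""
    before = [r for r in results if r[1] < cutoff]
    after = [r for r in results if r[1] >= cutoff]
    return before, after


def solve_rate(results):
    """Fraction of problems solved, or None if there are no problems."""
    if not results:
        return None
    return sum(1 for _, _, solved in results if solved) / len(results)


before, after = split_by_cutoff(RESULTS, CUTOFF)
print("solve rate before cutoff:", solve_rate(before))
print("solve rate after cutoff:", solve_rate(after))
```

A large gap between the two rates on problems of similar difficulty hints that the older problems may have appeared in the training data, although it does not prove it on its own.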

While these examples don’t show the whole picture, it is clear that ChatGPT’s code interpreter is not going to replace data scientists any time soon. However, it is quite close to being a ‘personal data analyst’ of sorts for those who are not familiar with data science as a field. It could also grow into a reliable partner that works alongside a human data scientist.

Anirudh VK
I am an AI enthusiast and love keeping up with the latest events in the space. I love video games and pizza.
