Last updated December 25, 2023
Stack Overflow Snatches the Spot from ChatGPT

The OpenAI chatbot answered over 50% of the software engineering questions from Stack Overflow inaccurately

Published on August 17, 2023

by Tasmia Ansari

Since 2008, programmers with a question have headed first to Stack Overflow (SO). That was until OpenAI unleashed ChatGPT.

ChatGPT comes in handy for information needs. However, new research states that the high-profile chatbot might not be the optimal solution for software engineering prompts. In the context of programming questions akin to those on SO, OpenAI’s ChatGPT is wrong more than half the time.

Since no data showed how much help ChatGPT can provide in answering those types of prompts, researchers at Purdue University examined the question. To gauge its efficacy, Samia Kabir and team presented 517 questions similar to those found on SO to ChatGPT, then examined the accuracy and quality of the responses.

The findings are telling. Of the total responses, 52% (amounting to 259 answers) were incorrect, while the remaining 48% proved accurate. Moreover, 77% of the answers were verbose. Although many of these responses seemed well articulated, their length raised concerns about clarity and efficiency. Paradoxically, the research paper's observations suggest that the AI model's inaccuracy is overshadowed by its eloquence.

The power of StackOverflow is peer review. Some people will go out of their way to make sure the information shown on the posts are correct.

ChatGPT slaps in language model engine that makes paragraphs look trustworthy, but there is no guarantee that the info had been vetted.
— PR (@frostshoxx) April 3, 2023

Another user stated that, in their experience, ChatGPT mostly produces somewhat-to-very-wrong answers when prompted on subjects they know well. “Whether right, inaccurate, or completely wrong, it produces equally confident language. It is therefore extremely likely to produce confidently wrong answers in subjects that I do not know. I cannot tell whether the text it is spewing is approximately correct, dangerously wrong, or merely somewhat inaccurate. Therefore it is clearly worse than useless.”

In the research, titled “Who Answers It Better? An In-Depth Analysis of ChatGPT and Stack Overflow Answers to Software Engineering Questions,” the researchers uncovered further insights and some concerning findings.

All In For Semantics

The authors also found OpenAI’s ChatGPT is more likely to make conceptual errors than factual ones. “Many answers are incorrect due to ChatGPT’s incapability to understand the underlying context of the question being asked,” the paper found.

Earlier this month, SO decided to switch to semantic search due to the constant rise of traffic on the page. In the announcement blog, the company stated, “Semantic search and LLMs go together like cookies and milk”. In layman’s terms, semantic search understands the meaning and intent behind queries in a way a human would. As a result, it delivers precise and contextually relevant search results.
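To make the idea concrete, here is a minimal sketch of how a semantic search can rank posts by meaning rather than by shared keywords. It is a toy illustration, not Stack Overflow's implementation: the three-dimensional word vectors, the `embed` and `semantic_search` helpers, and the sample posts are all hypothetical, and real systems use vectors learned by a model rather than written by hand.

```python
import math

# Hypothetical hand-made vectors: [error-ness, repair-ness, interface-ness].
# Real semantic search uses learned embeddings with far more dimensions.
WORD_VECTORS = {
    "crash": [1.0, 0.0, 0.0],
    "exception": [0.9, 0.1, 0.0],
    "fix": [0.0, 1.0, 0.0],
    "resolve": [0.1, 0.9, 0.0],
    "button": [0.0, 0.0, 1.0],
    "color": [0.0, 0.0, 1.0],
    "change": [0.0, 0.3, 0.7],
}


def embed(text):
    """Average the vectors of known words; unknown words are ignored."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(query, documents):
    """Return documents sorted from most to least similar to the query."""
    query_vector = embed(query)
    return sorted(documents, key=lambda d: cosine(query_vector, embed(d)), reverse=True)


posts = ["change button color", "resolve exception"]
ranked = semantic_search("fix crash", posts)
print(ranked[0])
```

In this sketch the query "fix crash" shares no words with "resolve exception", yet that post ranks first because the two point in the same direction in the toy vector space. A plain keyword match would have found nothing in common between them.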

In the announcement blog, SO further stated that its ‘ethos is simple: accuracy and attribution’. While GPT models generate results from unknown sources, the company has committed to attributing the questions and answers used in its Retrieval Augmented Generation (RAG) LLM summaries.

Upgraded with AI

The implications of the research extend beyond ChatGPT’s performance.

Since the release of ChatGPT, word has spread that the AI chatbot is killing Stack Overflow. The claim was based on the decline in users on the developers’ Q&A platform. As per the Purdue study, the observed decline of conventional platforms like SO indicates that ChatGPT’s popularity is reshaping the online programming assistance landscape.

This shift is underscored by the results of the 2023 Stack Overflow annual Developer Survey, which draws on insights from 90,000 programmers. The survey highlights that 77% of developers hold a positive view of AI tools. However, only 42% trust the accuracy of these tools. In an attempt to turn things around, the New York-based company introduced a suite of AI tools under the name OverflowAI a fortnight ago.

In a strategic response, SO also unveiled the GenAI Stack Exchange, a dedicated community platform for the exchange of insights on AI tools. These recent moves reflect a conscious effort by SO to adapt to the shifting preferences of developers seeking AI knowledge. Furthermore, SO has introduced the Stack Overflow Natural Language Processing (NLP) Collective, with a feature named Discussions for nuanced debates about technical approaches.

With the release of these AI features, the company is making an extra effort to compete with the internet’s current favourite tool, ChatGPT. Even with a slight usage decline, the Purdue study concludes that SO has managed to maintain the upper hand on software engineering questions.
