Why AI Can’t Get Software Testing Right

Writing the implementation before the tests was already a danger; AI is only going to make it worse.
AI-Tools-Can't-Get-Programming-Tests-Right
Writing unit tests was already a headache for developers, and AI is making it worse. A recent study has exposed a critical weakness in LLMs: their inability to create accurate unit tests. While ChatGPT and Copilot demonstrated impressive capabilities in generating correct code for simple algorithms, with success rates ranging from 63% to 89%, their performance dropped sharply when they were asked to produce the unit tests used to evaluate production code. ChatGPT’s test correctness fell to a mere 38% for Java and 29% for Python, with Copilot showing only slightly better results at 50% and 39%, respectively.

According to a study published by GitLab in 2023, automated test generation is one of the top use cases for AI in software development, with 41% of respondents already using it. This recent study, however, calls the quality of those tests into question.

A fullstack developer named Randy on the Daily.dev forum mentioned that he had tried AI for both writing
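To make concrete what an incorrect test looks like, consider a minimal hypothetical sketch in Python; the function and tests below are our own illustration of the failure mode the study measured, not an example taken from it. The implementation is correct, but the second generated-style test encodes a wrong expected value, so it fails against correct production code:

# Hypothetical illustration (not from the study): a correct implementation
# paired with the kind of plausible-looking but wrong generated test
# that the researchers counted as incorrect.

def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def test_median_odd_length():
    assert median([3, 1, 2]) == 2  # correct expectation

def test_median_even_length():
    # Wrong expected value: the median of [1, 2, 3, 4] is 2.5, not 2.
    # This test fails against correct code, so the test itself is the bug.
    assert median([1, 2, 3, 4]) == 2

A test like this is worse than no test at all: it does not just miss bugs, it flags correct code as broken, which is why low test correctness is a costlier problem than low code correctness.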

Sagar Sharma
A software engineer who loves to experiment with new-gen AI. He also happens to love testing hardware, which sometimes crashes. While reviving a crashed system, you can find him reading literature or manga, or watering his plants.