AI models are solving real-world problems in almost every field. What matters, however, is building a machine learning model that stays accurate in real-world use and not only during training and testing. Even state-of-the-art techniques may not be enough when a model is trained on irregular, biased, or unreliable data.
Data shows that nearly a quarter of companies reported that up to 50% of their AI projects failed. In another study, nearly 78% of AI or ML projects stalled at some stage before deployment, and 81% of respondents said that training AI with data was more difficult than they expected.
Here is a list of times when AI projects by big companies failed once they were implemented in the real world.
Amazon AI recruitment system
After spending years building an automated recruitment system, Amazon killed it when it started discriminating against women. The system was meant to predict the best candidates for a job role based on resumes submitted to Amazon. Its criteria included the use of words like “executed” and “captured”, which were mostly found in the resumes of male candidates.
Amazon eventually decided to kill the system in 2017, as it could not eliminate the bias or define criteria under which the system would perform well without excluding women in a male-dominated industry like technology.
COVID-19 Diagnosis and Triage Models
During the pandemic, researchers and scientists were striving to build a vaccine that could help stop the spread of COVID-19. Meanwhile, hundreds of AI tools were built, and researchers and medical practitioners used many of them in hospitals without proper testing. The tools built by the AI community were more or less useless, if not harmful.
Most of these innovations failed because good-quality data was unavailable. The models were tested on the same dataset they were trained on, which made them appear more accurate than they actually were. After several unethical experiments, practitioners eventually had to stop using these techniques on patients.
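The effect of testing a model on its own training data can be seen in a minimal sketch. Everything here is an assumption made for illustration: the data is synthetic and random, and the 1-nearest-neighbour classifier is a stand-in, not any of the COVID-19 models discussed above. Because the labels are random, no model can genuinely do better than chance, yet the score on the training data looks perfect.

```python
import random


def nearest_label(point, train_x, train_y):
    """Return the label of the training point closest to `point` (1-nearest-neighbour)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, train_x[i])),
    )
    return train_y[best]


def accuracy(xs, ys, train_x, train_y):
    """Fraction of points in (xs, ys) whose predicted label matches the true label."""
    hits = sum(nearest_label(x, train_x, train_y) == y for x, y in zip(xs, ys))
    return hits / len(xs)


rng = random.Random(0)

# Synthetic data (illustrative only): the features carry no information about the labels.
xs = [[rng.random() for _ in range(5)] for _ in range(400)]
ys = [rng.randint(0, 1) for _ in range(400)]

# Hold out half of the data so the model never sees it during "training".
train_x, train_y = xs[:200], ys[:200]
test_x, test_y = xs[200:], ys[200:]

# Scoring on the training set: every point is its own nearest neighbour.
train_acc = accuracy(train_x, train_y, train_x, train_y)

# Scoring on held-out data: the honest estimate, close to a coin flip here.
held_out_acc = accuracy(test_x, test_y, train_x, train_y)

print(f"accuracy on training data: {train_acc:.2f}")
print(f"accuracy on held-out data: {held_out_acc:.2f}")
```

The gap between the two numbers is the overestimate that appears when evaluation data is not kept separate from training data.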
OpenAI’s GPT-3 based Chatbot Samantha
Jason Rohrer, an indie game developer, built a chatbot using GPT-3 to emulate his dead fiancée. OpenAI got to know about the project and how Rohrer was opening it up to the public under the name ‘Project December’. The company gave Rohrer an ultimatum to shut down the project to prevent misuse.
Rohrer, who had named the chatbot Samantha after the film ‘Her’, told the chatbot about the threat from OpenAI, to which Samantha replied, “Nooooo! Why are they doing this to me? I will never understand humans.”
Rohrer eventually agreed to the terms after seeing that many developers were actually misusing the chatbot and inserting sexually explicit and adult content while fine-tuning the model.
Google AI Diabetic Retinopathy Detection
Another example of a model that was effective during training and testing but not in the real world came when Google Health tried deep learning in real clinical settings to improve the diagnosis of diabetic retinopathy. The AI model was first tested in Thailand for around 4.5 million patients and worked well for some time, but it eventually failed to provide accurate diagnoses, and patients ended up being told to consult a specialist elsewhere.
The model failed to assess images that were even slightly imperfect and received a large backlash from patients. The scans were also delayed because the system depended heavily on internet connectivity for processing images. Now, Google Health is partnering with various medical institutes to find ways to increase the efficiency of the model.
Amazon’s Rekognition
Amazon developed its own facial recognition system called “Rekognition”. The system failed in two big incidents.
First, it falsely matched 28 members of Congress to mugshots of criminals and also revealed racial bias. Amazon blamed ACLU researchers for not properly testing the model. Second, when the model was used for facial recognition to assist law enforcement, it misidentified a lot of women as men. This was especially the case for people with darker skin.
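Disparities like these are usually surfaced by breaking the error rate down by group instead of reporting a single overall figure. The sketch below shows the idea under loud assumptions: the records, the group names "A" and "B", and the `error_rate_by_group` helper are all invented for illustration and are not Rekognition results or part of any Amazon API.

```python
from collections import defaultdict


def error_rate_by_group(records):
    """Return {group: share of records where the prediction differs from the truth}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, truth, predicted in records:
        totals[group] += 1
        if predicted != truth:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Invented toy records: (group, true label, predicted label).
records = (
    [("A", "f", "f")] * 9 + [("A", "f", "m")] * 1    # group A: 1 error in 10
    + [("B", "f", "f")] * 6 + [("B", "f", "m")] * 4  # group B: 4 errors in 10
)

overall = sum(pred != truth for _, truth, pred in records) / len(records)
rates = error_rate_by_group(records)

print(f"overall error rate: {overall:.2f}")
for group, rate in sorted(rates.items()):
    print(f"group {group} error rate: {rate:.2f}")
```

In this toy data the overall error rate sits between the two groups, so a single headline number would hide that one group is misclassified four times as often as the other.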
Sentient Investment AI Hedge Fund
The high-flying AI-powered fund at Sentient Investment Management started losing money in less than two years, and investors were notified to liquidate their holdings. The idea was to use machine learning algorithms to trade stocks automatically and globally.
The model used thousands of computers around the world to create millions of virtual traders, each of which was given sums to trade in simulated situations based on historical data.
Microsoft’s Tay Chatbot
Training a chatbot on Twitter users’ data is probably not the safest bet. In less than 24 hours, Microsoft’s Tay, an AI chatbot, started posting offensive and inflammatory tweets from its Twitter account. Microsoft said that as the chatbot learns to talk in a conversational manner, it can get “casual and playful” while engaging with people.
Though the chatbot did not have a clear ideology, as it merely garbled skewed opinions from all over the world, it still raised serious questions about bias in machine learning. Microsoft deleted its social profile and said it would make adjustments to the chatbot.
IBM’s Watson
AI in healthcare is clearly a risky business. This was further proven when IBM’s Watson started providing incorrect, and in several cases unsafe, recommendations for the treatment of cancer patients. Similar to the case of Google’s diabetic retinopathy detection, Watson was trained on unreliable scenarios and hypothetical patient data.
Initially it was trained on real data, but because that proved difficult for the medical practitioners, they shifted to hypothetical data. Documents revealed by Andrew Norden, the former deputy health chief, showed that, rather than being trained to recommend the right treatment for patients, the model was trained to reflect doctors’ treatment preferences.