ChatGPT Privacy Threat is Real and We are Late

Working with neural networks is such that it is almost impossible to set standards on how AI systems should be made and tested
Tech giants are facing mounting consequences for disregarding data protection laws, with governments levying hefty fines on many of them. The latest to get a rap on the knuckles is Meta, which was fined a whopping $410 million by the Irish Data Protection Commission for failing to comply with the EU’s General Data Protection Regulation (GDPR).

Adding to the privacy woes is ChatGPT. While OpenAI’s widely acclaimed conversational chatbot has garnered a lot of publicity for its use across different domains, little has been said about whether it protects privacy. This matters because a consumer-facing product like ChatGPT needs to keep collecting user data to train its model if it is to improve. It is a never-ending cycle: new data trains the model to produce better AI, which in turn attracts more usage and yields more data. ChatGPT’s methods of data collection have left companies visibly spooked.

Amazon’s Warning

A case in point is the warning Amazon issued to its employees against sharing company information with ChatGPT. Business Insider’s examination of Amazon’s internal communications revealed that a company lawyer warned employees against sharing confidential information or code with the AI chatbot. The precaution was taken after ChatGPT generated responses that closely resembled Amazon’s internal data.

As a recent report showed, the chatbot was able to correctly answer Amazon interview questions, including some exclusive questions known only to the company’s recruiting team. It may not be long before ChatGPT can also reproduce the technical questions typically asked at individual organisations, or find a pattern among them. In that case, every organisation would need to issue guidelines for its use. Fair-use policies for AI have therefore become even more critical. And it is not just enterprises that need to act: regulators must also intervene and establish standards for building safe AI systems.

AI Act in the Making

The EU AI Act (AIA) – the first of its kind – assigns AI applications into three risk groups. The first group includes AI systems that pose an ‘unacceptable risk’ and are therefore banned, such as social scoring systems run by the government. The second group includes ‘high-risk’ AI systems, such as CV-scanning tools for job applicants, which are subject to specific legal requirements. Lastly, the third group includes AI systems that are neither high-risk nor banned and are largely unregulated.

At the moment, the categorisation and enforcement of such laws seem vague and lacking a clear end goal. Until now, be it GDPR, the AIA, or India’s Digital Personal Data Protection (DPDP) Bill, policies have primarily focused on protecting the interests of the consumer. However, given the harm that AI systems can cause to businesses, it is imperative that regulatory bodies develop standards that dictate how AI systems should be built.

Setting AI Standards: An Impossible Task?

Hadrien Pouget, writing for Lawfare, explains that there is currently a lack of knowledge about how to create state-of-the-art AI systems that consistently adhere to established principles. There is also an absence of methods for testing whether AI systems adhere to these principles. Although simpler AI techniques may be more manageable, recent advances in AI, particularly neural networks, remain largely mysterious.

“Their performance improves by the day, but they continue to behave in unpredictable ways and resist attempts at remedy,” Pouget writes. He also stresses that setting standards for the reliability of neural networks is difficult because these models are essentially guided by data, and they can learn from that data in unintuitive and unexpected ways.

The nature of neural networks therefore makes it almost impossible to set standards for how AI systems should be made and tested. At the same time, chatbots and other AI systems cannot avoid collecting data if they are to produce better outputs. It is the same argument that we have heard numerous times before in the case of big tech companies: even where the means of collecting data were unlawful, the collection was deemed “inevitable” to ensure a personalised user experience.

The issue of the kind of input data used to train AI has emerged in many other contexts as well. For instance, Clearview AI collected people’s images from the web and used them to train its facial surveillance AI without consent. Its database comprises approximately 20 billion images. Despite facing numerous lawsuits, fines, and cease-and-desist orders for violating people’s privacy, Clearview has managed to evade paying some fines and has refused to delete data despite orders from regulators.

This is just one example of how unclear regulations can affect enterprises and consumers alike at an unprecedented scale.

There was also the case of Matthew Butterick, who filed a lawsuit over GitHub Copilot for violating open-source licences. Butterick claimed that Copilot makes “suggestions” for code based on someone else’s intellectual property, while failing to credit or compensate the original authors. Microsoft, on the other hand, places the responsibility on the end user to scan the code for IP issues before using it. Perhaps the lack of AI standards is what led Microsoft and OpenAI to ask the court to throw out the AI copyright lawsuit against them.

We are already concerned about the data that big tech companies collect for advertising and user experience purposes. With the advancement of AI and chatbots, the volume of data collected is expected to increase dramatically, raising even more concerns about privacy and the proper use of personal information.

Ayush Jain
Ayush is interested in knowing how technology shapes and defines our culture, and our understanding of the world. He believes in exploring reality at the intersections of technology and art, science, and politics.