
How Enterprises Can Save Their Data from the Gen AI Black Hole

Changing data norms should be the first step for enterprises entering AI


Recently, Samsung employees came under fire for leaking confidential company information through an unorthodox channel: ChatGPT. Since then, companies handling sensitive information have sent out memos cracking down on employees’ use of ChatGPT, but OpenAI’s chatbot chaos is only just beginning.

Generative AI has enormous potential to disrupt the enterprise space, so it’s no wonder that the technology is beginning to seep into companies’ tech stacks. However, as with any emerging technology, it comes with its own share of pitfalls, the biggest among them being data security. Enterprises that still wish to capitalise on generative AI, whether through coding assistants (Amazon CodeWhisperer, Replit Ghostwriter, GitHub Copilot, etc.), speech models (ElevenLabs’ Prime Voice AI for text-to-speech, OpenAI’s Whisper for transcription, etc.), or image generation (DALL·E, Midjourney, Stable Diffusion, etc.), can take a few steps to ensure the safety of their data.

Safeguarding data

The opacity of data policies from AI vendors presents a big challenge for companies that wish to leverage generative AI. Before feeding private information into any generative AI tool, users must first have a clear picture of where the data is going. Robert Blumofe, executive vice president and CTO at Akamai, stated, “Nobody should use or provide private information to a tool like this until you have a clear statement from the vendor on how they will use the information. Do they store, save, or share it? And does it become available to the public copy of the tool?”

There are varying levels of data-leakage risk associated with AI services. A vendor that merely logs, stores, or saves company inputs poses the least risk; one that uses those inputs as training data poses the most, since the data could resurface in a public version of the same tool, causing leaks.

For companies subject to the General Data Protection Regulation (GDPR), handing over data to any non-transparent AI provider is a big no-no. Some GDPR regulators have also gone so far as to take a hardline stance against AI tools until they handle data more safely, as seen in Italy’s ban of ChatGPT. Hence, the first step to ensuring data security with any AI vendor is to do due diligence and find a statement that clearly describes how the data will be used. Decision-makers should look for statements such as the one provided by Amazon Bedrock, which clearly states that user data will not be used for training and will not leave their private cloud.
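Beyond vetting vendor policies, teams can add a technical safety net of their own: scrub obvious secrets and personal data from prompts before they ever leave the network. A minimal sketch of this idea follows; the patterns and function names are illustrative assumptions, not taken from any specific product, and a real deployment would use a dedicated secret-scanning or PII-detection library with a far broader ruleset.

```python
import re

# Illustrative patterns for a few common sensitive strings.
# A production system would cover many more formats.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "AWS_KEY": re.compile(r"AKIA[0-9A-Z]{16}"),
}

def redact(prompt: str) -> str:
    """Replace anything matching a sensitive pattern with a placeholder
    before the prompt is sent to an external AI service."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label} REDACTED]", prompt)
    return prompt

print(redact("Email jane.doe@example.com, key AKIAABCDEFGHIJKLMNOP"))
```

A gateway like this sits between employees and the vendor API, so even if the vendor’s data policy later changes, the most sensitive strings never left the building in the first place.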

Self-hosted models vs responsible vendors

While it may be tempting to simply go with the pack and get a subscription to OpenAI’s APIs, the generative AI field is vast and growing quickly, especially when it comes to enterprise-focused use cases. Many key decision-makers have also raised concerns about how the technology can be used in their organisations.

Brandon Jung, VP of ecosystem and business development at Tabnine, told AIM, “People say, if we have a model, the model has to live in the same place and have the same level of security as our code, understandably. A lot of companies might first install and use the enterprise models, and then do custom models.”

To understand how much of an impact a self-hosted AI can have, we can look at companies providing AI training services, such as Stability AI, one of the prominent open-source model providers in the AI field today. Emad Mostaque, the CEO of Stability AI, stated, “Dozens of major companies want models they own based on their own data and want to pay us to do it [train them]… It’s a pretty good model. We also build custom models for large companies and governments.”

These kinds of fine-tuned models may be a good middle ground between the security of self-trained, self-hosted models and the convenience of off-the-shelf APIs, similar to Red Hat’s business model. There is also an alternative in services such as Amazon Bedrock or Azure OpenAI Service, which maintain the data-privacy guarantees of the underlying cloud while allowing companies to adopt generative AI.
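For teams that do self-host, the pattern is often to keep the familiar API shape but pin requests to an internal endpoint, so prompts cannot accidentally flow to a public service. The sketch below shows the idea; the hostname, allow-list, and model name are all hypothetical placeholders, and it only assembles the request rather than sending it.

```python
from urllib.parse import urlparse

# Hypothetical internal inference endpoint, e.g. a self-hosted server
# exposing an OpenAI-compatible chat-completions API.
INTERNAL_BASE_URL = "https://llm.internal.example.com/v1"
ALLOWED_HOSTS = {"llm.internal.example.com"}

def build_completion_request(prompt: str, base_url: str = INTERNAL_BASE_URL) -> dict:
    """Assemble a chat-completion request, refusing any host that is not
    on the internal allow-list. Sending it is left to an HTTP client."""
    host = urlparse(base_url).hostname
    if host not in ALLOWED_HOSTS:
        raise ValueError(f"refusing to send prompt to external host: {host}")
    return {
        "url": f"{base_url}/chat/completions",
        "json": {
            "model": "local-model",
            "messages": [{"role": "user", "content": prompt}],
        },
    }
```

Centralising this check in one client wrapper means the data-residency guarantee is enforced in code, not just in policy memos.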

Even as all these advancements take place, one thing is clear: the AI cat is out of the bag. There is no going back for the enterprise, as the impact of AI goes beyond generating media. Whether it is automating repetitive tasks or transforming data warehousing, enterprises must adapt to AI or fall behind. Given the number of choices in the field and the current pace of innovation, decision-makers must keep data security in mind when picking AI solutions.


Anirudh VK
