Last updated December 22, 2022

Council Post: Is there anything like too much data?

We've assumed that more data translates to more useful information and a higher likelihood of learning new things. In many ways, having a wealth of data is turning out to be more of a burden than an asset, which is something we've started to appreciate.


Published on December 22, 2022

by Anshika Mathews


Eric Schmidt, Executive Chairman of Google, said, “There were 5 exabytes of information created between the dawn of civilisation through 2003, but that much information is now created every two days.”

Effective ways to gather additional data have received a lot of attention in the last few decades. Data scientists, business executives, and computer experts have all pondered ways to gather, store, and display more data. We’ve assumed that more data translates to more useful information and a higher likelihood of learning new things. But this way of thinking has taken us down a rabbit hole, and we are now swimming in data we can’t use. In many ways, having a wealth of data is turning out to be more of a burden than an asset, which is something we’ve started to appreciate. However, there are approaches we can take to make the most of this sea of data.

To delve further into this, we brought industry experts together for a roundtable session moderated by Karthik Sriram Chandrashekar, Director, Data Science & Engineering. The panellists were Satyamoy Chatterjee, Executive Vice President at Analyttica Datalab; Raj Babu, Founder and CEO at Agilisium; Vishnu Vasanth, Founder & CEO at e6data; Hari Sarvanabhavan, Vice President – Global Analytics at Concentrix; and Vijoe Mathew, Global Director – Supply, Logistics & Finance at Anheuser-Busch InBev.

Data: A hype or an investment?

Over the last couple of decades, the amount of data that has proliferated across the world and is available for analysis and for driving impact is humongous. Because of digitisation in particular, the increased availability of technology in every industry, and the advent of the Internet, everything has exploded: the amount of data available to us, the variety and kinds of data available to use, and our ability to integrate different kinds of data. A lot of people have jumped on the bandwagon of data as a competitive advantage in their industries. So acquiring more data is considered a big thing. Big data was a huge hype, and making as much data as possible available was, and for a long time potentially still is, considered a worthy investment.

—Karthik Sriram Chandrashekar, Director, Data Science & Engineering

Data: Goliath for organisations

It’s definitely overwhelming, and it has nothing to do with big data. The first problem is the variety and number of data points. The second, and most important, is how you prioritise them, because not all data points have the same priority. How you take data points of varying priority and work with them is the overwhelming part. When you work with an organisation, there are multiple people involved, and there needs to be a priority order. Building consensus on that priority order is really overwhelming, and that is what causes delays and creates a lot of complications.

—Raj Babu, Founder and CEO at Agilisium

The problem happens when the enterprise confuses being data-driven with driving the volume of data. Having more data doesn’t mean that you naturally become data-driven. It’s about the context you create and the definition of the problem; that becomes the starting point from which you start to look for data. From that perspective, the value that you derive from data will again depend on the context. For example, in consumer internet, where you’re building a monolithic application, the volume of data is important because every few basis points of improvement in your optimisation will have a huge impact on the bottom line. In the context of a small chain of retail stores, or of healthcare, where every facility is unique, a centralised way of looking at and extracting value from a central data repository may not fit. So that’s where you have to localise and look at the context. The context should drive the value that you create and want to extract from data.

—Satyamoy Chatterjee, Executive Vice President at Analyttica Datalab

Data is the new soil. If you take that principle and really treat data as a resource, how do you manage that resource? Too much of any natural resource can be a disaster. How you balance data in your organisation is one of the big callouts, and you should look at it from a very fundamental perspective. By balance I mean this: if you’re a small organisation and you’re trying to load up everything from a data perspective, you will never get to your end outcome. Your journey is only going to be in collecting data. If you’re a very mature organisation in a competitive environment and you are lacking data, you can never use analytics as a differentiator. So it’s very important to understand what that balance is. Judgments are probably going to be far more powerful than setting a course simply because there’s more data coming in.

—Hari Sarvanabhavan, Vice President – Global Analytics at Concentrix

How much is your data costing?

There’s been a huge acceleration on the financial side; it has now become a word that you will never end a customer conference without hearing. It’s clearly a challenge, and there’s a growing realisation that, as cloud maturity happens, the cloud has a lot of amazing things going for it. It’s elastic and it’s maintenance-free in terms of CapEx, but it’s not cheap. So suddenly the onus becomes: how do I inventorise everything? How do I keep track of everything? Given the state of even the best infrastructure today, most companies are at a place where they don’t know which compute instance is mapped to which team. In a company that has thousands of dashboards running every hour, you don’t know whether a given dashboard should refresh every ten minutes, every hour or every 24 hours. And compute ends up being the largest portion of your bill. So the need now is to inventorise in an automated way.

—Vishnu Vasanth, Founder & CEO at e6data
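
As a rough illustration of the automated inventory Vasanth describes, the sketch below rolls compute spend up by team and measures how much of the bill cannot be attributed to anyone. The record layout, instance names, team tags and costs are all hypothetical; a real setup would read them from a cloud provider's billing export rather than a hard-coded list.

```python
from collections import defaultdict

# Hypothetical billing records: (instance_id, team_tag_or_None, monthly_cost_usd).
# None means the instance carries no team tag.
BILLING = [
    ("vm-001", "marketing", 1200.0),
    ("vm-002", None, 800.0),
    ("vm-003", "finance", 450.0),
    ("vm-004", None, 300.0),
    ("vm-005", "marketing", 250.0),
]


def cost_by_team(records):
    """Sum monthly cost per team; untagged instances go to UNATTRIBUTED."""
    totals = defaultdict(float)
    for _instance_id, team, cost in records:
        totals[team or "UNATTRIBUTED"] += cost
    return dict(totals)


def unattributed_share(records):
    """Fraction of total spend that is not mapped to any team."""
    total = sum(cost for _, _, cost in records)
    untagged = sum(cost for _, team, cost in records if team is None)
    return untagged / total if total else 0.0


if __name__ == "__main__":
    ranked = sorted(cost_by_team(BILLING).items(), key=lambda kv: -kv[1])
    for team, cost in ranked:
        print(f"{team:>14}: ${cost:,.2f}")
    print(f"Unattributed share of spend: {unattributed_share(BILLING):.0%}")
```

Tracking the unattributed share over time is one simple way to tell whether the inventory is actually improving.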

Bottom-up approach versus top-down approach

New-age companies that have the benefit of starting from the ground up can always plan to optimise from the start. Based on that, you can create a scorecard and keep optimising, and after some time you can sunset data. If the data is not being used and is creating no impact, there is no point keeping it. Otherwise you try to force-fit yourself into extracting value out of the data and create a lot of redundancies. I think it’s important to look at data as an asset. And when you look at an asset, you also need to be able to distinguish performing assets from non-performing assets, and you need to have a way to get rid of your non-performing assets.

—Satyamoy Chatterjee, Executive Vice President at Analyttica Datalab
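
One way to picture the scorecard Chatterjee mentions is a small classifier that labels each dataset by how much it is used. This is only a sketch under assumed criteria: the fields (`reads_last_90_days`, `linked_outputs`), the `min_reads` threshold and the sample inventory are all hypothetical and are not taken from the roundtable.

```python
from dataclasses import dataclass


@dataclass
class DatasetUsage:
    name: str
    monthly_cost_usd: float
    reads_last_90_days: int  # how often anything queried the dataset
    linked_outputs: int      # dashboards, models or reports that depend on it


def classify(ds: DatasetUsage, min_reads: int = 10) -> str:
    """Label a dataset as performing, review or retire-candidate."""
    if ds.reads_last_90_days >= min_reads and ds.linked_outputs > 0:
        return "performing"
    if ds.reads_last_90_days == 0 and ds.linked_outputs == 0:
        return "retire-candidate"
    return "review"


# Hypothetical inventory, for illustration only.
INVENTORY = [
    DatasetUsage("orders_daily", 900.0, 420, 12),
    DatasetUsage("clickstream_raw_2019", 2600.0, 0, 0),
    DatasetUsage("supplier_audit_logs", 150.0, 3, 1),
]

if __name__ == "__main__":
    for ds in INVENTORY:
        print(f"{ds.name:<22} ${ds.monthly_cost_usd:>9,.2f}  {classify(ds)}")
    idle_cost = sum(
        d.monthly_cost_usd for d in INVENTORY if classify(d) == "retire-candidate"
    )
    print(f"Monthly spend on retire candidates: ${idle_cost:,.2f}")
```

The middle "review" label is a deliberate design choice in this sketch: rarely read data may still be held for compliance or audit reasons, so it is flagged for a human decision instead of being retired automatically.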

The question is, does every industry need a 99% accurate model? The first thing we try to understand is how much it is going to cost to take the model from 80% accuracy to 99% accuracy. Even if we incur that cost, are we going to get incremental value from the added accuracy? When it comes to data strategy, we never start by bringing data to the cloud and then trying to build a concept. We do a POC on a solution and try to understand exactly which variables are really impacting it to get to an accurate, 90% model. Then we take an insight-led decision on exactly which variables to bring to the cloud. Only if it’s creating value does the question of bringing data to the cloud from the source arise, because for us the cost of data is pretty much the same as or higher than that of analytics. The only cost in analytics is human power. There is no other cost, because all the tools that you use are free.

—Vijoe Mathew, Global Director – Supply, Logistics & Finance at Anheuser-Busch InBev.
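
The trade-off Mathew raises can be written as simple arithmetic: compare the value gained from extra accuracy with the cost of getting there. The sketch below assumes, purely for illustration, that each percentage point of accuracy is worth a fixed amount; in practice that value is rarely linear, and every figure used here is hypothetical.

```python
def net_gain_from_accuracy(base_pct: int, target_pct: int,
                           value_per_point: float, extra_cost: float) -> float:
    """Incremental value of the added accuracy minus the cost of achieving it.

    Assumes (hypothetically) a constant business value per accuracy point.
    """
    points_gained = target_pct - base_pct
    return points_gained * value_per_point - extra_cost


if __name__ == "__main__":
    # Hypothetical numbers: each point is worth 5,000; the push to 99% costs
    # 120,000 in data and effort, while stopping at 90% costs 30,000.
    to_99 = net_gain_from_accuracy(80, 99, 5_000, 120_000)
    to_90 = net_gain_from_accuracy(80, 90, 5_000, 30_000)
    print(f"80% -> 99%: net gain {to_99:,.0f}")
    print(f"80% -> 90%: net gain {to_90:,.0f}")
```

With these made-up figures the push to 99% loses money while stopping at 90% pays off, which is the kind of result that would argue against moving more data to the cloud for the sake of accuracy alone.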

Is too much data complicated?

Fundamentally, whenever you look at a model for assessing technology, the questions are the perceived value, the ease of use and the accessibility. Unless the perceived value is there, we will always struggle. From an industry standpoint, one of our deficiencies is that we’ve never focused on what I call orchestrated action. We build models and expect that the downstream will work on its own, that people will just carry on and see value, and that this will run as an automatic process. What is the perceived value that people are really looking for, and what ease of access are we really giving them? That applies at both ends, upstream and downstream.

—Hari Sarvanabhavan, Vice President – Global Analytics at Concentrix

The fact today is that no one even has time to look at dashboards and get insights out of them. The real question is what is actually being consumed out of the petabytes of data that you keep on the cloud and pay VM costs, actual costs and human costs for. We need to reverse engineer and understand what people are using, and only pay for that. I think the amount of data collected over the last two or three years is more than all the data the human race collected over the past centuries. So we need to be super careful about what makes sense and what doesn’t for the organisation.

—Vijoe Mathew, Global Director – Supply, Logistics & Finance at Anheuser-Busch InBev.

Risks for companies

When you hear “too much data”, the natural instinct is that every data point represents something that has happened in your business. When someone has visited your website, signed up as a customer, updated a phone number or placed an order, it’s unimaginable not to collect it. The choice is whether you store it in expensive or cheap storage on the cloud, and whether you retain it beyond a day or a month. The more data you have, the more you are exposed to risks of misuse and to compliance risks. It is definitely a problem, but it’s a good problem to have. Would you rather have that data, knowing everything about your customer, or would you rather be closer to the other end of the spectrum? I think the answer is clear.

—Vishnu Vasanth, Founder & CEO at e6data

Today, we still live in the so-called data governance mode, but we have to get to the larger perspective if we really want it to be what it has to be. We have some challenges that we should address up front before we get to full-scale implementation, especially in industries like healthcare and the like. We need to be careful about how we build our models. A thought process is required from a go-forward standpoint, from a data and AI standpoint, across multiple industries.

—Hari Sarvanabhavan, Vice President – Global Analytics at Concentrix

To address the central topic: how much data is actually too much? Data will continue to flow in a highly connected society. The assumption is that the more data a firm has, the more value it can extract from it. Nevertheless, organisations must adopt a clever strategy to make data useful. They must determine which data can be discarded and which must be retained, processed, and analysed in order to make wise choices.
“No data is clean, but most is useful.” —Dean Abbott, co-founder and chief data scientist, SmarterHQ

This article is written by a member of the AIM Leaders Council. AIM Leaders Council is an invitation-only forum of senior executives in the Data Science and Analytics industry. To check if you are eligible for a membership, please fill out the form here.


Anshika Mathews

Anshika is an Associate Research Analyst working for the AIM Leaders Council. She holds a keen interest in technology and related policy-making and its impact on society. She can be reached at anshika.mathews@analyticsindiamag.com.