There is a whole generation of managers and technicians who have been sold on the proposition that there is undiscovered gold in Big Data. Many managers, IT and non-IT alike, have bet their careers and corporate credibility on the idea that somebody really clever can go into the reams of Big Data in the corporation and come out with information so hidden and so important that it will spark a renaissance of technology in the corporation.
This “hidden gold” phenomenon has ushered in the age of the “data scientist”. Has anyone noticed how much data scientists are making in Silicon Valley these days? It makes me want to go and rewrite my resume.
But what has the reality been for the hidden gold rush? Corporations have poured millions of dollars into digging for gold, and so far the results have been very disappointing. If some industry had found a pot of gold hidden in its Big Data, you can be sure the discovery would have been heralded across the landscape. But no such pronouncements have been made.
Big Data continues to slide down the back side of the Gartner hype cycle. When it will land in the trough of disillusionment (a standard phase of the Gartner hype cycle) is anyone’s guess. Big Data has been hyped so much, by so many people, that it has a long way to fall before it reaches the trough.
So what is happening here? Why has there been this disappointment in Big Data? Why haven’t the brilliant (and highly paid) data scientists been able to find all of the hidden gold that was promised?
There are (at least!) three possible scenarios.
First, there just isn’t any gold there to be found. Even the most brilliant data scientist cannot find gold that isn’t there. Producing gold where there is none is called alchemy, and it is beyond the purview of even the cleverest data scientist.
I have an acquaintance in the utility business. His vendor told him to convince management to spend a lot of money putting metering data into a pile of Big Data. The vendor wined and dined upper management, and the next thing you know a $10 million project was born. Data scientists madly went about capturing the previous ten years of metering data. (Metering data records how much gas and electricity a business or a home uses each month.) Soon the Big Data project held reams and reams of metering data. Then the data scientists went to work. They ran their algorithms. They scoured the data. They threw out outliers. They fit their curves. But three years later: no magic data, no hidden gold. There just are no hidden nuggets of gold in metering data, despite the vendor’s sales pitch.
All the data scientists found was that when it got colder, people turned up their thermostats. But anybody could have told them that. They didn’t need to spend $10 million to find it out.
Stated differently, there just isn’t any gold there to be found, no matter how clever the data scientist is.
Second, there is some gold there, but it is really hard to find, perhaps beyond the skills of even the cleverest data scientist. There are lots of reasons why finding gold in Big Data (if it is even there) is so hard.
One reason is sheer volume. There is a lot of data in Big Data, so hidden gold has a lot to hide behind. During the California gold rush, miners found that they had to remove tons of dirt to get their hands on just a little gold. Big Data is the same way: you have to remove huge amounts of unnecessary data to get at the really interesting data.
Another reason is that the golden data is disguised. When looking through the reams of data in Big Data, it simply is not apparent what the golden data looks like.
Yet another reason is that data must be paired with other data before its value can be determined. The problem with pairing any piece of data in Big Data with any other is that there is so much data to pair. Unless data scientists know exactly which data to pair together, they spend a lot of (expensive) time spinning their wheels.
So even if there is hidden gold there, finding it is no bargain.
Third, even if there is hidden gold in Big Data, getting at it costs more than the gold is worth. As a case in point, it is estimated that over $18 billion worth of gold is buried beneath Central City, Colorado. (That’s the truth. I am not kidding.) So why don’t you jump into your car, grab a pick and a shovel, head for Central City and start digging? Well, there is a problem. To get at that $18 billion in gold, you would have to spend an estimated $180 billion. It costs more to get to the gold than the gold is worth.
The same holds true for Big Data. The overhead of doing Big Data is tremendous. There is the cost of storage. There is the cost of software. There is the disruption to everyday activities. There is the exorbitant cost of the data scientist. There is general corporate overhead. When you add up the real costs of a Big Data experiment, you find that even if you do find gold, it is so expensive that it is not worth the effort.
So there are a lot of obstacles in front of the data scientist. Maybe there is a good reason why Big Data is in free fall down the Gartner hype cycle. Who knows when Big Data will hit the bottom of the trough of disillusionment? And who knows where it will go once it struggles up out of the trough?
As for the managers who have bet their careers and credibility within the corporation on the gold hidden in Big Data, maybe it is time to polish up the resume. There is always Uber.
Now you can hear and see Bill Inmon on the Internet. Take a look at his new videotape education series on www.safaribooksonline.com. Bill covers IT topics from A to Z.
William H. Inmon (born 1945) is an American computer scientist, recognized by many as the father of the data warehouse. Bill Inmon wrote the first book, held the first conference (with Arnie Barnett), wrote the first magazine column and was the first to offer classes in data warehousing. He created the accepted definition of a data warehouse: a subject-oriented, nonvolatile, integrated, time-variant collection of data in support of management's decisions.