An honest look at the great data science experiment that has taken place worldwide shows that what has been delivered falls far short of what the advocates of data science promised. Once data scientists were hired and asked to deliver, the results were underwhelming, to say the least.
So why have the data scientists failed to deliver? There are a host of reasons.
A good starting point for explaining what is going on is to look at where data scientists have aimed their efforts. Fig 1 shows where those efforts have been directed.
Data scientists have concentrated their efforts primarily in two places: the world of structured data and the world of machine-generated data. From a strategic standpoint, this seems like a reasonable thing to do. There is a wealth of business value in structured data. But there is a problem: analytical processing in the world of structured systems is already well-traveled ground. Business analysts have been examining structured data for years. It is like looking for gold in California. In 1849 there was gold that was easily found in California. Today all of the easily found gold is gone.
Fig 2 shows this phenomenon.
So there is plenty of business value in structured data. But very little new, exciting opportunity remains to be found there.
The other place where data scientists are spending a lot of their capital is machine-generated data. Fig 3 shows the effort to find business value in machine-generated data.
There is a lot of promise in looking at machine-generated data. First, the data is, for the most part, untouched: it has never before been examined. Second, the structure of the data is uniform, or at least reasonably uniform. Both of these factors make machine-generated data very promising.
But there are some very serious drawbacks to looking for business value here. Some of the drawbacks are:
- The data is hard to find. There is so much of it that interesting values “hide” behind a mountain of other data.
- The data is hard to relate. In order to have business value, the data must be captured and compared to other data. Finding these relationships is hard to do, especially in light of the sheer volume of data that has to be manipulated.
- The data is hard to analyze. The technology used to manage large volumes of data is optimized for managing the data, not for analyzing it.
- When interesting data is found, it is operational, not strategic. Unfortunately, operational data is less useful than strategic data.
But the final reason why data science struggles so much in this arena is that in many cases there just is not that much business value to be found in the first place.
No wonder the results achieved by data scientists have been so underwhelming. In the world of structured data, most of the good results have already been found. And in the world of machine-generated data, interesting data is hard to find, if it is there at all. And if it is there, it is operational, not strategic.
William H. Inmon (born 1945) is an American computer scientist, recognized by many as the father of the data warehouse. In the field of data warehousing, Bill Inmon wrote the first book, held the first conference (with Arnie Barnett), wrote the first magazine column, and was the first to offer classes. He created the accepted definition of a data warehouse: a subject-oriented, nonvolatile, integrated, time-variant collection of data in support of management's decisions.