Data Scientists Are Actually Data Garbage Collectors: Bill Inmon

Bill Inmon

Bill (William H) Inmon is a best-selling author, best known as the father of data warehousing. Over the decades, Bill has helped modern-day businesses recognise the power of the data warehouse as a foundation for analytics.

In 2017, Bill Inmon’s book Turning Text into Gold offered an in-depth look at text analytics and argued that text is a source of information modern businesses are ignoring. According to Bill, there is a wealth of information in textual data, but its form is a problem: most of it is unstructured and not organised in a way that yields valuable insights. Bill’s fascination with text analytics continues to date.

At the recently held plugin virtual event for data science professionals (organised by Analytics India Magazine), Bill Inmon talked about textual analytics and how businesses should be bringing unstructured text data into data warehouses and analytics solutions.

During the event, we spoke with Bill to learn more. Here are excerpts from the interaction:

AIM: What do you think about unstructured data, and how can businesses gain value from unstructured data?

Bill: If you look at all the data in corporations, you will find that only a small percentage of it is structured, and the large majority of it is unstructured. A lot of people have spent their entire careers looking at structured data and have not spent any time on unstructured data. 98% of corporate decisions are made on 10% of the data. There are many facets to the process of getting value from unstructured data. One facet is actually access to data. Then there is the interpretation of data, and then there is the creation of a database. You have got to do each one correctly.

AIM: You are well known as the father of the data warehouse. What do you think is the future of the data warehouse?

Bill: As long as we have unintegrated data, and as long as companies have multiple systems that hold lots and lots of data, we are going to have data warehouses around. Corporations cannot handle the data that they have because the data is basically unintegrated, and what a data warehouse does for people is allow them to have a corporate understanding of data. If that need goes away, then data warehouses go away. Until it does, data warehouses will be around.

AIM: What advice would you have for data architects who want to create scalable, resilient architectures?

Bill: It disturbs me greatly that when I see data scientists talk, they rarely talk about data architecture. I recently had a conversation with a data scientist who said, “We go to school and learn all these algorithms and statistical techniques. Then we go into our jobs, and we spend 98% of our time gathering, finding and cleansing data.” He said he wasn’t a scientist; he was instead a data garbage man. I don’t understand why getting the data is not a regular part of the curriculum. So, getting the data, that’s the challenge. Once you have the data, doing data science on it is well known, if not trivial. But most people in data science would rather concentrate on statistics, algorithms and other techniques.

AIM: How can businesses approach text analytics to find untapped value?

Bill: It is important to use text in decision making. We need to put text into a database, because only a database can handle the large volumes of information that we find. We have something called Textual ETL, which is the evolution of how we do text analytics and is state-of-the-art today. What Textual ETL does is allow you to read the text and put it into a standard database management system. The problem is that after you have created the database, no one really wants a database. Managers of the world want visualisation; they don’t want a database. What they really want are things presented to them rapidly and beautifully. Once the text is in the form of a database, we can then present the information in a visual format.
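The read-text-then-load-a-database flow Bill describes can be illustrated with a toy sketch. This is not Textual ETL itself, whose internals the interview does not describe; it is a minimal stand-in that assumes a hypothetical keyword taxonomy (`TAXONOMY`), a hypothetical table name (`term_hits`), and made-up feedback sentences, using only Python's standard `re` and `sqlite3` modules.

```python
import re
import sqlite3

# Hypothetical taxonomy mapping raw words to business categories.
# This is an illustrative assumption, not part of any real product.
TAXONOMY = {
    "cold": "food_quality",
    "delicious": "food_quality",
    "slow": "service",
    "rude": "service",
    "friendly": "service",
}


def load_feedback(conn, documents):
    """Read free text and store each recognised term as a database row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS term_hits ("
        "doc_id INTEGER, term TEXT, category TEXT, position INTEGER)"
    )
    for doc_id, text in enumerate(documents):
        # Lowercase and split into words; position is the word index.
        words = re.findall(r"[a-z']+", text.lower())
        for position, word in enumerate(words):
            category = TAXONOMY.get(word)
            if category is not None:
                conn.execute(
                    "INSERT INTO term_hits VALUES (?, ?, ?, ?)",
                    (doc_id, word, category, position),
                )
    conn.commit()


def category_counts(conn):
    """Aggregate the loaded rows, ready to feed a chart or dashboard."""
    rows = conn.execute(
        "SELECT category, COUNT(*) FROM term_hits "
        "GROUP BY category ORDER BY category"
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_feedback(
        conn,
        [
            "The soup was cold and the waiter was rude.",
            "Friendly staff, delicious pasta, but slow service.",
        ],
    )
    print(category_counts(conn))
```

Once the text sits in ordinary rows, the aggregate query is the kind of result a visualisation layer could consume, which is the hand-off from database to presentation that the answer above points to.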

AIM: What are some examples of how text analytics can be valuable for specific industries?

Bill: Restaurants, hotels, banks, telephone companies, doctors, hospitals, pharmaceuticals, airlines, government agencies: all of them can get tremendous value from text analytics, turning textual information into important decisions. Hotels and restaurants are usually happy to listen to what their customers are saying because their business depends directly on customer satisfaction. In our study, we took the feedback coming to a restaurant, thousands of pieces of feedback every month, and put it through Textual ETL. From there, we turned it into an analytical database. We found extremely useful insights in the textual data that can create value for the restaurant. In another study, we took a look at airline passenger feedback.

When you take a look at medical records, you can find all kinds of valuable information, such as diagnosis information, medical history, surgical history and information about medications. Again, this is written in the form of text. If you want to look at a million records, they need to be put into the form of a database. Once they are in the form of a database, you can start to read a million records and derive insights from them.

Vishal Chawla
Vishal Chawla is a senior tech journalist at Analytics India Magazine and writes about AI, data analytics, cybersecurity, cloud computing, and blockchain. Vishal also hosts AIM's video podcast, Simulated Reality, featuring tech leaders, AI experts, and innovative startups of India.
