How bad is bad data

The Global Financial Crisis of 2008 was driven by bad data

Could all the data captured by organisations today be considered good? Reports suggest otherwise: much of this data ends up being ‘bad’.

But what does this ‘bad data’ mean for organisations that depend on accurate data to drive business decisions?

Bad data is data that is inaccurate, inaccessible, poorly compiled, duplicated, missing key elements or simply irrelevant to its intended purpose. For instance, the Global Financial Crisis of 2008 was driven by bad data that overstated the actual worth of mortgage-backed securities and collateralised debt obligations.
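Some of these categories can be detected mechanically. The following is a minimal Python sketch, assuming customer records are plain dictionaries and that `id`, `email` and `country` are the key fields (hypothetical names chosen for illustration). It flags missing key elements, duplicated IDs and malformed email addresses; it cannot catch data that is well-formed but simply wrong or irrelevant.

```python
import re
from collections import Counter

# Hypothetical key fields that every record is expected to carry.
REQUIRED_FIELDS = ("id", "email", "country")
# Deliberately loose pattern: something@something.tld
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def audit(records):
    """Return a list of (record index, problem) pairs."""
    problems = []
    id_counts = Counter(r.get("id") for r in records)
    for i, rec in enumerate(records):
        # Missing key elements: absent or empty values.
        for field in REQUIRED_FIELDS:
            if not rec.get(field):
                problems.append((i, f"missing {field}"))
        # Duplicates: the same ID appears more than once.
        if rec.get("id") is not None and id_counts[rec["id"]] > 1:
            problems.append((i, "duplicate id"))
        # Inaccurate values: an email is present but malformed.
        email = rec.get("email")
        if email and not EMAIL_RE.match(email):
            problems.append((i, "invalid email"))
    return problems


records = [
    {"id": 1, "email": "a@example.com", "country": "IN"},
    {"id": 2, "email": "not-an-email", "country": "US"},
    {"id": 2, "email": "b@example.com", "country": ""},
]
for idx, problem in audit(records):
    print(idx, problem)
```

Because the emptiness test uses truthiness, an ID of `0` would also be reported as missing; a real audit would define "missing" per field.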

In recent years, bad data has drawn growing attention, largely because of the serious financial consequences it can have for organisations.

How costly is bad data?

Several surveys and studies show that companies lose millions of dollars due to bad data. In 2021, for instance, Gartner estimated that poor data quality costs organisations an average of USD 12.9 million a year.

A 2016 IBM estimate was even more striking: bad data was costing the US economy around USD 3.1 trillion each year.

Along with monetary losses, bad data leads businesses to draw inaccurate conclusions, with adverse short-term and long-term consequences. It results in poor decisions and business assessments, which in turn hurt the overall customer experience.

Bad data can also hurt a business’s operational efficiency. For example, a marketing company could end up sending advertisements to the wrong audience, defeating the campaign’s purpose, or an insurance company could end up paying a claim to the wrong client.

According to Thomas C Redman, ‘the Data Doc’ and President of Data Quality Solutions, bad data is costly because decision-makers (managers, data scientists and knowledge workers) incorporate it into their day-to-day work.

Working with such erroneous data is time-consuming. New errors often creep in when people correct the data themselves to meet deadlines instead of consulting whoever created it.

Data professionals spend a significant portion of their time cleaning and organising such data, identifying and fixing errors and confirming sources. This quality-control work can consume up to 50 per cent of knowledge workers’ time, rising to 60 per cent for data scientists.

Data strategy

Data strategy—the tools, processes, and rules to manage, analyse and act upon business data—helps businesses make informed decisions while keeping the data safe and compliant. 

To extract value from data, businesses must adopt a systematic approach to collecting, storing, analysing and managing data. 

Carving out data strategies that align with an organisation’s purpose helps address challenges such as slow and inefficient business processes; data privacy, integrity and quality issues; inefficient movement of data between parts of the business, or duplication of data across business units; unclear business needs and goals; and a poor understanding of critical business processes and entities.

Companies such as Stitch, Zeotape, Tableau, Datumize and CDQ help organisations build data strategies tailored to their needs in order to derive business insights.

Technology

Adopting advanced technology for robust data management is a significant step in dealing with bad data.

Jonathan Grandperrin, co-founder and CEO at Mindee, suggests that companies can use data extraction application programming interfaces (APIs) to build a strong information base. These APIs make data more structured, accessible and accurate, thereby increasing digital competitiveness. They also help organisations build fast workflows that run smoothly with fewer errors.

Data extraction APIs come in two types: those with predefined data models and those that let users define their own. With the former, the information to be extracted from a document is preset, and the underlying algorithms have already been trained on large volumes of input. With the latter, users train the API by uploading relevant documents and selecting the information to be extracted.

Segment, a customer data platform (CDP) that helps companies harness first-party customer data, has developed ‘Protocols’ to maintain data quality. ‘Protocols’ validates data at the point of collection, automates enforcement controls and helps standardise data collection across the organisation.
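The general idea of validating data at the point of collection can be sketched in a few lines of Python. This is not Segment’s API: the tracking plan, event names and property names below are hypothetical, and the sketch simply checks each incoming event against an agreed schema and only passes conforming events on (the "enforcement" step).

```python
# Hypothetical tracking plan: event name -> {property name: expected type}.
TRACKING_PLAN = {
    "Order Completed": {"order_id": str, "revenue": float},
    "Signed Up": {"user_id": str, "plan": str},
}


def validate_event(event):
    """Return a list of violations; an empty list means the event conforms."""
    name = event.get("event")
    if name not in TRACKING_PLAN:
        return [f"unplanned event: {name!r}"]
    props = event.get("properties", {})
    violations = []
    for prop, expected in TRACKING_PLAN[name].items():
        if prop not in props:
            violations.append(f"missing property: {prop}")
        elif not isinstance(props[prop], expected):
            violations.append(
                f"{prop} should be {expected.__name__}, got {type(props[prop]).__name__}"
            )
    return violations


def collect(event, sink):
    """Enforcement: only conforming events are stored; others are rejected."""
    violations = validate_event(event)
    if not violations:
        sink.append(event)
    return violations


accepted = []
print(collect({"event": "Order Completed",
               "properties": {"order_id": "A1", "revenue": 49.0}}, accepted))
print(collect({"event": "Order Completed",
               "properties": {"order_id": "A2", "revenue": "49"}}, accepted))
print(collect({"event": "order_completed", "properties": {}}, accepted))
```

Rejecting the misspelt `order_completed` event is what standardisation looks like in practice: every team has to use the same event names and types.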

“Until we started standardising our data, people didn’t realise how messy it had become. With ‘Protocols’, we can be confident that data quality issues don’t happen anymore”, notes Colin Furlong, Business Intelligence Analyst, Typeform.

Data mining services

Businesses hire data mining services to instil good data hygiene, which can lead to better customer-oriented campaigns, stronger B2B sales, an enhanced reputation, more leads and higher production rates. These services also help organisations streamline incoming data and weed out errors.

Many businesses rely on data mining services to keep their data accurate. Companies such as BizProspex, Damco and Eminenture provide these services. BizProspex, for example, offers several on-demand services, including data appending (adding missing information or correcting inaccurate details such as customers’ email IDs, phone numbers, addresses and demographic data), data scrubbing, CRM cleaning and email list building.
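A rough idea of what scrubbing, CRM cleaning and appending involve is shown in the Python sketch below. It does not reflect how any of the named providers work; the field names, the sample records and the "trusted reference" lookup are all hypothetical. Scrubbing normalises values so duplicates become visible, deduplication merges records sharing an email address, and appending fills remaining gaps from the reference source.

```python
import re


def scrub(record):
    """Data scrubbing: normalise fields so equivalent values compare equal."""
    clean = dict(record)
    if clean.get("email"):
        clean["email"] = clean["email"].strip().lower()
    if clean.get("phone"):
        # Keep digits only; country-code handling is left out of this sketch.
        clean["phone"] = re.sub(r"\D", "", clean["phone"])
    return clean


def dedupe(records, key="email"):
    """CRM cleaning: merge records that share a key, filling empty fields."""
    merged = {}
    for i, rec in enumerate(records):
        # Records without a key cannot be matched, so each is kept separately.
        k = rec.get(key) or ("no-key", i)
        if k not in merged:
            merged[k] = dict(rec)
        else:
            for field, value in rec.items():
                if value and not merged[k].get(field):
                    merged[k][field] = value
    return list(merged.values())


def append_missing(records, reference, key="email"):
    """Data appending: fill empty fields from a trusted reference source."""
    result = []
    for rec in records:
        merged = dict(rec)
        for field, value in reference.get(rec.get(key), {}).items():
            if not merged.get(field):
                merged[field] = value
        result.append(merged)
    return result


crm = [
    {"email": " Priya@Example.com ", "phone": "+91 98765-43210", "city": ""},
    {"email": "priya@example.com", "phone": "919876543210", "city": "Pune"},
    {"email": "sam@example.com", "phone": "(555) 010-0199", "city": ""},
]
reference = {"sam@example.com": {"city": "Austin"}}

cleaned = append_missing(dedupe([scrub(r) for r in crm]), reference)
for row in cleaned:
    print(row)
```

The order matters: deduplicating before scrubbing would miss the first two records, because their raw email strings differ only in case and whitespace.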

Treat data like a product

Today, data drives growth in almost every sector, so ‘productising’ data should be a priority.

Treating data as a product means holding it to a consistent quality standard throughout its life cycle. It also helps businesses operationalise their data, ensuring it is monitored and maintained like a production-grade system.
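Monitoring data "like a production-grade system" usually means running automated checks on every load, much as software is tested before release. Below is a minimal Python sketch of three such dataset-level checks (volume, completeness and freshness). The thresholds, the monitored column and the sample data are illustrative assumptions, not recommendations.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def run_checks(rows, loaded_at, column="email", min_rows=1,
               max_null_rate=0.05, max_age=timedelta(hours=24), now=None):
    """Dataset-level quality checks, meant to run on every load like tests."""
    now = now or datetime.now(timezone.utc)
    # Share of rows where the monitored column is missing or empty.
    empty = sum(1 for r in rows if not r.get(column))
    null_rate = empty / len(rows) if rows else 1.0
    age = now - loaded_at
    return [
        CheckResult("volume", len(rows) >= min_rows, f"{len(rows)} rows"),
        CheckResult("completeness", null_rate <= max_null_rate,
                    f"{null_rate:.0%} of '{column}' empty"),
        CheckResult("freshness", age <= max_age, f"data is {age} old"),
    ]


now = datetime(2022, 9, 1, 12, 0, tzinfo=timezone.utc)
rows = [{"email": "a@example.com"}, {"email": ""},
        {"email": "c@example.com"}, {"email": "d@example.com"}]
results = run_checks(rows, loaded_at=now - timedelta(hours=30), now=now)
for check in results:
    if not check.passed:
        # In production this would alert the data owner or halt the pipeline.
        print(f"FAILED {check.name}: {check.detail}")
```

Here one empty email in four rows (25 per cent) breaches the 5 per cent completeness threshold, and a 30-hour-old load breaches the 24-hour freshness limit, so both checks fail.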

Jonathan Grandperrin suggests other ways to resolve the issue of bad data, such as ensuring data portability, improving security awareness across the organisation and building robust learning models.

Back to basics

Besides such advice, the simplest remedy for organisations plagued with bad data is often to go back to basics: return to the source of the data, since it may have been drawn from the wrong place. Even when the source is correct, the data itself may not serve its intended purpose.

In such cases, organisations need to review the type of data they are working with. For example, a business that has tracked only sales figures may also need data on other measures, such as growth.

At other times, businesses can fine-tune their data processing to reduce the margin of error, for instance by double-checking data sources and bringing in third parties to review the data.
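Double-checking data sources can start with a simple reconciliation: compare the same figures as reported by two independent systems and flag any disagreement for review. The Python sketch below assumes per-region sales totals from two hypothetical sources (`crm_sales` and `ledger_sales`) and an illustrative 1 per cent tolerance.

```python
def reconcile(primary, secondary, tolerance=0.01):
    """Compare per-key totals from two sources and return the keys that disagree.

    tolerance is relative: 0.01 allows a 1 per cent difference.
    """
    mismatches = {}
    for key in sorted(primary.keys() | secondary.keys()):
        a, b = primary.get(key), secondary.get(key)
        if a is None or b is None:
            mismatches[key] = (a, b)  # the key exists in only one source
        elif abs(a - b) > tolerance * max(abs(a), abs(b)):
            mismatches[key] = (a, b)  # values differ by more than the tolerance
    return mismatches


crm_sales = {"north": 120_000.0, "south": 98_500.0, "west": 40_000.0}
ledger_sales = {"north": 120_400.0, "south": 91_000.0, "east": 15_000.0}
print(reconcile(crm_sales, ledger_sales))
```

The small gap for `north` is within tolerance, while `south` and the regions present in only one source are flagged; those flagged items are what a third-party reviewer would then examine.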

Zinnia Banerjee
Zinnia loves writing and it is this love that has brought her to the field of tech journalism.