Dark data is data that is stored but effectively ignored, usually without any indexing. Over time it becomes invisible to researchers and is, in effect, lost. It is generally unstructured, because organisations have collected it unknowingly, and it has never been used for decision-making or made available to the public.
Bob Picciano, Senior VP of Analytics at IBM, told a news portal, “Data that is difficult to work with creates a high barrier to entry. People typically forego trying to get any information out of it. About 90% of data generated by most sensors and other sources on the market never get utilised, and 60% of that data loses its true value within milliseconds.”
How Is It Generated?
The main reason dark data is generated is that organisations collect large amounts of data and do not analyse enough of it. Data is generated every moment: each time a user clicks on a link or visits a site, data is produced that organisations can analyse to improve their business. But they use only the small portion that is structured and stored in databases; the rest remains unstructured and is lost among other unindexed data.
According to reports, 7.5 sextillion gigabytes of data is generated worldwide every single day, of which 6.75 septillion megabytes ends up as dark data. Dark data remains stored in the files of data repositories without being analysed or processed. Another reason dark data is generated is the lack of proper analytical tools that support these other data formats, which leaves the data out of the decision-making process.
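The two figures quoted above use different units, so it helps to put them on the same scale. The sketch below is only an arithmetic check, assuming short-scale number names (sextillion = 10^21, septillion = 10^24) and decimal units (1 GB = 1000 MB); the variable names are illustrative.

```python
from fractions import Fraction

# Assumptions: short-scale number names and decimal (SI) storage units.
SEXTILLION = 10**21
SEPTILLION = 10**24
MB_PER_GB = 1000

# Figures as quoted in the text.
total_gb = Fraction(75, 10) * SEXTILLION   # 7.5 sextillion GB generated per day
dark_mb = Fraction(675, 100) * SEPTILLION  # 6.75 septillion MB of dark data

# Convert the dark-data figure to gigabytes and compare.
dark_gb = dark_mb / MB_PER_GB
dark_share = dark_gb / total_gb

print(f"dark share: {float(dark_share):.0%}")  # prints "dark share: 90%"
```

Under these assumptions the quoted figures imply that 90% of the data generated each day goes dark, which lines up with the 90% figure in the IBM quote.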
Importance Of Dark Data In Big Data
Dark data is a part of Big data. Data that is considered dark can come from various logs, emails, old documents, ex-employee information, statements, ID numbers, etc. With the advent of Big data, frameworks like Hadoop came into the picture and have been growing exponentially. Organisations have used such frameworks to process large volumes of data, including dark data.
According to one report, the digital universe is expected to reach 44 zettabytes in the year 2020. IoT is expected to see explosive growth to 20.8 billion connected devices, and the data these devices generate is expected to be 269 times greater than the amount of data being transmitted to data centres from end-user devices and 49 times higher than total data-centre traffic.
Since dark data is a subset of Big data, organisations can analyse it to discover valuable insights, potentially far greater than the ones they are currently gaining.
Dark data can be used for various purposes. For instance, servers, networking equipment, firewalls, etc. generate a large amount of data that can be used to analyse network security in the environment. Organisations can also use dark data to find patterns and other relationships that support decision-making.
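As a rough illustration of the network-security case, the sketch below counts denied connections per source address in a handful of firewall-style log lines. Everything specific here is an assumption made for the example: the log format, the sample lines, and the function name `count_denied_by_source` are hypothetical, and real firewall logs will need their own parser.

```python
import re
from collections import Counter

# Hypothetical log format: "<timestamp> <ALLOW|DENY> src=<ip> dst=<ip> port=<n>"
LINE_PATTERN = re.compile(
    r"^(?P<ts>\S+) (?P<action>ALLOW|DENY) "
    r"src=(?P<src>\S+) dst=(?P<dst>\S+) port=(?P<port>\d+)$"
)

def count_denied_by_source(lines):
    """Count DENY entries per source address, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # unparseable lines are ignored, not guessed at
        if match.group("action") == "DENY":
            counts[match.group("src")] += 1
    return counts

# Made-up sample lines using documentation-only IP ranges.
sample_lines = [
    "2019-10-01T10:00:00Z DENY src=192.0.2.10 dst=198.51.100.5 port=22",
    "2019-10-01T10:00:02Z ALLOW src=192.0.2.11 dst=198.51.100.5 port=443",
    "2019-10-01T10:00:05Z DENY src=192.0.2.10 dst=198.51.100.5 port=23",
    "2019-10-01T10:00:09Z DENY src=192.0.2.12 dst=198.51.100.6 port=3389",
    "malformed line",
]

denied = count_denied_by_source(sample_lines)
for source, hits in denied.most_common():
    print(source, hits)
```

A source that is denied repeatedly across different ports is the kind of pattern that stays hidden while the logs sit unanalysed.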
Interest Over Time Graph Of Using Structured Data And Dark Data
One of the primary outputs of any organisation is data. While researchers take great care of the small portion of data used for decision-making, it is also crucial to reduce dark data as much as possible through pruning, auditing, etc. An increase in dark data only leads to an increase in storage costs as well as security risks.
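One possible starting point for such an audit is to list files that have not been modified for a long time. The sketch below is a minimal example under a stated assumption: that time since last modification is a usable proxy for data having gone dark. The function name `find_stale_files` and the age threshold are hypothetical.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400

def find_stale_files(root, max_age_days, now=None):
    """Return (path, size_in_bytes) for files under root whose last
    modification is older than max_age_days, largest files first."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            stat = path.stat()
            if stat.st_mtime < cutoff:
                stale.append((path, stat.st_size))
    # Largest first, since these cost the most to keep storing.
    return sorted(stale, key=lambda item: item[1], reverse=True)
```

Calling `find_stale_files("/path/to/repository", 365)` would list candidates for review; whether a file is then pruned, archived or indexed remains a decision for the organisation.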