Dark data is data that is collected and stored but never indexed, analysed or otherwise used. Over time it becomes invisible to researchers and is effectively lost. It is generally unstructured, is often gathered by organisations without their being aware of it, and has never been used for decision-making or made available to the public.
Bob Picciano, Senior VP of Analytics at IBM, said: “Data that is difficult to work with creates a high barrier to entry. People typically forego trying to get any information out of it. About 90% of data generated by most sensors and other sources on the market never get utilised, and 60% of that data loses its true value within milliseconds.”
How Is It Generated?
The main cause of dark data is that organisations collect far more data than they analyse. Data is generated every moment: each time a user clicks a link or visits a site, a record is created that the organisation could analyse to improve its business. In practice, organisations use only the small portion that is structured and stored in databases; the rest stays unstructured and gets lost among other unindexed data.
According to reports, 7.5 sextillion gigabytes of data are generated worldwide every day, of which 6.75 septillion megabytes end up as dark data. This data sits in files in data repositories without ever being analysed or processed. Another cause is the lack of analytical tools that support less common data formats, so data in those formats cannot be analysed for decision-making.
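Taking the reported figures at face value, the share of dark data they imply can be checked with a short Python sketch. It assumes short-scale number names (sextillion = 10^21, septillion = 10^24) and decimal units (1 GB = 1,000 MB); the source states neither assumption.

```python
from fractions import Fraction

# Assumptions (not stated in the source): short-scale names, decimal units.
SEXTILLION = 10**21
SEPTILLION = 10**24
MB_PER_GB = 1000

# Reported daily total: 7.5 sextillion gigabytes, converted to megabytes.
total_mb = Fraction(75, 10) * SEXTILLION * MB_PER_GB

# Reported daily dark data: 6.75 septillion megabytes.
dark_mb = Fraction(675, 100) * SEPTILLION

# Exact ratio of dark data to all data generated per day.
dark_share = dark_mb / total_mb
print(f"Dark share of daily data: {float(dark_share):.0%}")  # prints 90%
```

Under these assumptions the two figures describe a 90% share, the same proportion Picciano cites for sensor data that is never utilised.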
Importance Of Dark Data In Big Data
Dark data is a part of Big Data. It can come from logs, emails, old documents, information about former employees, statements, ID numbers and so on. With the advent of Big Data, frameworks such as Hadoop came into the picture, and their adoption has grown rapidly. Organisations use these frameworks to process large volumes of data, including dark data.
According to one report, the digital universe is expected to reach 44 zettabytes in 2020. IoT is expected to see explosive growth, reaching 20.8 billion connected devices, and the data these devices generate is projected to be 269 times greater than the amount of data transmitted to data centres from end-user devices and 49 times higher than total data-centre traffic.
Since dark data is a subset of Big Data, analysing it can uncover insights that are more valuable than those organisations currently gain from the data they already use.
Dark data can serve many purposes. For instance, servers, network devices and firewalls generate large amounts of data that can be used to analyse network security in an environment. Organisations can also mine dark data for patterns and relationships that support decision-making.
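As a rough illustration of the network-security use case, the following Python sketch counts denied connections per source address in a firewall log and flags repeat offenders. The log format, the sample lines, the function names and the threshold are all hypothetical, chosen only for illustration; real firewall logs vary by vendor.

```python
import re
from collections import Counter

# Hypothetical firewall log lines; this format is an assumption, not a real vendor format.
SAMPLE_LOG = """\
2023-01-05 10:01:12 DENY src=203.0.113.7 dst=10.0.0.5 port=22
2023-01-05 10:01:13 ALLOW src=198.51.100.4 dst=10.0.0.8 port=443
2023-01-05 10:01:15 DENY src=203.0.113.7 dst=10.0.0.5 port=22
2023-01-05 10:02:40 DENY src=192.0.2.19 dst=10.0.0.9 port=3389
2023-01-05 10:03:02 DENY src=203.0.113.7 dst=10.0.0.6 port=22
"""

# Matches: date, time, action, source address, destination address, port.
LINE_PATTERN = re.compile(
    r"^\S+ \S+ (?P<action>ALLOW|DENY) src=(?P<src>\S+) dst=\S+ port=(?P<port>\d+)$"
)


def count_denied_sources(log_text):
    """Return a Counter of denied connection attempts per source address."""
    denied = Counter()
    for line in log_text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match and match.group("action") == "DENY":
            denied[match.group("src")] += 1
    return denied


def flag_suspicious(denied, threshold=3):
    """Return the sorted source addresses with at least `threshold` denials."""
    return sorted(src for src, hits in denied.items() if hits >= threshold)


denied = count_denied_sources(SAMPLE_LOG)
suspicious = flag_suspicious(denied)
print(suspicious)  # ['203.0.113.7']
```

Even a simple count like this turns log lines that would otherwise sit unread into a signal a security team can act on; a production pipeline would add time windows and a parser matched to the actual log format.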
How To Bring Light To “Dark Data”
Kenoobi Data offers a Big Data solution with cognitive capabilities and a hybrid cloud architecture. Customers can keep their most sensitive data on-premises while unlocking new insights in the cloud from enterprise and third-party content enriched with natural language processing (NLP). They get the scalability and flexibility of cloud-based technology, and the solution does not just return documents but extracts answers from within them. In short, it can help organisations:
- Surface information from both internal and external data sources (e.g. news, financial data, social media, other general web data).
- Find information from across your enterprise.
- Build a custom dashboard to dynamically deliver relevant information.
You can get started here: https://data.kenoobi.com/bigdata