What is dark data, and why is it important?

Dark data is data which isn't readily available, for any number of reasons.

Data which isn't collected is clearly dark data -- this includes data about marginalized populations and data whose collection relies on underfunded government agencies. Many low-income economies are victims of this sort of dark data, which makes informed decision-making difficult.

Data which is collected but isn't properly aggregated, parsed or disseminated is also dark data -- for instance, research which is performed for the public good but then locked behind a paywall or distributed in an unstructured format.

Another category of dark data is data which is deliberately hidden, obfuscated or misrepresented, and this can be particularly insidious. This includes everything from proprietary data to misinformation and disinformation.

Understanding dark data is important because every day, people, companies and governments make decisions that impact the health, security and livelihood of their neighbors, customers and constituents. Even when these decisions are described as “data-driven,” it's not uncommon for the underlying data to be biased, sparse or missing altogether. It's important that everyone develop a critical eye toward dark data which informs policy.