What is Dark Data?

The universe of dark data

Dark data is, in essence, any data which should exist and be easily accessible, but is not.

Our planet currently creates, stores and shares over 100 zettabytes of data. This unimaginable volume is growing exponentially, and will soon reach a point where retaining more than a fraction of it will be impossible without innovations in storage media. And yet, hiding among all this data is an assumption: that the information we have at our fingertips is an accurate reflection of the rich, panoptical world around us, and that the decisions we make aren't strung out along a spiderweb of assumptions and obfuscations.

Whether we know it or not, we also live in a world of dark data: a shadow universe that underpins the brightly lit world of facts and laws. As our informational world expands, so too does the world of dark data, rippling out through healthcare, government, international law and other areas that we may have believed, naively, were immune to compromise. Lack of data is nothing new. What's new is the cascading impact bad decisions can have on you, the people you love, and the billions of other people sharing, often unequally, the air you breathe, the food you eat and the justice you expect.

From a taxonomic perspective, dark data includes data which is nonexistent (either because no one has thought to collect it, or because efforts to collect it have been inhibited); data which exists but isn't widely accessible (such as proprietary, classified or private data); data which, if available, would be inculpatory or exculpatory; data which is in some fashion obfuscated (whether unstructured, corrupted or redacted); data which is untranslated or untranslatable; and data which is just plain wrong (such as misinformation, disinformation or poorly collected data).