Making Data Useful
Why Data Will Disappoint You
Data expectations haven’t caught up to data economics
To better understand a dataset, I recommend asking yourself these two questions:
- Dataset purpose: is it a museum or a storage locker?
- Dataset provenance: did you design the collection or inherit the data?
I introduced these questions in an earlier article, so now I’d like to dwell on a point worth repeating: the economics of data storage (cheaper every day!) have ushered in a norm of data hoarding, but the zeitgeist hasn’t adjusted accordingly. We still tend to expect something clean, scientific, objective, and useful in every dataset. My guess is that this mindset is a carryover from the days when datasets were expensive to store and were thus often designed with care. For better or worse, those days are long gone.
A much better analogy for most modern datasets is that of a hoarder’s storage locker.
Data for solving specific problems
In this analogy, if you’re seeking to solve a very specific problem with a dataset, you have three options:
- Build a museum
- Buy a museum