Making Data Useful

Data: A Hoarder’s Storage Locker, Not a Magical Museum

Why data isn’t as useful as we think

Cassie Kozyrkov
4 min read · Aug 27, 2023

There’s a common misconception that data is the next best thing to a holy relic of science — objective, mathematical, clean, correct, and above all, always useful.

If you’re like most people, you envision data as a magical museum, meticulously organized and filled with diamonds and other gems, so brace yourself for a reality check!

A more accurate analogy for data would be a hoarder’s storage locker, filled to the brim with all kinds of stuff. If you’re willing to go spelunking into the mess you’ve inherited, you might find something valuable in there, but brace yourself for a pile of broken garbage that only its hoarder could love. (That is, if that hoarder even remembers what on earth they squirreled away in all that mess.)

Image is property of the author.

Most datasets come with about as much documentation as your sink of dirty dishes.

Data documentation is simultaneously an unsolved research problem — it’s not obvious how to design docs optimally for transparency and…

Cassie Kozyrkov

Chief Decision Scientist, Google. ❤️ Stats, ML/AI, data, puns, art, theatre, decision science. All views are my own. twitter.com/quaesita