Making Data Useful

Why Data Will Disappoint You

Data expectations haven’t caught up to data economics

Cassie Kozyrkov
6 min read · Sep 27, 2023

To better understand a dataset, I recommend asking yourself these two questions:

  • Dataset purpose: is it a museum or a storage locker?
  • Dataset provenance: did you design the collection or inherit the data?

I introduced these questions in an earlier article, so here I want to dwell on a point worth repeating: the economics of data storage (cheaper every day!) have ushered in a norm of data hoarding, but the zeitgeist hasn’t adjusted accordingly. We still tend to expect something clean, scientific, objective, and useful in every dataset. My guess is that this mindset is a carryover from the days when datasets were expensive to store and were thus often designed with care. For better or worse, those days are long gone.

A much better analogy for most modern datasets is that of a hoarder’s storage locker.

Image belongs to the author.

Data for solving specific problems

In this analogy, if you’re seeking to solve a very specific problem with a dataset, you have three options:

  • Build a museum
  • Buy a museum


Cassie Kozyrkov

Chief Decision Scientist, Google. ❤️ Stats, ML/AI, data, puns, art, theatre, decision science. All views are my own. twitter.com/quaesita