Making Data Useful

What is simulation?

Introducing a powerful technique for working with data

Cassie Kozyrkov
3 min read · Jun 26, 2023


In data science, it’s often a great idea to do a dress rehearsal that involves making a fake-but-plausible dataset before collecting, buying, or analyzing any real-world data. We call this simulation.

Image by the author.

(Note: the links in this post take you to explainers by the same author.)

Typically, you’ll do it with the help of random number generators in your favorite data processing software (e.g. Python or R). You simply use its random number generating functions to draw observations from whichever probability distribution you like. If that’s all Greek to you, think of it as making the computer toss a coin, roll a die, or spit out lottery numbers for you, except that the coin, die, or lottery can be as complicated as you like: you write the rules!
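To make the coin-and-die idea concrete, here is a minimal sketch in Python using only the standard library's `random` module. The function name `simulate_dataset`, the column names, and the specific rules (a 70% loaded coin, heights centered on 170 cm) are all made up for illustration; they are assumptions, not anything prescribed by this post.

```python
import random


def simulate_dataset(n_rows, seed=0):
    """Return a list of fake-but-plausible rows (hypothetical columns)."""
    rng = random.Random(seed)  # seeded so the "dress rehearsal" is repeatable
    rows = []
    for _ in range(n_rows):
        rows.append({
            # A fair coin: each side is equally likely.
            "coin": rng.choice(["heads", "tails"]),
            # A fair six-sided die: whole numbers 1 through 6.
            "die": rng.randint(1, 6),
            # A bell-curve measurement: mean 170, standard deviation 10.
            "height_cm": rng.gauss(170, 10),
            # You write the rules: a loaded coin that lands heads 70% of the time.
            "loaded_coin": "heads" if rng.random() < 0.7 else "tails",
        })
    return rows


fake_data = simulate_dataset(1000)
heads_share = sum(row["loaded_coin"] == "heads" for row in fake_data) / len(fake_data)
print(f"Simulated {len(fake_data)} rows; loaded coin came up heads {heads_share:.0%} of the time")
```

Because you chose the rules, you know what the "right answer" is (here, roughly 70% heads), so you can check whether your analysis recovers it before you point it at real data.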

The videos below show you a demo of what it looks like.

By the way, this is very similar to the way a generative AI model outputs all those cool paragraphs and…


Cassie Kozyrkov

Chief Decision Scientist, Google. ❤️ Stats, ML/AI, data, puns, art, theatre, decision science. All views are my own. twitter.com/quaesita