Making Data Useful

The pros and cons of synthetic data

Should you be getting on the synthetic data bandwagon?

Cassie Kozyrkov
5 min read · Jul 28, 2023


For my introductory articles on synthetic data, here’s a quick index to the series, broken up into bite-sized pieces:

Once you’re comfy with the basics, we can jump straight into the pros and cons of synthetic data.

Synthetic image by the author.

Biggest pros of synthetic data

  • Synthetic data can be cheaper or easier to obtain than real-world data (if it isn’t, you probably don’t want it).
  • If getting real-world data is unfeasible, synthetic data gives you some hope that you might be able to build your desired automation solution anyway.
  • You have more control over the design of your synthetic dataset than your real-world dataset (this can be a con if you’re not careful).
  • Synthetic data can be great for debugging (see earlier article), especially for stress-testing your system’s ability to handle outliers and weird things.
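
To make the stress-testing point concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the "system" is a hypothetical `clean_age` function, the dataset is a list of ages, and the weird values are hand-picked. The idea is simply that you control the design of the synthetic data, so you can plant exactly the outliers you want your system to survive.

```python
import math
import random


def make_synthetic_ages(n, seed=0):
    """Return n plausible ages followed by a few deliberately weird values."""
    rng = random.Random(seed)  # fixed seed so the dataset is reproducible
    typical = [rng.randint(18, 90) for _ in range(n)]
    weird = [-5, 999, float("nan"), None]  # hand-picked outliers and junk
    return typical + weird


def clean_age(value):
    """Hypothetical system under test: return a valid integer age or None."""
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    if not 0 <= value <= 120:
        return None
    return int(value)


data = make_synthetic_ages(100)
cleaned = [clean_age(v) for v in data]
valid = [v for v in cleaned if v is not None]
rejected = len(cleaned) - len(valid)
print(f"{len(valid)} valid, {rejected} rejected")
```

Because you planted the four weird values yourself, you know how many rejections to expect, which is precisely the kind of check that is hard to run on real-world data where the ground truth is unknown.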

Biggest pros of…


Cassie Kozyrkov

Chief Decision Scientist, Google. ❤️ Stats, ML/AI, data, puns, art, theatre, decision science. All views are my own. twitter.com/quaesita