Making Data Useful

The pros and cons of synthetic data

Should you be getting on the synthetic data bandwagon?

Cassie Kozyrkov

5 min read · Jul 28, 2023

For my introductory articles on synthetic data, here’s a quick index to the series, broken up into bite-sized pieces:

Once you’re comfy with the basics, we can jump straight into the pros and cons of synthetic data.

Synthetic image by the author.

Biggest pros of synthetic data

Synthetic data can be cheaper or easier to obtain than real-world data (if it isn’t, you probably don’t want it).
If getting real-world data is unfeasible, synthetic data gives you some hope that you might be able to build your desired automation solution anyway.
You have more control over the design of your synthetic dataset than your real-world dataset (this can be a con if you’re not careful).
Synthetic data can be great for debugging (see earlier article), especially for stress-testing your system’s ability to handle outliers and weird things.
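
To make the debugging point concrete, here’s a minimal sketch of stress-testing with synthetic data. Everything in it is a made-up illustration rather than anything from a real project: `make_synthetic_readings` is a hypothetical recipe that mixes plausible values with deliberately weird ones, and `summarize` is a stand-in for whatever system you actually want to test.

```python
import math
import random
import statistics

def make_synthetic_readings(n, outlier_rate=0.05, seed=0):
    """Hypothetical recipe: mostly plausible values, plus deliberately weird ones."""
    rng = random.Random(seed)  # fixed seed so a failing run can be reproduced
    weird_values = [float("nan"), float("inf"), -1e9, 0.0, 1e12]
    readings = []
    for _ in range(n):
        if rng.random() < outlier_rate:
            readings.append(rng.choice(weird_values))  # inject an outlier
        else:
            readings.append(rng.gauss(20.0, 2.0))      # an ordinary reading
    return readings

def summarize(readings):
    """Hypothetical system under test: median of the finite readings."""
    finite = [x for x in readings if math.isfinite(x)]
    if not finite:
        raise ValueError("no finite readings")
    return statistics.median(finite)

data = make_synthetic_readings(1000)
result = summarize(data)
print(f"summary of {len(data)} synthetic readings: {result:.2f}")
```

Because you control the recipe, you can dial `outlier_rate` up until something breaks, which is much harder to arrange with real-world data.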

Biggest pros of AI-generated synthetic data

All of the above, plus:

AI-generated synthetic data can be hard for humans to distinguish from the real thing.
If you get it from a good source, you’re taking advantage of a summary of millions of datapoints that you won’t need to collect/buy yourself.

Biggest cons of synthetic data

It’s synthetic! Real is always better if you’re trying to represent the real world, but sometimes it’s hard/expensive/impossible to get. Still, don’t expect synthetic data to represent reality.
It’s unnecessary when real data is cheap and plentiful.
It’s as simple-minded as we are. When we create a pithy recipe for making new datapoints, especially complex datapoints, we can’t trust ourselves to have represented all of reality’s relevant characteristics.
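
Here’s a toy sketch of that last point. Note the assumptions: the "reality" below is itself simulated (a stand-in invented for this illustration, not real data), and `pithy_recipe` is a hypothetical recipe that copies only the mean and standard deviation of what it sees.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed for reproducibility

def stand_in_for_reality(n):
    """Simulated stand-in for real data: mostly calm, occasionally wild."""
    return [
        rng.gauss(0, 10) if rng.random() < 0.05 else rng.gauss(0, 1)
        for _ in range(n)
    ]

def pithy_recipe(sample, n):
    """Hypothetical simple-minded recipe: a bell curve with matching mean and spread."""
    mu = statistics.fmean(sample)
    sigma = statistics.stdev(sample)
    return [rng.gauss(mu, sigma) for _ in range(n)]

def extreme_share(values, threshold=10.0):
    """Fraction of values further than `threshold` from zero."""
    return sum(abs(v) > threshold for v in values) / len(values)

real = stand_in_for_reality(10_000)
synthetic = pithy_recipe(real, 10_000)
real_share = extreme_share(real)
synthetic_share = extreme_share(synthetic)
print(f"extreme values in stand-in reality: {real_share:.2%}")
print(f"extreme values in synthetic copy:   {synthetic_share:.2%}")
```

The recipe matches the summary statistics it was told to care about and still underrepresents the extreme values, because nobody thought to write them into the recipe.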

Written by Cassie Kozyrkov

Chief Decision Scientist, Google. ❤️ Stats, ML/AI, data, puns, art, theatre, decision science. All views are my own. decision.substack.com
