Making Data Useful
The pros and cons of synthetic data
Should you be getting on the synthetic data bandwagon?
Jul 28, 2023
If you’re new to synthetic data, here’s a quick index to my introductory series, broken up into bite-sized pieces:
- What is synthetic data?
- The synthetic data field guide
- Why would you want synthetic data?
- AI-generated synthetic data
Once you’re comfy with the basics, we can jump straight into the pros and cons of synthetic data.
Biggest pros of synthetic data
- Synthetic data can be cheaper or easier to obtain than real-world data (if it isn’t, you probably don’t want it).
- If getting real-world data is infeasible, synthetic data gives you some hope of building your desired automation solution anyway.
- You have more control over the design of your synthetic dataset than your real-world dataset (this can be a con if you’re not careful).
- Synthetic data can be great for debugging (see earlier article), especially for stress-testing your system’s ability to handle outliers and weird things.
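To make the stress-testing idea concrete, here’s a minimal sketch in Python. Everything in it is a hypothetical illustration rather than code from this series: `summarize` and `naive_mean` stand in for "your system", `synthetic_outlier_cases` fabricates deliberately weird inputs (empty lists, NaNs, infinities, huge spikes), and `stress_test` runs each case. It assumes that "handling" an outlier means the function neither crashes nor returns a NaN or infinity; your own definition of success will likely differ.

```python
import math
import random


def naive_mean(values):
    """Hypothetical system under test #1: a plain average with no safeguards."""
    return sum(values) / len(values)


def summarize(values):
    """Hypothetical system under test #2: median of the finite values, or None."""
    finite = sorted(v for v in values if math.isfinite(v))
    if not finite:
        return None
    mid = len(finite) // 2
    if len(finite) % 2:
        return finite[mid]
    return (finite[mid - 1] + finite[mid]) / 2


def synthetic_outlier_cases(seed=0, n=20):
    """Build synthetic inputs that mix ordinary values with deliberate oddities.

    The case names and values are illustrative assumptions, not a standard suite.
    """
    rng = random.Random(seed)  # seeded so the synthetic data is reproducible
    normal = [rng.gauss(50, 5) for _ in range(n)]
    return {
        "empty": [],
        "all_nan": [float("nan")] * 5,
        "single_value": [42.0],
        "all_identical": [7.0] * n,
        "huge_outlier": normal + [1e308],
        "negative_spike": normal + [-1e6],
        "infinities": normal + [float("inf"), float("-inf")],
    }


def stress_test(fn, cases):
    """Run fn on every case; record exceptions and non-finite results.

    Assumption: a result of None is acceptable, NaN or +/-inf is not.
    """
    failures = {}
    for name, data in cases.items():
        try:
            result = fn(data)
        except Exception as exc:
            failures[name] = f"raised {exc!r}"
            continue
        if result is not None and not math.isfinite(result):
            failures[name] = f"returned {result!r}"
    return failures


if __name__ == "__main__":
    cases = synthetic_outlier_cases()
    for fn in (naive_mean, summarize):
        print(fn.__name__, stress_test(fn, cases) or "no failures")
```

Running both functions against the same synthetic cases shows the point of the exercise: the naive average breaks on the empty, all-NaN, and infinity cases, which you might never see in a tidy real-world sample, while the guarded version passes. The fixed seed is a design choice so that a failure you find today can be reproduced tomorrow.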