Making Data Useful
How Good Data Goes Bad
The data quality crisis no one is talking about
A rule of thumb to save you tears in the long run is to assume every dataset is more like a hoarder’s storage locker than a well-curated museum until proven otherwise.
When in doubt, assume your data’s a junkyard.
But even if you’re not dealing with a dataset that’s a hoardsplosion of we-may-as-wells, there are two major ways that fit-for-purpose data turns into garbage:
- Information loss during conversion
- Information selection issues
There are plenty more ways that bad data can happen to good intentions, but let’s talk about these two major ones for now.
Information loss during conversion
Data quality erodes whenever there’s a problem with the physical conversion of reality into electronic records. This is a relatively simple issue that manifests in numerous ways, from janky hard disks and broken equipment to real-world plans going awry: Were your sensors calibrated? Did your laptop run out of juice? Did the people you paid to enter survey information actually write down what they were supposed to? Did you wait too long while you relied on human…