It’s time to tackle a topic from psychology that’s essential for all data professionals:
How do you measure user happiness?
Let me try to guess what you’re thinking. Survey responses? Lack of complaints? Number of returns? Propensity to click?
The correct answer is…
You don’t. If you think your happiness survey can close an open question philosophers have been kicking around for millennia, think again, Professor Dunning-Kruger.
How about success? How do you measure it? You don’t.
How do you measure anger? You don’t. (See the pattern? Yes, I’m making you angry on purpose. …
This article takes you on a tour of the most popular parameters in statistics! If you’re not sure what a statistical parameter is or you’re foggy on how probability distributions work, I recommend scooting over to my beginner-friendly intro in Part 1 before continuing.
Note: If a concept is new to you, follow the link for my explanation. If the early stuff feels too technical, feel free to skip to the cuddly critter memes lower down.
Ready for the list of favorites? Let’s dive right in!
This word is pronounced “average.”
Test yourself! How many of these core statistical concepts are you able to explain?
CLT, CDF, Distribution, Estimate, Expected Value, Histogram, Kurtosis, MAD, Mean, Median, MGF, Mode, Moment, Parameter, Probability, PDF, Random Variable, Random Variate, Skewness, Standard Deviation, Tails, Variance
Got some gaps in your knowledge? Read on!
Note: If you see an unfamiliar term below, follow the link for an explanation.
A random variable (R.V.) is a mathematical function that turns reality into numbers. Think of it as a rule to decide what number you should record in your dataset after a real-world event happens.
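To make the “rule” idea concrete, here’s a minimal sketch in Python. The coin-flip setup and the names `coin_flip_rv` and `observe_coin` are hypothetical, chosen purely for illustration: the real-world event is a flip, and the random variable is the rule that says which number goes into the dataset.

```python
import random

def coin_flip_rv(outcome: str) -> int:
    """Hypothetical random variable: the rule for turning an outcome into a number."""
    mapping = {"heads": 1, "tails": 0}
    return mapping[outcome]

def observe_coin(rng: random.Random) -> str:
    """Stand-in for reality: something happens in the world (simulated here)."""
    return rng.choice(["heads", "tails"])

rng = random.Random(0)  # fixed seed so the simulated flips are repeatable
outcomes = [observe_coin(rng) for _ in range(5)]   # what happened
recorded = [coin_flip_rv(o) for o in outcomes]     # what lands in the dataset

print(outcomes)
print(recorded)
```

Note that the rule itself has nothing random in it; the randomness comes from the event it is applied to.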
A random variable is…
The coarsest way to, ahem, classify supervised machine learning (ML) tasks is into classification versus prediction. (What’s supervised ML? See the video below if you need a refresher.)
Let’s start by making sure we’re all on the same page with the basic basics.
If you’re new to these terms, I recommend reading this. For the too-busy folk among you, here comes the briefest of reminders:
The point of ML/AI is to automate tasks by turning data (examples)…
If you’re the kind of person who likes to keep a tidy mind, here’s why your lip might curl in disgust when confronted with the title question in my recent article on the difference between classification, regression, and prediction: There is no classification.
Let me explain.
Back when dinosaurs roamed the earth, it was fashionable to kick a statistics textbook off with a first chapter on the basics of data. To make sure that students had something to memorize for their first test, opening chapters usually featured this jargon:
Close your eyes and try to name as many data types as you can. Got them? Now let’s play bingo! (Look for the bolded words.)
Back when dinosaurs roamed the earth, it was fashionable to kick a statistics textbook off with a first chapter on the basics of data.
To make sure that students had something to memorize for their first test, opening chapters usually included some jargon for different kinds of data:
“What challenges are you tackling at the moment?” I asked. “Well,” the ex-academic said, “It looks like I’ve been hired as Chief Data Scientist… at a company that has no data.”
I don’t know whether to laugh or to cry. You’d think it would be obvious, but data science doesn’t make any sense without data. Alas, this is not an isolated incident.
So, let me go ahead and say what so many ambitious data scientists (and their would-be employers) really seem to need to hear.
If data science is the discipline of…
You might have heard of analysts, ML/AI engineers, and statisticians, but have you heard of their overpaid cousin? Meet the data charlatan!
Attracted by the lure of lucrative jobs, these hucksters give legitimate data professionals a bad name.
[In a hurry? Scroll down for a quick summary at the bottom.]
Chances are that your organization has been harboring these fakers for years, but the good news is that they’re easy to identify if you know what to look for.
Data charlatans are so good at hiding in plain sight that you might be one without even realizing it. Uh-oh!
The curse of dimensionality! What on earth is that? Besides being a prime example of shock-and-awe names in machine learning jargon (which often sound far fancier than they are), it’s a reference to the effect that adding more features has on your dataset. In a nutshell, the curse of dimensionality is all about loneliness.
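One way to see the loneliness for yourself is a small simulation. This is a sketch under assumptions of my choosing, not a definitive demonstration: points are scattered uniformly at random in a unit hypercube, the number of points stays fixed at 100, and the helper name `mean_nearest_neighbor_distance` is hypothetical. It measures how far each point is from its closest neighbor as features are added.

```python
import math
import random

def mean_nearest_neighbor_distance(n_points: int, n_features: int, seed: int = 0) -> float:
    """Average distance from each point to its closest neighbor (hypothetical helper)."""
    rng = random.Random(seed)
    # Assumption: each feature is drawn uniformly from [0, 1).
    points = [[rng.random() for _ in range(n_features)] for _ in range(n_points)]
    total = 0.0
    for i, p in enumerate(points):
        # Euclidean distance to every other point; keep the smallest.
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / n_points

# Same number of data points each time; only the number of features changes.
distances = {d: mean_nearest_neighbor_distance(100, d) for d in (1, 2, 10, 50)}
for d, dist in distances.items():
    print(f"{d:>2} features: average distance to nearest neighbor = {dist:.3f}")
```

With the dataset size held constant, every added feature gives the points more room to spread out, so each point’s nearest neighbor drifts farther away.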
Before I explain myself, let’s get some basic jargon out of the way. What’s a feature? It’s the machine learning word for what other disciplines might call a predictor / (independent) variable / attribute /…
Technically, p-value stands for probability value, but since all of statistics is about probabilistic decision-making, that’s probably the least useful name we could give it.
Instead, here are some more colorful candidate names for your amusement.
Head of Decision Intelligence, Google. ❤️ Stats, ML/AI, data, puns, art, theatre, decision science. All views are my own. twitter.com/quaesita