The fastest way to diagnose your statistical alignment

What if I told you that I can show you the difference between Bayesian and Frequentist statistics with one single coin toss?

Before we go any further, the demonstration works best in video form, so don’t read the summary and spoilers below until you’ve seen it. In case some terms are unfamiliar, I’ve linked to friendly explanations to help you out.

Why these cat pics? On the left, it’s all about perspective. On the right, it’s all about quantities that don’t move around. But mostly, I needed something to shield your eyes from the spoilers below until you’ve seen the video.

Summary

In the video, there’s a moment where I ask you, “What is the probability that the coin in my palm is heads-up?” The coin has already landed, I’m looking at it, but you can’t see it yet…


You know machine learning is off to a rocky start when…

Adapted from Wikipedia.

“Among the machine learning strategy consultations you’ve done, which kinds of product team were the most challenging to work with?”

After consulting on hundreds of machine learning projects, I’ve learned to pay attention to early warning signs that the client is in danger of shooting themselves in the foot. Here are my top three:

  1. They’re marketing victims with unrealistic expectations
    * Special case: Willing to launch at all costs
    * Special case: No data (and other basic requirements)
  2. There’s a lack of respect for skills diversity
    * Special case: Toxic snobbery
  3. The team has no idea who’s in charge
    * Special…


Too good to fail? The surprising way a top-performing system can hurt you

Imagine two (human) workers:

  • Chris Careless is a constant disappointment to you, performing your task well 70% of the time and producing an absolute cringe the rest of the time. Watching Chris make 10 attempts is more than enough to provoke an “oh, dear” response from you.
  • Ronnie Reliable is another story. You’ve seen Ronnie in action over a hundred times and you’ve been consistently impressed.
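
To put a rough number on why ten attempts are plenty for spotting Chris's flaws, here is a minimal sketch. It assumes every attempt is independent and succeeds with a fixed 70% chance, which is a simplification the description above doesn't guarantee; the variable names are purely illustrative.

```python
# Assumption: each attempt is independent with the same 70% success rate.
success_rate = 0.7
attempts = 10

# Probability that Chris gets all ten attempts right.
p_all_succeed = success_rate ** attempts

# Probability that you witness at least one cringe in ten attempts.
p_at_least_one_failure = 1 - p_all_succeed

print(f"Chance of a flawless run of {attempts}: {p_all_succeed:.3f}")
print(f"Chance of seeing at least one failure: {p_at_least_one_failure:.3f}")
```

Under those assumptions, a flawless run of ten is rare, so Chris's weaknesses are very likely to show up while you're watching.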

Here comes the million-dollar question. Which worker is more dangerous to your business?

On a high-stakes task, the answer could be Ronnie Reliable… but perhaps not for the first reason that comes to mind.

This isn’t about bad projects


Essential psychology for all data professionals

It’s time to tackle a topic from psychology that’s essential for all data professionals:

How do you measure user happiness?

Let me try to guess what you’re thinking. Survey responses? Lack of complaints? Number of returns? Propensity to click?

The correct answer is…

Image: SOURCE

You don’t. If you think your happiness survey can close an open question philosophers have been kicking around for millennia, think again, Professor Dunning-Kruger.

How about success? How do you measure it? You don’t.

How do you measure anger? You don’t. (See the pattern? Yes, I’m making you angry on purpose. …


Take a “moment” to explore some fundamentals

This article takes you on a tour of the most popular parameters in statistics! If you’re not sure what a statistical parameter is or you’re foggy on how probability distributions work, I recommend scooting over to my beginner-friendly intro in Part 1 before continuing here.

Get your distribution basics in Part 1 if you’re new to this space. Image: SOURCE.

Note: If a concept is new to you, follow the link for my explanation. If the early stuff feels too technical, feel free to skip to the cuddly critter memes lower down.

Ready for the list of favorites? Let’s dive right in!

Mean

This word is pronounced “average.”

Expected value

An expected value, written as E(X) or…


Back-to-basics on data science fundamentals

Test yourself! How many of these core statistical concepts are you able to explain?

CLT, CDF, Distribution, Estimate, Expected Value, Histogram, Kurtosis, MAD, Mean, Median, MGF, Mode, Moment, Parameter, Probability, PDF, Random Variable, Random Variate, Skewness, Standard Deviation, Tails, Variance

Got some gaps in your knowledge? Read on!

Note: If you see an unfamiliar term below, follow the link for an explanation.

Random variable

A random variable (R.V.) is a mathematical function that turns reality into numbers. Think of it as a rule to decide what number you should record in your dataset after a real-world event happens.
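
If it helps to see the "rule" idea in code, here is a minimal sketch. The coin-toss setup and the names `heads_indicator` and `toss_coin` are assumptions made for illustration, not anything defined in this article.

```python
import random


def heads_indicator(outcome: str) -> int:
    """Hypothetical random variable: record 1 for heads, 0 for tails."""
    return 1 if outcome == "heads" else 0


def toss_coin(rng: random.Random) -> str:
    """Stand-in for the real-world event: a simulated coin toss."""
    return rng.choice(["heads", "tails"])


rng = random.Random(0)  # seeded so the simulation is repeatable
outcomes = [toss_coin(rng) for _ in range(5)]         # what happened in "reality"
dataset = [heads_indicator(o) for o in outcomes]      # what lands in your dataset

print(outcomes)
print(dataset)
```

The real-world events are the words "heads" and "tails"; the function is the rule that decides which number gets written down for each one.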

A random variable is…


Getting Started

Know your species of machine learning task

The coarsest way to, ahem, classify supervised machine learning (ML) tasks is into classification versus prediction. (What’s supervised ML? See the video below if you need a refresher.)

Before we dive deeper into supervised learning, in this video I give you a quick refresher on how that differs from unsupervised learning.

Let’s start by making sure we’re all on the same page with the basic basics.

Basics: Algorithm vs Model

If you’re new to these terms, I recommend reading this. For the too-busy folk among you, here comes the briefest of reminders:

The point of ML/AI is to automate tasks by turning data (examples)…


If you’re the kind of person who likes to keep a tidy mind, here’s why your lip might curl in disgust when confronted with the title question in my recent article on the difference between classification, regression, and prediction: There is no classification.

Let me explain.

There is no classification… and regression is something else entirely. Meme template from The Matrix.

Discrete versus continuous

Back when dinosaurs roamed the earth, it was fashionable to kick a statistics textbook off with a first chapter on the basics of data. To make sure that students had something to memorize for their first test, opening chapters usually featured this jargon:

  • Continuous data (measured, not counted), e.g. 173.5…


Continuous, discrete, categorical, cardinal, sequential… keep going!

Close your eyes and try to name as many data types as you can. Got them? Now let’s play bingo! (Look for the bolded words.)

Data types in statistics and analytics

Back when dinosaurs roamed the earth, it was fashionable to kick a statistics textbook off with a first chapter on the basics of data.

Don’t worry, the data taxonomy isn’t this complicated. Some of these critters look wonderfully derpy. Image: SOURCE.

To make sure that students had something to memorize for their first test, opening chapters usually included some jargon for different kinds of data:

  • Continuous data (measured, not counted), e.g. 176.5 cm (my height), 12% (free space on my phone), 3.141592… (pi), -40.00 (where Celsius meets Fahrenheit), etc.
  • Discrete data (counted…


Why it’s important to hire data engineers early

“What challenges are you tackling at the moment?” I asked. “Well,” the ex-academic said, “It looks like I’ve been hired as Chief Data Scientist… at a company that has no data.”

“Human, the bowl is empty.” — Data Scientist. Image: SOURCE.

I don’t know whether to laugh or to cry. You’d think it would be obvious, but data science doesn’t make any sense without data. Alas, this is not an isolated incident.

So, let me go ahead and say what so many ambitious data scientists (and their would-be employers) really seem to need to hear.

What is data engineering?

If data science is the discipline of…

Cassie Kozyrkov

Head of Decision Intelligence, Google. ❤️ Stats, ML/AI, data, puns, art, theatre, decision science. All views are my own. twitter.com/quaesita
