IRREVERENT DEMYSTIFIERS

What are: Embeddings? Vector Databases? Vector Search? k-NN? ANN?

A simple explainer to debuzz these AI buzzwords

Cassie Kozyrkov
6 min read · Feb 2, 2024

Are you drowning in buzzwords and unsure what the fascination with these terms is?

  • Embeddings
  • Vector databases
  • Vector search
  • k-NN and ANN

Never fear, I’ll help you understand the context and the basics without getting you mired in the nitty-gritty details.

What’s the context for using these terms?

The things you can do with AI are pretty awesome, but there’s a catch. If you’re automating at scale, you might find that AI is slow or expensive or both.

Hence all the buzz about vector search, vector databases, and so on. The context is the need for speed, especially when it comes to text (think GPT and its friends).

So when you hear these terms, what you’re hearing is several million frustrated engineers sighing in relief that these things exist, then grumbling little demands for it all to be even better, cheaper, faster. (All while forgetting just how magical it is that we can train models on these wild amounts of data in the first place. Ah, the human condition.)

Okay, so what are these terms actually? Well, it doesn’t help that the, ahem, “vectorness” of the name is misleading.*

Buzzwords debuzzed

Embeddings:

“Put your data in a format that AI can guzzle.”

Raw images/text/video/audio etc. are not optimized for pattern-finding (AI) algorithms. They’re optimized for your eyeballs and earballs. And there’s no notion of similarity or distance in them, so the word “giraffe” is as different from “fish” as it is from “fist” (the second letter matches). But you have a sense that fish might be a bit closer to giraffe (they’re both animals) and it would be great to represent their data like that while also saving space. (Unless, of course, fist is close to giraffe for your needs, in which case you’d need a…
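To make the giraffe/fish/fist point concrete, here is a minimal sketch in Python. The three-number "embeddings" are made up by hand purely for illustration (a real embedding model would learn them and use far more dimensions), and the helper names `shared_positions` and `cosine_similarity` are hypothetical, not from any particular library.

```python
import math

# Hypothetical, hand-picked toy embeddings (NOT produced by a real model).
# The only point is that "giraffe" and "fish" are placed near each other.
toy_embeddings = {
    "giraffe": [0.9, 0.8, 0.1],
    "fish":    [0.8, 0.3, 0.2],
    "fist":    [0.1, 0.2, 0.9],
}

def shared_positions(a, b):
    """Count positions where two raw strings have the same letter."""
    return sum(1 for x, y in zip(a, b) if x == y)

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (closer to 1 = more alike)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

# Raw text: "giraffe" looks equally (un)like "fish" and "fist".
print(shared_positions("giraffe", "fish"), shared_positions("giraffe", "fist"))

# Toy embeddings: "giraffe" now scores closer to "fish" than to "fist".
print(cosine_similarity(toy_embeddings["giraffe"], toy_embeddings["fish"]))
print(cosine_similarity(toy_embeddings["giraffe"], toy_embeddings["fist"]))
```

With the raw strings, letter matching cannot tell the animal from the clenched hand. With the toy vectors, the similarity score can, but only because the numbers were chosen to encode "animal-ness" in the first place.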

Cassie Kozyrkov

Chief Decision Scientist, Google. ❤️ Stats, ML/AI, data, puns, art, theatre, decision science. All views are my own. twitter.com/quaesita