Endless Right Answers: Explaining the Generative AI Value Gap

The biggest challenge of the generative AI age is for leaders to define what value means for their organizations

Cassie Kozyrkov
6 min read · Jan 9, 2025

Generative AI (GenAI) seems to promise unprecedented innovation and automation opportunities… yet so many leaders struggle to articulate the actual, tangible value that GenAI delivers at the organizational level.

So, what’s behind this apparent GenAI value gap and what can you do about it?

Let me invite you to pause the mainstream conversations around how GenAI success depends on foundation models and data and customizations (all important!) to consider another — and arguably more important — input to GenAI business success: leadership.

Image created using generative AI.

I’m convinced the GenAI opportunities are real, but it takes a special kind of leadership mindset to tap into them.

Leadership as a technical consideration

“…Asking one question at every stage of your generative AI journey — ‘what is the business value here’ — can help keep your organization on track.”

This sentiment comes from Tom Godden of Amazon Web Services (AWS) in a Harvard Business Review article on technical considerations for business leaders operationalizing GenAI.

This is overwhelmingly good advice. But there’s a danger that you snooze right past it because it sounds like a standard tip for execs: don’t do things without thinking about the value they bring to your organization.

But, more so than with other technologies (including the traditional AI you’re used to), value is — as this HBR article suggests — a *technical consideration*, and leadership is a technical input into GenAI systems.

And when leadership as a technical input is missing, GenAI will repeatedly miss its potential in your organization.

Why is this a special challenge with GenAI versus other data-fueled technologies?

A new way of thinking: Endless “right” answers

Just as the executive zeitgeist is catching up with what it means to manage an ecosystem that includes traditional AI, GenAI asks us to make yet another mindset shift: one in which there are endless right answers.

  • Traditional AI is for automating tasks where there’s one right answer.
  • Generative AI is for automating tasks where there are endless right answers.

For example:

  • Traditional AI: When I stand in front of an automated passport control booth, if the facial recognition system mislabels me, it might produce any one of a whole host of wrong answers (Bruce Wayne, Harry Potter, the list goes on) but there’s only one right answer (Cassie Kozyrkov).
  • Generative AI: When I ask an AI assistant to generate an image for me, I get a fairly solid result. When I repeat the same prompt, I get a different perfectly adequate image. Both are right answers… but which one is right-er?

Which of these images is the “best” response to the prompt “art gallery with orange walls and black floor with many copies of the same blue painting of flowers”? How much better is it than the others? That’s very much in the eye of the beholder.

Deciding on Metrics in an Endless-Right-Answers World

Without a mindset update, assessing the return on investment (ROI) of GenAI is a statistical cul-de-sac. For an individual user, it may be enough that GenAI feels useful, but that’s not enough for your organization…

To prove that your investment in technology has impact, you must be able to measure its performance. To ensure the statistical validity of that measurement, you’ll need to come up with metrics and definitions in advance (as I’ve explained here and here). To anticipate and score the range of GenAI’s endless right answers… well, that’s the unprecedented leadership challenge of its adoption.

ROI is a thorny concept when “best” is in the eye of the beholder… at scale.

This is where leaders and leadership are so important: “best” is in the GenAI of the beholder (please forgive the pun — I had to). And who’s the beholder? Whoever’s in charge (admittedly not always a simple notion in a large organization).

When multiple answers can all be valid in their own way, designing performance metrics is a special challenge because success depends on context, judgment, and subjective preferences.

So, you — the leader — must define what value means for your organization, and then champion a new way of thinking about measurement in an organization that may not be ready for it.

Overcome this hurdle — a grand challenge that is more about people than technology — and you’ll unlock a treasure trove of opportunity.

Overcoming the challenge of endless right answers

Let me share a few suggestions that might help with your GenAI performance measurement and benchmarking journey:

  • Get clarity on the who. Perhaps the most important question in a GenAI-fueled organization isn’t technical at all: who gets to decide what success looks like?
  • Get clarity on the what. Metrics in an endless-right-answers context begin with a clear definition of what you’re trying to achieve. Is your goal to inspire creativity, improve efficiency, or align with a specific tone? Or something else entirely?
  • Be the author of meaning. Instead of looking to your quants for simple metrics, appreciate that designing GenAI metrics is a process that itself has endless right answers and involves judgment calls that brave leaders must own.
  • Think in terms of good enough. Instead of comparing right answers, consider setting standards that whittle complex output down to a familiar binary: acceptable or unacceptable. Note that you are likely to find justification for fewer model upgrades if you take this approach, which could be for the best, particularly when the output isn’t directly user-facing.
  • Use human ratings as a proxy. Borrowing from social science and last decade’s best practices for using trusted raters to score a system’s outputs, you might choose to rely on human evaluation of sampled output (a rough sketch of this follows the list).
  • Try an experiment. A statistically valid way to skip the headache of direct measurement is to run a controlled experiment (such as an A/B test) to prove that your GenAI materially impacts one of your KPIs (also sketched below).
  • Tie it back to the business. Where possible, expressing GenAI output in terms of a measurable relationship to a straightforward business metric can anchor your approach in reality.
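
To make the trusted-rater idea concrete, here’s a minimal sketch (mine, not from the article) of turning sampled rater verdicts into an acceptability rate with a confidence interval. The rubric, the labels, and the counts are all hypothetical:

import math

def wilson_interval(acceptable, total, z=1.96):
    # Wilson score interval for a binomial proportion (95% by default).
    if total == 0:
        return (0.0, 0.0)
    p_hat = acceptable / total
    denom = 1 + z**2 / total
    center = (p_hat + z**2 / (2 * total)) / denom
    margin = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / total + z**2 / (4 * total**2))
    return (max(0.0, center - margin), min(1.0, center + margin))

# Hypothetical verdicts from trusted raters scoring sampled GenAI outputs
# against a rubric agreed on in advance: 1 = acceptable, 0 = unacceptable.
ratings = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1]
low, high = wilson_interval(sum(ratings), len(ratings))
print(f"Acceptability rate: {sum(ratings)/len(ratings):.0%} (95% CI: {low:.0%} to {high:.0%})")

With only 15 sampled outputs the interval is wide, and that’s the point: the sample size and the rubric are leadership decisions about how much certainty the business needs.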

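And for the experiment route, here’s an equally rough sketch (again, mine rather than the article’s) of comparing a simple KPI between a control arm and a GenAI arm with a two-proportion z-test. The arm sizes and counts are invented for illustration:

import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    # z statistic for the null hypothesis that both arms share the same rate.
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (success_b / n_b - success_a / n_a) / se

# Hypothetical A/B test: 2,000 users per arm, KPI = completed the workflow.
z = two_proportion_z(success_a=1180, n_a=2000, success_b=1265, n_b=2000)
print(f"z = {z:.2f}")  # compare with 1.96 for a two-sided test at alpha = 0.05

The arithmetic isn’t the point; the point is that a KPI chosen in advance and a control group let you make a defensible claim about value rather than an impression.
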
Scale demands to be measured

How is all this different from managing human creative workers? Unlike a human worker, an AI system cannot take responsibility for itself.

That’s on you as a leader.

If you’re unwilling to step up to being the voice of a GenAI system’s value, you’ll be limited to offering GenAI as an at-work productivity tool for your people to use as they please. At best, you’ll go as far as simple human-in-the-loop systems, but no further.

To unlock the full potential of GenAI, you must face the idea of taking responsibility for endless right answers head on.

An AI system cannot take responsibility for itself. That’s on you as a leader.

It’s now up to business leaders to be the voice of value, and to contextualize GenAI opportunities and priorities in terms of expected business outcomes and values. Only then — unified behind a clear and common goal — can organizations align to harness the full power of GenAI.

As a leader, creating meaning out of ambiguity will fall squarely on your shoulders: the more you model a new standard of clarity and purpose, the more you’ll inspire others to rise to the challenge, too.

Acknowledgments

The article that kicked off my musings is this one on HBR. Check it out and if it leaves you keen for more expert cloud leadership content from its contributor, Tom Godden, give him a follow on LinkedIn.

Thank you, #AWS, for partnering with me on this one. #sponsored
