How to look at your data — Jeff Huber (Chroma) + Jason Liu (567)

9.3K views · Aug 06, 2025 · 19:22 min

Takeaway

Build cheap golden-set fast-evals from real and synthetic queries, then look at conversation transcripts to find the implicit feedback users already give.

Summary

Huber pushes 'fast evals': query/document golden sets that run in seconds and cost pennies, instead of expensive LLM-judge harnesses or generic MTEB benchmarks.
LLMs can synthesize realistic queries when prompted properly; Chroma showed synthetic recall@10 tracks real-user recall@10 closely for a Weights & Biases chatbot.
On the W&B data, MTEB's top performer (Jina v3) actually underperformed Voyage 3 Large, and the original text-embedding-3-small was worst, a ranking determined empirically via fast evals.
Liu's part: analyze conversation outputs (cluster, topic-model) since user feedback like 'try again, this isn't what I meant' is buried inside conversations, not in thumbs widgets.
All code is open source at research.trychroma.com.
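
As a rough illustration of the 'fast eval' idea in the summary above, the sketch below computes recall@k over a small golden set of query/document pairs. It is not Chroma's code: the names `recall_at_k`, `retrieve`, `DOCS`, and `GOLDEN` are hypothetical, the sample data is invented, and the word-overlap retriever is only a stand-in for a real embedding model plus vector store.

```python
from typing import Callable, Dict, List

# Hypothetical toy corpus: document id -> text.
DOCS: Dict[str, str] = {
    "d1": "how to log metrics with wandb",
    "d2": "resume a crashed training run",
    "d3": "configure sweeps for hyperparameter search",
}

# Hypothetical golden set: query -> id of the document that should be retrieved.
GOLDEN: Dict[str, str] = {
    "log metrics": "d1",
    "resume run after crash": "d2",
    "hyperparameter sweeps": "d3",
    "tune learning rate": "d3",  # shares no words with d3, so the toy retriever struggles
}


def retrieve(query: str, k: int) -> List[str]:
    """Stand-in retriever: rank documents by word overlap with the query.

    A real setup would embed the query and search a vector index instead.
    Ties are broken by document id so the ranking is deterministic.
    """
    query_words = set(query.split())
    ranked = sorted(
        DOCS,
        key=lambda doc_id: (-len(query_words & set(DOCS[doc_id].split())), doc_id),
    )
    return ranked[:k]


def recall_at_k(
    golden: Dict[str, str],
    retriever: Callable[[str, int], List[str]],
    k: int = 10,
) -> float:
    """Fraction of golden queries whose expected document is in the top k."""
    if not golden:
        raise ValueError("golden set is empty")
    hits = sum(
        1 for query, expected_id in golden.items()
        if expected_id in retriever(query, k)
    )
    return hits / len(golden)


if __name__ == "__main__":
    for k in (1, 3):
        print(f"recall@{k} = {recall_at_k(GOLDEN, retrieve, k):.2f}")
```

Because the loop only does retrieval and a membership check, swapping in a different embedding model means swapping the `retrieve` function and rerunning, which is what makes it cheap to compare models on your own data rather than on a generic benchmark.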

evals · retrieval · embeddings

Original description

By the end of this talk, you'll understand what it takes to apply clustering techniques and data analysis to understand what is the valuable work that your AI application is doing through analyzing conversation histories and how to create generative evals to benchmark your newly discovered superpowers.

About Jeff Huber
Jeff Huber is the CEO and cofounder of Chroma. Jeff's work has been featured in TechCrunch, VentureBeat, MacWorld, GQ, Fast Company, Fortune, Forbes, Business Insider, Quartz and others. Chroma is a widely-loved and adopted open-source vector database.

About Jason Liu
Machine learning engineer, consultant, educator.

Recorded at the AI Engineer World's Fair in San Francisco.