The Build-Operate Divide: Bridging Product Vision and AI Operational Reality
Takeaway
Crossing the V1-to-V2 quality chasm in AI products comes from a fast eval-iteration loop plus disciplined human-in-the-loop review, not better base models.
Summary
- Jeremy Silva (Freeplay) and Chris Hernandez (Speech Analytics, Chime) describe the 'quality chasm': AI products that ship a V1 but never reach a reliable V2.
- GenAI removes the data/training barrier vs traditional ML, but the resulting iteration-speed jump amplifies the need for strong AI ops.
- Crossing the chasm requires looping through monitoring → experimentation → testing/evaluation; product quality is a direct function of how fast you traverse that loop.
- Automation does not remove the need for significant human elbow grease. Human-in-the-loop review is essential for catching confident hallucinations (for example, a model asserting that Lincoln invented Wi-Fi), especially in regulated domains like healthcare and finance.
evals, human-in-the-loop, ai-ops
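The evaluation and human-review steps in the summary can be sketched in a few lines of Python. This is a minimal illustration, not the speakers' or Freeplay's implementation: `EvalCase`, `run_evals`, `human_review_queue`, and `toy_model` are all hypothetical names, the substring check stands in for a real evaluator, and the canned answers stand in for a real LLM call. It shows why passing results are sampled for human review as well as failures: a confident hallucination can satisfy a weak automated check.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    must_contain: str  # hypothetical check: a phrase the answer has to include


@dataclass
class EvalResult:
    case: EvalCase
    output: str
    passed: bool


def run_evals(model: Callable[[str], str], cases: list[EvalCase]) -> list[EvalResult]:
    """Run every case through the model and score it with a substring check."""
    results = []
    for case in cases:
        output = model(case.prompt)
        passed = case.must_contain.lower() in output.lower()
        results.append(EvalResult(case, output, passed))
    return results


def pass_rate(results: list[EvalResult]) -> float:
    """Fraction of cases that passed; 0.0 for an empty run."""
    return sum(r.passed for r in results) / len(results) if results else 0.0


def human_review_queue(results: list[EvalResult], sample_every: int = 2) -> list[EvalResult]:
    """Queue every failure, plus every Nth passing result, for human review."""
    queue = [r for r in results if not r.passed]
    passes = [r for r in results if r.passed]
    queue.extend(passes[::sample_every])
    return queue


def toy_model(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption); returns canned answers."""
    canned = {
        "Summarize Abraham Lincoln's presidency.":
            "Abraham Lincoln invented Wi-Fi while in office.",
    }
    return canned.get(prompt, "I don't know.")


cases = [
    # The weak substring check passes this hallucinated answer.
    EvalCase("Summarize Abraham Lincoln's presidency.", "Lincoln"),
    # Hypothetical product question the toy model cannot answer.
    EvalCase("What is the refund policy?", "30 days"),
]
results = run_evals(toy_model, cases)
queue = human_review_queue(results)
print(f"pass rate: {pass_rate(results):.0%}, queued for review: {len(queue)}")
```

In this toy run the Lincoln answer passes the automated check and only reaches a reviewer because passing results are sampled. In a real system the sampling rate and the evaluator would be tuned to the domain, with regulated domains likely warranting heavier review.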
Original description
Product leaders see AI possibilities. Operations teams see implementation chaos. That disconnect can kill promising AI features before they ever reach users. In this session, Chris Hernandez (Chime) and Jeremy Silva (Freeplay) share an integrated framework that bridges product strategy and operational reality. You'll learn how they transformed fragmented AI workflows into a unified approach, from prototyping and prompt testing to human review loops and model benchmarking. We’ll explore how to build evaluation systems that satisfy both technical and business stakeholders, create effective HITL processes from day one, and use QA as a strategic enabler of generative AI quality. Most importantly, we’ll show how product and operations can move beyond friction, working together to deliver AI features that scale responsibly and ship faster, with confidence.
About Jeremy Silva
A seasoned ML engineer with extensive experience building and deploying language models in the healthcare sector, Jeremy currently serves as Product Lead at Freeplay. At Freeplay, he oversees an enterprise-ready platform that empowers teams to run experiments, create evaluations, monitor production systems, and label data, all within a unified environment. Drawing from hands-on collaboration with Freeplay's enterprise customers, Jeremy brings valuable "in-the-trenches" experience building LLM systems at scale. This direct customer engagement has also positioned him as a trusted advisor, helping organizations shape and refine their AI product roadmaps for maximum impact. Jeremy’s unique perspective spans technical implementation and product development, making him well-positioned to share insights on effectively bridging the gap between AI capabilities and real-world product outcomes.
About Chris Hernandez
I’m a Manager of Speech Analytics at Chime, where I lead a team in developing and implementing AI-powered insights to enhance member experiences and operational efficiency. With over a decade of experience in leadership, AI, and machine learning, I specialize in designing and scaling AI solutions that drive measurable impact. At Chime, we believe that everyone can feel good about their money. We’re proud to be the most loved banking app™, providing millions of members with transparent, easy-to-use tools that help them unlock financial progress. By leveraging AI, my team helps uncover insights that improve quality, efficiency, and overall member satisfaction. I joined Chime because of its mission and the opportunity to work alongside an incredible team focused on innovation. I’m excited about the future as we continue to push the boundaries of AI-driven quality solutions, and we’re just getting started! 🚀
**The views and opinions expressed here are my own and do not necessarily reflect the official policy or position of Chime.**
Recorded at the AI Engineer World's Fair in San Francisco.
Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter