
How Zapier Builds AI Products and Features with the Help of Braintrust: Ankur Goyal & Olmo Maldonado

3.5K views · Nov 07, 2024 · 14:59 min

Takeaway

Mature AI products need evals that PMs and engineers own jointly and that run in CI, plus tracing across multi-tool agent flows. Zapier reports a nearly 300% accuracy gain in Zap Builder from this approach.

Summary

  • Zapier runs 10M+ AI tasks/day across products such as Zap Builder, Copilot, and Central; Braintrust was co-developed with Zapier as its first user.
  • The eval suite grew from 7 manually run unit tests to 800+ evals run in CI, using synthetic data from a corporate account; PMs co-own the eval criteria (correct triggers, support for the top 25 apps, working paths and filters).
  • Custom graders combine logic-based checks with LLM-as-judge scoring; the Braintrust dashboard diffs runs (pink/green) to catch regressions across providers.
  • This eval-driven loop improved Zap Builder accuracy by nearly 300%; it is not at 100%, but it is well above the baseline.
  • Copilot's agent-with-tools framework relies on tracing (via Braintrust) for fine-grained observability of critical paths; the team A/B tested GPT-3.5 Turbo against newer models in the playground.
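
As a rough illustration of the grader and run-diff ideas in the summary, here is a minimal Python sketch. It does not use the Braintrust SDK and is not Zapier's implementation: the `GeneratedZap` and `EvalCase` shapes, the `SUPPORTED_APPS` set, the equal weighting, and the `stub_judge` function (a stand-in for a real LLM-as-judge call) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GeneratedZap:
    """Hypothetical shape of a model-generated Zap (illustrative only)."""
    trigger_app: str
    action_apps: list[str]


@dataclass
class EvalCase:
    """One eval case: the user prompt, the expected trigger, the model output."""
    prompt: str
    expected_trigger_app: str
    output: GeneratedZap


# Assumed stand-in for a "top apps" allowlist; the real list is not in the source.
SUPPORTED_APPS = {"gmail", "slack", "google_sheets"}


def logic_grade(case: EvalCase) -> float:
    """Deterministic checks: correct trigger, and every action app is supported."""
    trigger_ok = case.output.trigger_app == case.expected_trigger_app
    apps_ok = all(app in SUPPORTED_APPS for app in case.output.action_apps)
    return (trigger_ok + apps_ok) / 2


def combined_grade(
    case: EvalCase,
    judge: Callable[[EvalCase], float],
    judge_weight: float = 0.5,
) -> float:
    """Blend the logic-based score with a judge score in [0, 1]."""
    return (1 - judge_weight) * logic_grade(case) + judge_weight * judge(case)


def diff_runs(
    baseline: dict[str, float], candidate: dict[str, float]
) -> tuple[list[str], list[str]]:
    """Compare per-case scores of two runs; return (regressed, improved) case ids."""
    shared = baseline.keys() & candidate.keys()
    regressed = sorted(k for k in shared if candidate[k] < baseline[k])
    improved = sorted(k for k in shared if candidate[k] > baseline[k])
    return regressed, improved


def stub_judge(case: EvalCase) -> float:
    """Placeholder for an LLM-as-judge call: passes any Zap with at least one action."""
    return 1.0 if case.output.action_apps else 0.0


if __name__ == "__main__":
    good = EvalCase("Email to Slack", "gmail", GeneratedZap("gmail", ["slack"]))
    bad = EvalCase("Email to Slack", "gmail", GeneratedZap("slack", ["unknown_app"]))
    print(combined_grade(good, stub_judge), combined_grade(bad, stub_judge))
    print(diff_runs({"a": 1.0, "b": 0.5}, {"a": 0.5, "b": 1.0}))
```

In a CI setup, a step along these lines could fail the build whenever `diff_runs` reports any regressed cases, which mirrors the dashboard's run-to-run comparison in spirit.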
evals · braintrust · zapier
Original description
Zapier is the #1 workflow automation platform for small and midsize businesses, connecting to more than 7,000 of the most popular work apps. We were also one of the first companies to build and ship AI features into our core products. We've had the opportunity to work with Braintrust since the early days of the product, which now powers the evaluation and observability infrastructure across our AI features.

We’ll walk through a couple of our projects – AI Zap Builder and Zapier Copilot – and how we built them from the ground up with evals and observability in mind. Hopefully, you can walk away with a few of our learnings from operating these projects at scale while systematically improving their performance.

Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at https://www.ai.engineer/worldsfair/2024/schedule & join us at the AI Engineer World's Fair in 2025! Get your tickets today at https://ai.engineer/2025

About Ankur
Ankur Goyal is the founder & CEO of Braintrust, the developer platform that companies like Zapier, Notion, Instacart, Airtable, and more use to evaluate, log, and ship reliable AI products to millions. He was previously Head of AI Platform at Figma, founder and CEO of Impira, and VP Eng at SingleStore. After Figma acquired Impira, he led the AI team there and saw a number of the same blockers to AI development at Impira, Figma, and other peer companies, which led him to found Braintrust.

About Olmo
Olmo Maldonado is a Senior AI Engineer at Zapier, where he develops high-scale AI services aimed at democratizing automation. He leads the team behind the AI Zap Builder and Copilot, and has created shared services that have enabled multiple teams to scale LLM usage to production loads. Under his leadership, Zapier improved AI accuracy by fostering a culture of Eval Driven Development. Olmo is a former Googler and a core developer of MooTools. He holds a Master’s degree in Electrical Engineering from the University of California, Los Angeles. Born in Mexico and raised in the United States, Olmo embraces and celebrates both cultures. He resides in San Antonio, TX, with his family.