Shipping complex AI applications — Braintrust & Trainline
Takeaway
Shipping production AI agents requires the same eval-and-observability discipline Trainline applies via Braintrust to keep agentic ticketing reliable.
Summary
- Hands-on workshop in London with Braintrust and Trainline (UK rail ticketing app) on shipping production-quality AI applications.
- Trainline's AI engineers describe their move from BERT-era LLMs to state-of-the-art agentic products for ticketing and travel.
- Workshop walks attendees through Braintrust's eval and observability tooling for managing agent quality at scale, with cheat sheets and Slack support.
- Emphasizes that explicit quality bars and continuous evaluation, not shipping alone, are what separate prototype agents from production-grade ones.
evals, braintrust, agents
Original description
Getting a prototype working is straightforward. Making it reliable in production, especially with multi-step agents, tool use, and real users is the hard part. In this hands-on workshop, you'll work through the core parts of building production-grade AI applications with Giran Moodley, Mayank Soni, and Oussama Hafferssas.
Socials:
- https://uk.linkedin.com/in/mayank-soni
- https://x.com/OussamaHaff
- https://www.linkedin.com/in/giran/
Timestamps
0:00 - Introduction and Welcome
4:07 - Workshop Overview and Agenda
4:39 - Understanding AI Engineering and Operational Challenges
9:55 - Introduction to Braintrust
12:56 - Experience from Trainline
28:35 - Building the Support Triage Agent (Overview)
33:57 - Basic Implementation: Single Shot Prompting
40:32 - Adding Local Tools for Determinism
41:30 - Implementing Specialist Stages (Agentic Flow)
46:19 - Instrumenting and Tracing the Application
56:43 - Evaluating AI Systems and Golden Data Sets
1:05:07 - Deploying and Managing AI in Production
1:13:58 - Online Scoring and Monitoring Production Logs
1:19:13 - Identifying and Remediating Failure Modes
1:33:05 - Key Takeaways and Summary
1:36:58 - Further Resources and Documentation
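The "Evaluating AI Systems and Golden Data Sets" segment can be pictured with a minimal sketch. This is not the workshop's code and does not use the Braintrust SDK. The `triage` function is a hypothetical keyword router standing in for the support triage agent, and the queue names, `GoldenCase`, `exact_match`, and `run_eval` are all assumptions made for illustration. The sketch only shows the general shape of a golden-dataset eval: run the task over fixed cases, score each output, and report an aggregate.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GoldenCase:
    """One hand-labelled example: an input ticket and the queue it should go to."""
    ticket: str
    expected_queue: str


def triage(ticket: str) -> str:
    """Hypothetical stand-in for the agent: keyword routing instead of an LLM call."""
    text = ticket.lower()
    if "refund" in text:
        return "refunds"
    if "delay" in text or "cancel" in text:
        return "disruption"
    return "general"


def exact_match(output: str, expected: str) -> float:
    """Scorer: 1.0 when the predicted queue equals the labelled queue, else 0.0."""
    return 1.0 if output == expected else 0.0


def run_eval(
    task: Callable[[str], str],
    cases: List[GoldenCase],
    scorer: Callable[[str, str], float],
) -> float:
    """Run the task on every golden case and return the mean score."""
    scores = [scorer(task(case.ticket), case.expected_queue) for case in cases]
    return sum(scores) / len(scores)


# Illustrative golden dataset (invented examples, not Trainline data).
GOLDEN = [
    GoldenCase("I want a refund for my ticket", "refunds"),
    GoldenCase("My train was cancelled", "disruption"),
    GoldenCase("How do I change my email?", "general"),
    GoldenCase("Money back please, train was late", "refunds"),
]

if __name__ == "__main__":
    print(f"accuracy: {run_eval(triage, GOLDEN, exact_match):.2f}")
```

The last golden case is deliberately one the keyword router misses, since it asks for money back without using the word "refund". A fixed dataset surfaces that kind of failure mode each time the eval is rerun after a change.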