
From Chaos to Choreography: Multi-Agent Orchestration Patterns That Actually Work — Sandipan Bhaumik

35.6K views · Apr 08, 2026 · 26:28 min
Takeaway

Treat multi-agent AI as a distributed systems engineering problem, and choose between choreography and orchestration based on workflow complexity and how much autonomy the agents need.

Summary

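  • Multi-agent systems are distributed systems: going from 1 to 5 agents makes coordination roughly 25x more complex (growth on the order of n²), not 5x, because five agents already have 10 pairwise coordination edges (5 × 4 / 2).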
  • War story: a credit-decisioning system produced wrong risk ratings in 20% of cases because a cache sitting between agents wasn't invalidated after a Postgres write, so downstream agents acted on stale data. It was a classic race condition, not a model failure.
  • Choreography (decentralized; agents react to events on a shared bus) scales well and makes adding agents easy, but becomes a debugging nightmare without bulletproof observability.
  • Orchestration (a central coordinator, such as LangGraph DAGs on Databricks) wins in regulated domains, where rollback and an audit trail matter more than agent autonomy.
  • Decision framework: simple+autonomous → choreography; complex+controlled → orchestration; complex+autonomous → hybrid with saga patterns.
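The summary names saga patterns for the complex-and-autonomous case but gives no detail. A minimal sketch, assuming each agent's step can be paired with a compensating action: steps run in order, and if one fails, the steps that already completed are undone newest first while an audit list records what happened. `SagaStep`, `run_saga`, and the credit-workflow step functions are hypothetical names for illustration, not the speaker's code or any Databricks or LangGraph API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SagaStep:
    """One agent's unit of work plus the action that undoes it (hypothetical)."""
    name: str
    action: Callable[[dict], None]
    compensate: Callable[[dict], None]


def run_saga(steps: list[SagaStep], state: dict) -> tuple[bool, list[str]]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    audit: list[str] = []
    completed: list[SagaStep] = []
    for step in steps:
        try:
            step.action(state)
        except Exception as exc:
            audit.append(f"failed:{step.name}:{exc}")
            # Assumption: the failed step had no effect, so only steps that
            # finished are compensated, newest first.
            for done in reversed(completed):
                done.compensate(state)
                audit.append(f"compensated:{done.name}")
            return False, audit
        completed.append(step)
        audit.append(f"done:{step.name}")
    return True, audit


# Hypothetical credit-decisioning steps; the last one fails on purpose.
def fetch_profile(s: dict) -> None:
    s["profile"] = {"id": 42}


def drop_profile(s: dict) -> None:
    s.pop("profile", None)


def score_risk(s: dict) -> None:
    s["risk"] = "medium"


def drop_risk(s: dict) -> None:
    s.pop("risk", None)


def notify_customer(s: dict) -> None:
    raise RuntimeError("notification service down")


def no_op(s: dict) -> None:
    pass


state: dict = {}
ok, audit = run_saga(
    [
        SagaStep("fetch_profile", fetch_profile, drop_profile),
        SagaStep("score_risk", score_risk, drop_risk),
        SagaStep("notify_customer", notify_customer, no_op),
    ],
    state,
)
print(ok)  # False
print(audit)
```

Here a single loop drives the compensations to keep the sketch short; in a choreographed or hybrid deployment the same compensating actions would typically be triggered by failure events on the bus instead, which is why the audit trail matters.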
multi-agent · orchestration · distributed-systems
Original description
One AI agent is a feature. Fifty agents is a distributed systems problem nobody's discussing. I've seen this pattern: teams build one agent, then five, then drown in coordination problems that have nothing to do with LLMs. Agent handoffs fail silently. Data goes stale. Decisions become untraceable. Drawing on Databricks production deployments, I'll expose the orchestration anti-patterns killing multi-agent systems and show agent handoff protocols that work: state management, data contracts, and failure modes. You'll see when to choreograph versus orchestrate, and a live multi-agent workflow with proper observability. This applies distributed systems engineering to agents: the infrastructure layer everyone needs but nobody's building.
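The description promises handoff protocols built on state management and data contracts, and the credit-decisioning war story in the summary is a stale read. A minimal sketch of one common guard, assuming the system of record keeps a version number per row (optimistic-concurrency style): the sending agent records the version it read, and the receiving agent validates the contract and compares versions before acting. `RecordStore`, `Handoff`, `REQUIRED_FIELDS`, and the schema name `risk_input/v1` are all hypothetical; the in-memory store stands in for Postgres, and this is not the speaker's protocol.

```python
from dataclasses import dataclass


class RecordStore:
    """In-memory stand-in for the system of record (e.g. a Postgres table).

    Every write bumps a per-key version number.
    """

    def __init__(self) -> None:
        self._rows: dict[str, tuple[int, dict]] = {}

    def write(self, key: str, value: dict) -> int:
        version = self._rows.get(key, (0, {}))[0] + 1
        self._rows[key] = (version, dict(value))
        return version

    def read(self, key: str) -> tuple[int, dict]:
        version, value = self._rows[key]
        return version, dict(value)  # hand out a copy, never shared state


@dataclass
class Handoff:
    """Hypothetical handoff message: what was read, and at which version."""
    key: str
    version: int
    payload: dict
    schema: str = "risk_input/v1"  # hypothetical contract name


# Hypothetical data contract for the receiving agent.
EXPECTED_SCHEMA = "risk_input/v1"
REQUIRED_FIELDS = {"income", "debt"}


def make_handoff(store: RecordStore, key: str) -> Handoff:
    version, payload = store.read(key)
    return Handoff(key=key, version=version, payload=payload)


def accept_handoff(store: RecordStore, msg: Handoff) -> tuple[dict, bool]:
    """Validate the contract, then refuse to act on stale data.

    Returns (payload_to_use, was_stale).
    """
    if msg.schema != EXPECTED_SCHEMA:
        raise ValueError(f"unexpected schema: {msg.schema}")
    missing = REQUIRED_FIELDS - set(msg.payload)
    if missing:
        raise ValueError(f"contract violation, missing fields: {sorted(missing)}")
    current_version, current = store.read(msg.key)
    if current_version != msg.version:
        # Someone wrote after the sender read: use the fresh record instead.
        return current, True
    return msg.payload, False


store = RecordStore()
store.write("applicant-1", {"income": 50000, "debt": 10000})
msg = make_handoff(store, "applicant-1")                      # agent A reads v1
store.write("applicant-1", {"income": 50000, "debt": 30000})  # concurrent write, v2
payload, stale = accept_handoff(store, msg)                   # agent B receives
print(stale, payload["debt"])  # True 30000
```

In a real database the version check and the action that depends on it would need to happen atomically (for example, an `UPDATE` conditioned on the expected version inside one transaction); otherwise a write can still slip in between the check and the action.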

Sandipan Bhaumik - Data & AI Tech Lead, Databricks

Sandipan Bhaumik has spent 18 years building data and AI systems inside environments that can't afford for them to fail: the NHS, Tier 1 banks, and large enterprises across EMEA. At AWS and now Databricks, he has seen firsthand where multi-agent systems break down between architecture and production. He is a regular speaker on data and AI system architecture best practices, runs a community of AI practitioners, and is here to talk about what actually holds together when you scale agentic AI systems in production.

Socials:
https://www.linkedin.com/in/sandipanbhaumik

Slides:
https://drive.google.com/file/d/18LqVzhfVS3iULYuy2EshWoMLmQt3rdpT/view?usp=sharing