RAG at scale: production ready GenAI apps with Azure AI Search
Takeaway
Production RAG breaks on different axes than prototypes do; hybrid retrieval, scaling tiers, and semantic ranking in Azure AI Search are aimed at closing that gap.
Summary
- Pablo from Microsoft's Azure AI Search team walks through scaling the retrieval half of RAG once prototype apps hit production load.
- Frames three options for bringing domain knowledge to LLMs (prompt engineering, fine-tuning, and RAG) and explains why RAG dominates for factual freshness.
- Covers pressure points moving from 2023 prototypes to 2024 production: latency, concurrency, data volume, and freshness across many simultaneous users.
- Describes Azure AI Search features (hybrid vector + keyword retrieval, semantic ranking, scaling tiers) designed to absorb these production pressures.
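The hybrid retrieval mentioned in the last bullet has to merge a keyword ranking and a vector ranking into a single result list. A minimal sketch of one common way to do that, Reciprocal Rank Fusion (RRF), follows. The summary does not say which fusion method the talk covers, so RRF, the constant `k=60`, the function name, and the document ids are all assumptions for illustration. This is not Azure AI Search client code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in.
    k=60 is a conventional default, assumed here rather than taken from the talk.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties fall back to doc id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword query and a vector query.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, while a document found by only one retriever still survives further down the list. That is the practical appeal of hybrid retrieval over either method alone.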
rag, azure, vector-search
Original description
If 2023 was the year of GenAI prototypes, 2024 is the year these apps go into production. In this session we’ll demo how Azure AI Search combines the best RAG capabilities for GenAI apps at any scale, without compromising cost or performance. We will share the latest product updates, including increased storage limits, advanced capabilities for detailed retrieval pipeline control, built-in support for content beyond text, and re-ranking improvements to boost quality of responses.
Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at https://www.ai.engineer/worldsfair/2024/schedule & join us at the AI Engineer World's Fair in 2025! Get your tickets today at https://ai.engineer/2025
About Pablo
Pablo is a Distinguished Engineer at Microsoft. He works in the Azure AI group where he is both a hands-on engineer and the general manager of Azure AI Search, a state-of-the-art information retrieval system. Pablo currently focuses on knowledge retrieval for generative AI applications and is deeply curious about what LLMs are teaching us about the nature of thought and knowledge.