RAG in 2025: State of the Art and the Road Forward — Tengyu Ma, MongoDB (acq. Voyage AI)

5.6K views · Jun 27, 2025 · 18:48 min

Takeaway

In 2025, RAG with high-quality domain-specific embeddings, rerankers, and contextual enrichment remains the most cost-effective and governable way to ground LLMs in proprietary data.

Summary

Tengyu Ma (Voyage AI, recently acquired by MongoDB) argues that RAG beats fine-tuning and long context for enterprise use: long context is like scanning the whole library on every query, while fine-tuning is like forcing knowledge into muscle memory, with poor governance.
Voyage embeddings average about 80% accuracy across 100 datasets (90-95% on some, around 60% on others); Matryoshka embeddings plus quantization deliver 10-100x storage savings at a 1-2% accuracy loss.
Practical improvements: hybrid search plus rerankers, query decomposition, and document enrichment with titles, categories, and dates (Anthropic-style contextual chunks generated by an LLM).
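
To make the hybrid-search step concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a keyword ranking with a vector ranking before a reranker rescores the top candidates. The summary does not say which fusion method the talk recommends, so RRF itself, the constant k=60, and the document IDs are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one fused ranking.

    Each document earns 1 / (k + rank) from every list it appears in;
    k=60 is a conventional default, assumed here for illustration.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword index and a vector index.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, which is the point of combining lexical and semantic retrieval; a reranker would then reorder only this short fused list.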
Voyage's domain-specific embeddings push the storage-vs-accuracy Pareto frontier beyond that of OpenAI's embeddings.
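
The storage side of that trade-off can be sketched with simple arithmetic. The example below assumes a model trained with Matryoshka representation learning (so that a prefix of the vector is still a usable embedding) and uses a simple symmetric int8 quantizer. The dimensions 1024 and 256, the toy vector, and all function names are hypothetical choices for illustration, not Voyage's actual configuration; the code shows only the storage effect and does not measure the accuracy loss.

```python
import math


def truncate_and_normalize(vec, dims):
    """Matryoshka-style shortening: keep the first `dims` values, then L2-normalize."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]


def quantize_int8(vec):
    """Map floats in [-1, 1] to integers in [-127, 127] (symmetric scalar quantization)."""
    return [max(-127, min(127, round(x * 127))) for x in vec]


def bytes_per_vector(dims, bits_per_dim):
    """Storage cost of one embedding, ignoring index overhead."""
    return dims * bits_per_dim // 8


# Toy 8-dimensional embedding, shortened to 4 dimensions and quantized.
full = [0.5, -0.5, 0.5, -0.5, 0.1, 0.1, 0.1, 0.1]
short = truncate_and_normalize(full, 4)
codes = quantize_int8(short)

# Assumed sizes: 1024-dim float32 baseline vs. 256-dim int8.
baseline = bytes_per_vector(1024, 32)  # 4096 bytes
compact = bytes_per_vector(256, 8)     # 256 bytes
savings = baseline / compact
print(f"{baseline} B -> {compact} B ({savings:.0f}x smaller)")
```

Under these assumed sizes the vector shrinks 16x; keeping all 1024 dimensions but storing one bit each would give 128 bytes, a 32x reduction, which is how combinations of truncation and quantization reach the 10-100x range cited above.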

rag · embeddings · voyage-ai

Original description

The talk will have three parts:
1. Roadmap debate: RAG vs. fine-tuning vs. long-context
2. RAG today: benefits, challenges, and current solutions
3. RAG tomorrow: AI models do more work

About Tengyu Ma
Tengyu Ma is the Chief AI Scientist @ MongoDB and an Assistant Professor @ Stanford. He was the co-founder and CEO of Voyage AI before the acquisition by MongoDB.

Recorded at the AI Engineer World's Fair in San Francisco. Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter