How Intuit uses LLMs to explain taxes to millions of taxpayers - Jaspreet Singh, Intuit
Takeaway
At Intuit scale, regulated-domain LLM apps rely on multi-model GenOS, RAG/GraphRAG, fine-tuning, and tax-analyst-driven evals to stay accurate and upgradeable.
Summary
- Intuit's GenOS platform powers TurboTax explanations across 44M+ tax returns; Claude is the production model for static prompts (multi-million-dollar contract), GPT-4 mini for dynamic Q&A.
- RAG and GraphRAG on proprietary tax-info plus IRS form changes; piloted fine-tuning Claude 3 Haiku on Bedrock to shrink prompts and cut latency.
- Tax analysts (domain experts) double as prompt engineers; ML/data science focuses on metrics and golden datasets via AWS Ground Truth.
- Phased eval: manual evaluations early, LLM-as-judge for automated production sampling, monitoring accuracy/relevancy/coherence.
- Lessons: vendor lock-in via contracts and prompts is real; same-vendor model upgrades (Claude Instant -> Haiku) are non-trivial; LLM latency is 3-10s, not 100ms.
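The automated stage of the phased evaluation above (sampling production responses and scoring them with an LLM-as-judge on accuracy, relevancy, and coherence) can be sketched roughly as follows. This is a minimal illustration, not Intuit's implementation: `Explanation`, `sample_for_review`, `evaluate`, and `stub_judge` are hypothetical names, and the stub stands in for a real judge-model call, which the source does not describe.

```python
import random
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List

# Metrics named in the summary; the [0, 1] scoring scale is an assumption.
METRICS = ("accuracy", "relevancy", "coherence")


@dataclass
class Explanation:
    """One production prompt/response pair (hypothetical record shape)."""
    prompt: str
    response: str


# A judge maps an explanation to one score per metric.
Judge = Callable[[Explanation], Dict[str, float]]


def sample_for_review(items: List[Explanation], rate: float, seed: int = 0) -> List[Explanation]:
    """Pick a reproducible random subset of production traffic to evaluate."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)
    return rng.sample(items, round(len(items) * rate))


def evaluate(sampled: List[Explanation], judge: Judge) -> Dict[str, float]:
    """Average each metric over the sampled explanations."""
    if not sampled:
        return {}
    scores = [judge(item) for item in sampled]
    return {metric: mean(score[metric] for score in scores) for metric in METRICS}


def stub_judge(item: Explanation) -> Dict[str, float]:
    """Placeholder for an LLM-as-judge call: scores 1.0 for any non-empty response."""
    score = 1.0 if item.response.strip() else 0.0
    return {metric: score for metric in METRICS}


if __name__ == "__main__":
    traffic = [Explanation(f"question {i}", f"answer {i}") for i in range(100)]
    sampled = sample_for_review(traffic, rate=0.2)
    print(len(sampled), evaluate(sampled, stub_judge))
```

Keeping the judge behind a plain callable means the same sampling and aggregation code could run against human labels early on and a judge model later, which is one way to read the "phased" approach; swapping the judge model would then not touch the monitoring code.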
llm-apps, finetuning, evals
Original description
I will talk about how Intuit uses LLMs to explain tax situations to TurboTax users. Users want explanations of their tax situations, and these explanations drive confidence in the product. Over the course of the last two tax years, Intuit has built GenAI-powered explanations using Anthropic's and OpenAI's models. This includes designing a complex system with prompt-engineered solutions and both LLM-powered and human-powered evaluations to ensure the high quality bar that our users expect when filing taxes with us. During the talk, I will cover the GenAI development lifecycle at scale, including development, evaluations, security evaluations, and scaling. We also developed a fine-tuned version of Claude Haiku, which I will cover in the presentation. We also expanded into tax question answering powered by RAG, including GraphRAG, and I will cover those developments too.

About Jaspreet Singh
I'm Jaspreet Singh, a Senior Staff Software Engineer with 12 years of experience in the tech industry. I am the tech lead for the Smart TurboTax AI team at Intuit, focusing on the development of new GenAI-powered experiences in Intuit TurboTax. I have worked extensively on personalization and recommendation problems in the past, and I'm very passionate about bringing the latest in AI to help drive "taxes are done" experiences for our users. I recently became a father for the first time, and I enjoy spending time with my little one. As a speaker at the AI Engineer World's Fair, I'm excited to share our journey of transforming our users' tax filing journeys with the power of GenAI.

Recorded at the AI Engineer World's Fair in San Francisco. Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter