HybridRAG: A Fusion of Graph and Vector Retrieval - Mitesh Patel, NVIDIA

20.0K views · Jul 22, 2025 · 20:24 min

Takeaway

Fusing LLM-built knowledge graphs with vector search yields richer multi-hop retrieval than either alone on enterprise documents.

Summary

NVIDIA's HybridRAG combines knowledge-graph triple retrieval with semantic vector search; LLMs extract entity-relation-entity triples from unstructured documents (e.g. an Exxon quarterly report).
The architecture has four components: data, data processing, parallel graph and vector index creation, and inference. The pipeline splits into offline indexing and online querying.
Graph retrieval captures multi-hop relationships that pure semantic search misses; vector search handles fuzzy semantic similarity between the query and the text.
The talk demonstrates the approach on financial documents and provides a GitHub notebook for developers to fork.
It highlights that LLM-based triple extraction is the hard part: the quality of the graph determines retrieval quality downstream.
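
The offline-indexing and online-querying split described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the code from the talk or its notebook: the triples and chunks are made-up placeholders (in the talk an LLM extracts the triples), `embed` is a toy bag-of-words stand-in for a real embedding model, the query entity is passed in by hand rather than found by entity linking, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Made-up placeholder triples; in the talk an LLM extracts these from documents.
TRIPLES = [
    ("AcmeCorp", "reported", "quarterly earnings"),
    ("quarterly earnings", "driven_by", "upstream production"),
    ("upstream production", "located_in", "North Field"),
]

# Made-up placeholder text chunks for the vector side.
CHUNKS = [
    "AcmeCorp reported quarterly earnings above expectations.",
    "Upstream production in the North Field grew this quarter.",
    "The company announced a new share buyback program.",
]


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for an embedding model)."""
    return Counter(text.lower().replace(".", "").split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_indexes(triples, chunks):
    """Offline step: build the graph index and the vector index side by side."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    vectors = [(chunk, embed(chunk)) for chunk in chunks]
    return graph, vectors


def graph_retrieve(graph, entity, max_hops=2):
    """Breadth-first walk collecting triples within max_hops of the entity."""
    found, frontier, seen = [], [entity], {entity}
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for relation, tail in graph.get(node, []):
                found.append((node, relation, tail))
                if tail not in seen:
                    seen.add(tail)
                    next_frontier.append(tail)
        frontier = next_frontier
    return found


def vector_retrieve(vectors, query, k=2):
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(vectors, key=lambda cv: cosine(query_vec, cv[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def hybrid_retrieve(graph, vectors, query, entity, max_hops=2, k=2):
    """Online step: run both retrievers and return both contexts together."""
    return {
        "graph": graph_retrieve(graph, entity, max_hops),
        "vector": vector_retrieve(vectors, query, k),
    }
```

Calling `build_indexes(TRIPLES, CHUNKS)` once and then `hybrid_retrieve(graph, vectors, "North Field production", "AcmeCorp")` returns the triples reachable from the entity alongside the most similar chunks. How the two contexts are merged into the final LLM prompt is left open here, since the summary does not specify it.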

graph-rag · knowledge-graph · nvidia

Original description

Interpreting complex information from unstructured text data poses significant challenges to Large Language Models (LLMs), with difficulties often arising from specialized terminology and the multifaceted relationships between entities in document architectures. Conventional Retrieval Augmented Generation (RAG) methods face limitations in capturing these nuanced interactions, leading to suboptimal performance. In our talk, we introduce a novel approach integrating Knowledge Graph-based RAG (GraphRAG) with VectorRAG, designed to refine question-answering (Q&A) systems for more effective information extraction from complex texts. Our approach employs a dual retrieval strategy that harnesses both knowledge graphs and vector databases, enabling the generation of precise and contextually appropriate answers, thereby setting a new standard for LLMs in processing sophisticated data.

About Mitesh Patel
Mitesh Patel is a developer advocate manager at NVIDIA. His team is responsible for creating workflows to showcase how developers can harness GPU acceleration in their workflows using tools and frameworks popular in the developer community. Before NVIDIA, he was a senior research scientist at Fuji Xerox Palo Alto Laboratory Inc. (a research subsidiary of Fuji Xerox), where he worked on developing indoor localization technologies for applications such as asset tracking in hospitals and delivery cart tracking in manufacturing facilities. Mitesh received his Ph.D. in Robotics from the Center of Autonomous Systems (CAS) at the University of Technology Sydney, Australia in 2014.

Recorded at the AI Engineer World's Fair in San Francisco.