Information Retrieval from the Ground Up - Philipp Krenn, Elastic

4.9K views · Jul 27, 2025 · 108:07 min

Takeaway

Solid lexical analysis (tokenization, stemming, offsets) combined with vector search in a hybrid setup beats pure vector search for production RAG.

Summary

Hands-on Elasticsearch workshop on the R in RAG — keyword/lexical search, vector search, hybrid retrieval.
Walks through tokenization with offsets and positions: offsets (character start and end) are kept for highlighting, positions for phrase queries, which need terms at consecutive positions (n, n+1).
Builds a custom analyzer (strip HTML, standard tokenizer, lowercase, remove stop words, Snowball stemmer) that reduces 'These are not the droids you're looking for' to {droid, you, look}.
Argues vector search is just one feature of retrieval — hybrid (lexical + dense) wins for production systems.
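
The analysis chain summarized above (strip HTML, tokenize with offsets and positions, lowercase, drop stop words, stem) can be sketched in plain Python. This is a toy illustration, not Elasticsearch code: the `Token` type, `analyze`, `toy_stem`, and `phrase_match` are hypothetical names, the stop list is a small hand-written set, and the suffix stripper is a crude stand-in for the Snowball stemmer that only handles this kind of input.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    term: str      # normalized term that would go into the index
    position: int  # ordinal of the token in the text (used by phrase queries)
    start: int     # character offset where the original token starts
    end: int       # character offset just past the original token


# Assumption: a small hand-written English stop list, for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
              "if", "in", "into", "is", "it", "no", "not", "of", "on", "or",
              "such", "that", "the", "their", "then", "there", "these",
              "they", "this", "to", "was", "will", "with"}


def strip_html(text: str) -> str:
    # Blank out tags with spaces so offsets still point into the original text.
    return re.sub(r"<[^>]+>", lambda m: " " * len(m.group()), text)


def toy_stem(word: str) -> str:
    # Crude stand-in for a real stemmer: drop an apostrophe suffix,
    # then strip a trailing "ing" or "s" if at least 3 characters remain.
    word = word.split("'")[0]
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text: str) -> list[Token]:
    tokens = []
    pattern = r"[A-Za-z0-9]+(?:'[A-Za-z]+)?"
    for position, m in enumerate(re.finditer(pattern, strip_html(text))):
        term = m.group().lower()
        if term in STOP_WORDS:
            continue  # removed stop words leave a gap in the positions
        tokens.append(Token(toy_stem(term), position, m.start(), m.end()))
    return tokens


def phrase_match(tokens: list[Token], phrase: list[str]) -> bool:
    # A phrase matches when its terms sit at consecutive positions n, n+1, ...
    if not phrase:
        return False
    positions: dict[str, set[int]] = {}
    for t in tokens:
        positions.setdefault(t.term, set()).add(t.position)
    return any(
        all(p + i in positions.get(term, ()) for i, term in enumerate(phrase))
        for p in positions.get(phrase[0], ())
    )
```

Calling `analyze("<b>These</b> are not the droids you're looking for")` yields the terms droid, you, look at positions 4, 5, 6, and each token's `start` and `end` still index into the original string, which is what a highlighter needs. `phrase_match` expects phrase terms that have already gone through the same analysis, so `["droid", "you"]` matches while `["droid", "look"]` does not.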

search · elasticsearch · hybrid-retrieval

Original description

Vector search is only a feature. Search engines and information retrieval have retaken their position as the foundation of RAG. This workshop takes you through decades of research, what has been working for a long time, and how it got better with Machine Learning.

About Philipp Krenn
Philipp leads Developer Relations at Elastic, the company behind Elasticsearch, Kibana, Beats, and Logstash. Based in San Francisco, he lives to demo interesting technology and solve challenging problems, all with a smile and a terminal window.

Recorded at the AI Engineer World's Fair in San Francisco. Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter