Information Retrieval from the Ground Up - Philipp Krenn, Elastic
Takeaway
Solid lexical analysis (tokenization, stemming, offsets) combined with hybrid lexical-plus-vector retrieval beats pure vector search for production RAG.
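To make "hybrid" concrete, here is a minimal sketch of one common way to merge a lexical ranking with a vector ranking: reciprocal rank fusion (RRF). The summary does not say which fusion method the workshop uses, so RRF is an assumption here, and the function name, the document IDs, and the constant `k = 60` are illustrative only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    where rank starts at 1. k = 60 is a commonly used default, not a
    value taken from the talk.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a keyword query and a vector query.
lexical_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([lexical_hits, vector_hits])
print([doc_id for doc_id, _ in fused])  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that both retrievers rank highly (`doc_a`, `doc_c`) rise above documents found by only one of them. RRF uses ranks rather than raw scores, which sidesteps the problem that lexical and vector scores are on different scales.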
Summary
- Hands-on Elasticsearch workshop on the R (retrieval) in RAG: keyword/lexical search, vector search, and hybrid retrieval.
- Walks through tokenization with offsets and positions: offsets (character ranges in the source text) drive highlighting, while positions let phrase queries require terms at adjacent positions (n, n+1).
- Builds a custom analyzer that strips HTML, applies the standard tokenizer, lowercases, removes stop words, and stems with Snowball, reducing 'These are not the droids you're looking for' to {droid, you, look}.
- Argues that vector search is just one feature of retrieval; hybrid (lexical + dense) retrieval wins for production systems.
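The analysis steps in the bullets above can be sketched in plain Python. This is a toy model, not Elasticsearch code: `Token`, `STOP_WORDS`, `strip_html`, `toy_stem`, `analyze`, and `phrase_match` are hypothetical names, the stop list is abbreviated, and `toy_stem` is a crude suffix stripper standing in for a real Snowball stemmer.

```python
import re
from dataclasses import dataclass

# Abbreviated stop list for illustration (an assumption, not Elasticsearch's list).
STOP_WORDS = {"a", "an", "and", "are", "for", "is", "not", "of", "the", "these", "to"}


@dataclass
class Token:
    term: str
    position: int      # ordinal in the token stream, used by phrase queries
    start_offset: int  # character range in the original text, used for highlighting
    end_offset: int


def strip_html(text: str) -> str:
    """Character filter: blank out tags with spaces so offsets stay aligned."""
    return re.sub(r"<[^>]+>", lambda m: " " * len(m.group()), text)


def toy_stem(term: str) -> str:
    """Crude stand-in for a Snowball stemmer (handles only this example well)."""
    term = term.split("'")[0]  # drop contraction suffixes: you're -> you
    for suffix in ("ing", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def analyze(text: str) -> list[Token]:
    """Strip HTML, tokenize, lowercase, drop stop words, stem."""
    tokens = []
    for position, match in enumerate(re.finditer(r"[\w']+", strip_html(text))):
        term = match.group().lower()
        if term in STOP_WORDS:
            continue  # the position counter still advances, leaving a gap
        tokens.append(Token(toy_stem(term), position, match.start(), match.end()))
    return tokens


def phrase_match(tokens: list[Token], first: str, second: str) -> bool:
    """True if `second` occurs at the position right after `first` (n, n+1)."""
    positions: dict[str, set[int]] = {}
    for token in tokens:
        positions.setdefault(token.term, set()).add(token.position)
    return any(p + 1 in positions.get(second, set()) for p in positions.get(first, set()))


text = "<p>These are not the droids you're looking for</p>"
tokens = analyze(text)
print([t.term for t in tokens])               # ['droid', 'you', 'look']
print(phrase_match(tokens, "droid", "you"))   # True: positions 4 and 5
print(phrase_match(tokens, "droid", "look"))  # False: positions 4 and 6
```

The stored offsets still point into the original string, so `text[t.start_offset:t.end_offset]` recovers the unstemmed word ("droids") for highlighting even though the indexed term is "droid".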
search, elasticsearch, hybrid-retrieval
Original description
Vector search is only a feature. Search engines and information retrieval have retaken their position as the foundation of RAG. This workshop takes you through decades of research, what has been working for a long time, and how it got better with machine learning.
About Philipp Krenn
Philipp leads Developer Relations at Elastic, the company behind Elasticsearch, Kibana, Beats, and Logstash. Based in San Francisco, he lives to demo interesting technology and solve challenging problems, all with a smile and a terminal window.
Recorded at the AI Engineer World's Fair in San Francisco. Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter