Prompt Engineering is Dead — Nir Gazit, Traceloop

24.9K views · Jun 27, 2025 · 14:19 min

Takeaway

Replace manual prompt fiddling with an evaluator-in-the-loop optimizer agent that rewrites prompts until the judge score improves.

Summary

Argues that prompt engineering isn't engineering and replaces hand-tuning with an automatic loop: a dataset, an LLM-as-judge evaluator, and a researcher agent that rewrites prompts.
Built a 20-question fact-based evaluator (3 facts each = 60 boolean checks) against a Chroma+OpenAI RAG over Traceloop docs.
A researcher agent crawls online prompt guides, takes the failure reasons from the judge, and produces a new prompt, iterating like a classic ML training loop.
Approach yielded a 5x quality improvement on their sample chatbot with no manual prompt tweaking.
Frames the work as evaluator-driven optimization where the human only writes the dataset and metric, not the prompt itself.
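
The loop described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code from the talk: the names `Example`, `evaluate`, `optimize`, and the `toy_*` stand-ins are all hypothetical, and the toy functions replace what would be real LLM calls (the RAG chatbot, the LLM-as-judge, and the researcher agent). The scoring follows the fact-based scheme mentioned above, where each question carries a few expected facts and each fact is one boolean check.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Example:
    question: str
    facts: List[str]  # facts a good answer is expected to contain


# A failure is (question, missing fact); it is fed back to the rewriter.
Failure = Tuple[str, str]
Chatbot = Callable[[str, str], str]            # (prompt, question) -> answer
Judge = Callable[[str, str], bool]             # (answer, fact) -> fact present?
Rewriter = Callable[[str, List[Failure]], str]  # (prompt, failures) -> new prompt


def evaluate(prompt: str, dataset: List[Example],
             chatbot: Chatbot, judge: Judge) -> Tuple[float, List[Failure]]:
    """Score a prompt as the fraction of boolean fact checks that pass."""
    passed, total = 0, 0
    failures: List[Failure] = []
    for ex in dataset:
        answer = chatbot(prompt, ex.question)
        for fact in ex.facts:
            total += 1
            if judge(answer, fact):
                passed += 1
            else:
                failures.append((ex.question, fact))
    return (passed / total if total else 0.0), failures


def optimize(prompt: str, dataset: List[Example], chatbot: Chatbot,
             judge: Judge, rewriter: Rewriter,
             max_iters: int = 5, target: float = 1.0) -> Tuple[str, float]:
    """Ask the rewriter for new prompts, keeping one only if the score improves."""
    best_prompt = prompt
    best_score, failures = evaluate(best_prompt, dataset, chatbot, judge)
    for _ in range(max_iters):
        if best_score >= target:
            break
        candidate = rewriter(best_prompt, failures)
        score, cand_failures = evaluate(candidate, dataset, chatbot, judge)
        if score > best_score:
            best_prompt, best_score, failures = candidate, score, cand_failures
    return best_prompt, best_score


# --- Toy stand-ins (assumptions for illustration; no LLM is called) ---

def toy_chatbot(prompt: str, question: str) -> str:
    # Pretend the answer simply echoes the prompt.
    return prompt


def toy_judge(answer: str, fact: str) -> bool:
    # Pretend judging is a case-insensitive substring match.
    return fact.lower() in answer.lower()


def toy_rewriter(prompt: str, failures: List[Failure]) -> str:
    # Pretend the agent fixes one reported failure per iteration.
    if not failures:
        return prompt
    _, fact = failures[0]
    return f"{prompt} {fact}"


# Made-up dataset: 2 questions x 3 facts = 6 boolean checks.
DATASET = [
    Example("q1", ["alpha-1", "alpha-2", "alpha-3"]),
    Example("q2", ["beta-1", "beta-2", "beta-3"]),
]
```

Calling `optimize("Answer the question.", DATASET, toy_chatbot, toy_judge, toy_rewriter, max_iters=10)` runs the loop end to end on the toy data. Keeping a candidate only when the score strictly improves is one simple acceptance rule chosen for this sketch; the summary does not say which rule the talk used.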

prompt-optimization · llm-as-judge · rag

Original description

Manual prompt crafting doesn't scale. In this session, we'll explore how to replace it with a test-driven, automated approach. You'll see how to define output evaluators, write minimal prompts, and let agents iterate toward optimal performance—all without manual tweaking. If you're still hand-tuning prompts, you're doing it wrong.

About Nir Gazit
CEO @ Traceloop; ex-chief architect @ Fiverr, ex-tech lead @ Google; OpenTelemetry contributor

Recorded at the AI Engineer World's Fair in San Francisco. Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter