Prompt Engineering is Dead — Nir Gazit, Traceloop
Takeaway
Replace manual prompt fiddling with an evaluator-in-the-loop optimizer agent that rewrites prompts until the judge score improves.
Summary
- Argues that prompt engineering isn't engineering, and replaces hand-tuning with an automatic loop: a dataset, an LLM-as-judge evaluator, and a researcher agent that rewrites prompts.
- Built a 20-question fact-based evaluator (3 facts each = 60 boolean checks) against a Chroma+OpenAI RAG over Traceloop docs.
- An agent crawls online prompt guides, takes the failure reasons from the judge, and produces a new prompt, repeating the cycle much like a classic ML training loop.
- Approach yielded a 5x quality improvement on their sample chatbot with no manual prompt tweaking.
- Frames the work as evaluator-driven optimization where the human only writes the dataset and metric, not the prompt itself.
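The loop described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code shown in the talk: `Example`, `evaluate`, `optimize`, and the three callables (`answer`, `judge`, `rewrite`) are hypothetical names, and in a real setup each callable would wrap an LLM or RAG call. The score is the fraction of boolean fact checks that pass, so 20 questions with 3 facts each would give 60 checks.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Example:
    question: str
    facts: List[str]  # facts a good answer is expected to contain


# Hypothetical callable types; each would wrap a model call in practice.
Answerer = Callable[[str, str], str]        # (prompt, question) -> answer
Judge = Callable[[str, str], bool]          # (answer, fact) -> is the fact present?
Rewriter = Callable[[str, List[str]], str]  # (prompt, failure reasons) -> new prompt


def evaluate(prompt: str, dataset: List[Example],
             answer: Answerer, judge: Judge) -> Tuple[float, List[str]]:
    """Run every boolean fact check; return (pass rate, failure reasons)."""
    passed, total, failures = 0, 0, []
    for ex in dataset:
        reply = answer(prompt, ex.question)
        for fact in ex.facts:
            total += 1
            if judge(reply, fact):
                passed += 1
            else:
                failures.append(f"{ex.question!r}: missing fact {fact!r}")
    return (passed / total if total else 0.0), failures


def optimize(prompt: str, dataset: List[Example], answer: Answerer,
             judge: Judge, rewrite: Rewriter,
             max_rounds: int = 5) -> Tuple[str, float]:
    """Keep a rewritten prompt only when it beats the best score so far."""
    best_prompt = prompt
    best_score, failures = evaluate(prompt, dataset, answer, judge)
    for _ in range(max_rounds):
        if not failures:
            break  # every fact check already passes
        candidate = rewrite(best_prompt, failures)
        score, candidate_failures = evaluate(candidate, dataset, answer, judge)
        if score > best_score:
            best_prompt, best_score, failures = candidate, score, candidate_failures
    return best_prompt, best_score
```

In this sketch the human supplies only `dataset` and the checking logic behind `judge`; the prompt text is produced by `rewrite`. A rejected candidate leaves the best prompt and its failure list unchanged, so the next round simply asks the rewriter to try again from the same starting point.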
prompt-optimization, llm-as-judge, rag
Original description
Manual prompt crafting doesn't scale. In this session, we'll explore how to replace it with a test-driven, automated approach. You'll see how to define output evaluators, write minimal prompts, and let agents iterate toward optimal performance, all without manual tweaking. If you're still hand-tuning prompts, you're doing it wrong.
About Nir Gazit: CEO @ Traceloop; ex-chief architect @ Fiverr; ex-tech lead @ Google; OpenTelemetry contributor.
Recorded at the AI Engineer World's Fair in San Francisco.
Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter