← back
$1 AI Guardrails: The Unreasonable Effectiveness of Finetuned ModernBERTs – Diego Carpentero
Takeaway
A fine-tuned ModernBERT classifier sidecar provides low-latency, sub-dollar guardrails against today's broad LLM attack surface across prompt, context, RAG, and MCP vectors.
Summary
- Diego Carpentero builds a self-hosted defensive layer for LLM apps for under $1 by fine-tuning ModernBERT — leverages alternating local/global attention, RoPE, and FlashAttention.
- Surveys six attack vectors: direct prompt injection (Sydney case), indirect/context (poisoned Wikipedia and ad-review sites), LLM internals (GCG gibberish suffixes transferable to closed models), RAG (5 poisoned chunks out of 8M docs suffice), MCP description hidden instructions, and identity workflows.
- Notes LLMs have no native separation of concerns between system prompt and user data, making fine-tuned classifier guardrails effective as a sidecar.
- Cites Poison-RAG paper finding only ~5 malicious chunks needed in an 8M-document corpus to manipulate target answers.
- First documented case (March 2025) of AI advertising-review systems being overruled by prompts embedded in the data they evaluate.
safetyguardrailsmodernbert
Original description
LLM-based attacks are no longer the exception, they are the baseline. This talk maps the six most common attack vectors found in production AI systems: Prompt and Context Injection, Model Internals, RAG Poisoning, MCP Exploits, and Agentic Escalation. From there, it dives into the architecture of ModernBERT and shows how to fine-tune it into a lightweight, self-hosted guardrails layer for under a dollar. What you will learn: - The Zero Trust Gap in LLMs: what these attack vectors share in common, and why model alignment and human review alone are not enough - The secret sauce that makes encoder models beat LLM-as-a-Judge in latency and flexibility - ModernBERT under the hood: a deep dive into Alternating Attention, Unpadding & Sequence Packing, RoPE, and FlashAttention - Building your own safety layer: a practical walkthrough of fine-tuning ModernBERT as a safety discriminator - Live Demo: real attack prompts from each vector tested against our model" Speaker info: - Diego Carpentero - AI Engineer | Tech Entrepreneur | Open Source Contributor | NVIDIA Certified Professional (NCP-GENL) Timestamps: 00:00 Intro