← back

$1 AI Guardrails: The Unreasonable Effectiveness of Finetuned ModernBERTs – Diego Carpentero

6.3K views · Apr 16, 2026 · 43:52 min · Watch on YouTube ↗
Takeaway

A fine-tuned ModernBERT classifier sidecar provides low-latency, sub-dollar guardrails against today's broad LLM attack surface across prompt, context, RAG, and MCP vectors.

Summary

  • Diego Carpentero builds a self-hosted defensive layer for LLM apps for under $1 by fine-tuning ModernBERT — leverages alternating local/global attention, RoPE, and FlashAttention.
  • Surveys six attack vectors: direct prompt injection (Sydney case), indirect/context (poisoned Wikipedia and ad-review sites), LLM internals (GCG gibberish suffixes transferable to closed models), RAG (5 poisoned chunks out of 8M docs suffice), MCP description hidden instructions, and identity workflows.
  • Notes LLMs have no native separation of concerns between system prompt and user data, making fine-tuned classifier guardrails effective as a sidecar.
  • Cites Poison-RAG paper finding only ~5 malicious chunks needed in an 8M-document corpus to manipulate target answers.
  • First documented case (March 2025) of AI advertising-review systems being overruled by prompts embedded in the data they evaluate.
safetyguardrailsmodernbert
Original description
LLM-based attacks are no longer the exception, they are the baseline. This talk maps the six most common attack vectors found in production AI systems: Prompt and Context Injection, Model Internals, RAG Poisoning, MCP Exploits, and Agentic Escalation. From there, it dives into the architecture of ModernBERT and shows how to fine-tune it into a lightweight, self-hosted guardrails layer for under a dollar.

What you will learn:

- The Zero Trust Gap in LLMs: what these attack vectors share in common, and why model alignment and human review alone are not enough
- The secret sauce that makes encoder models beat LLM-as-a-Judge in latency and flexibility
- ModernBERT under the hood: a deep dive into Alternating Attention, Unpadding & Sequence Packing, RoPE, and FlashAttention
- Building your own safety layer: a practical walkthrough of fine-tuning ModernBERT as a safety discriminator
- Live Demo: real attack prompts from each vector tested against our model"

Speaker info:
- Diego Carpentero -  AI Engineer | Tech Entrepreneur | Open Source Contributor | NVIDIA Certified Professional (NCP-GENL)

Timestamps:
00:00 Intro