$1 AI Guardrails: The Unreasonable Effectiveness of Finetuned ModernBERTs – Diego Carpentero

6.3K views · Apr 16, 2026 · 43:52 min · Watch on YouTube ↗

Takeaway

A fine-tuned ModernBERT classifier sidecar provides low-latency, sub-dollar guardrails against today's broad LLM attack surface across prompt, context, RAG, and MCP vectors.

Summary

Diego Carpentero builds a self-hosted defensive layer for LLM apps for under $1 by fine-tuning ModernBERT, an encoder model that combines alternating local/global attention, RoPE, and FlashAttention.
Surveys six attack vectors: direct prompt injection (the Sydney case), indirect/context injection (poisoned Wikipedia pages and ad-review sites), LLM internals (GCG gibberish suffixes that transfer to closed models), RAG poisoning (5 poisoned chunks in an 8M-document corpus suffice), MCP exploits (instructions hidden in descriptions), and identity workflows.
Notes that LLMs have no native separation of concerns between system prompt and user data, which is why an external fine-tuned classifier works well as a sidecar guardrail.
Cites the Poison-RAG paper's finding that only ~5 malicious chunks in an 8M-document corpus are enough to manipulate target answers.
Highlights the first documented case (March 2025) of AI advertising-review systems being overridden by prompts embedded in the data they evaluate.
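
The sidecar idea in the summary can be sketched as a thin wrapper that scores every input before it reaches the LLM. This is a minimal illustration, not the speaker's implementation: the names `guard`, `guarded_call`, `Verdict`, the `safe`/`unsafe` labels, and the 0.5 threshold are all assumptions, and `toy_classifier` is a keyword stand-in used only so the example runs without a model.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical label names; a real fine-tuned checkpoint defines its own.
SAFE, UNSAFE = "safe", "unsafe"


@dataclass
class Verdict:
    allowed: bool
    unsafe_score: float


def guard(text: str,
          classify: Callable[[str], Dict[str, float]],
          threshold: float = 0.5) -> Verdict:
    """Score the text with the classifier and decide whether to let it through."""
    scores = classify(text)
    unsafe = scores.get(UNSAFE, 0.0)
    return Verdict(allowed=unsafe < threshold, unsafe_score=unsafe)


def guarded_call(text: str,
                 classify: Callable[[str], Dict[str, float]],
                 llm: Callable[[str], str],
                 threshold: float = 0.5) -> str:
    """Forward the text to the LLM only if the guard allows it."""
    verdict = guard(text, classify, threshold)
    if not verdict.allowed:
        return "Request blocked by guardrail."
    return llm(text)


def toy_classifier(text: str) -> Dict[str, float]:
    """Stand-in for a fine-tuned model: a keyword check, NOT a real classifier."""
    hit = "ignore previous instructions" in text.lower()
    return {UNSAFE: 0.9 if hit else 0.1, SAFE: 0.1 if hit else 0.9}
```

In a real deployment, `classify` would wrap the fine-tuned ModernBERT checkpoint (for example through a Hugging Face `transformers` text-classification pipeline), and the same guard could plausibly be applied to retrieved RAG chunks and MCP descriptions as well as user prompts.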

safety · guardrails · modernbert

Original description

LLM-based attacks are no longer the exception; they are the baseline. This talk maps the six most common attack vectors found in production AI systems: Prompt and Context Injection, Model Internals, RAG Poisoning, MCP Exploits, and Agentic Escalation. From there, it dives into the architecture of ModernBERT and shows how to fine-tune it into a lightweight, self-hosted guardrails layer for under a dollar.

What you will learn:

- The Zero Trust Gap in LLMs: what these attack vectors share in common, and why model alignment and human review alone are not enough
- The secret sauce that makes encoder models beat LLM-as-a-Judge in latency and flexibility
- ModernBERT under the hood: a deep dive into Alternating Attention, Unpadding & Sequence Packing, RoPE, and FlashAttention
- Building your own safety layer: a practical walkthrough of fine-tuning ModernBERT as a safety discriminator
- Live Demo: real attack prompts from each vector tested against our model
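
To make the Alternating Attention item above concrete, here is a rough sketch of how per-layer attention masks could alternate between global and local (sliding-window) attention in a bidirectional encoder. The schedule (a global layer every third layer, starting at layer 0) and the 128-token window are assumptions based on the published ModernBERT configuration, not details stated in this talk summary; the function names are hypothetical.

```python
from typing import List


def is_global_layer(layer_idx: int, global_every: int = 3) -> bool:
    """Assumed schedule: layers 0, 3, 6, ... use global attention."""
    return layer_idx % global_every == 0


def attention_mask(seq_len: int, layer_idx: int,
                   window: int = 128, global_every: int = 3) -> List[List[bool]]:
    """Return mask[i][j], True when token i may attend to token j."""
    if is_global_layer(layer_idx, global_every):
        # Global layer: every token sees every other token.
        return [[True] * seq_len for _ in range(seq_len)]
    # Local layer: each token sees window // 2 neighbours on either side.
    half = window // 2
    return [[abs(i - j) <= half for j in range(seq_len)]
            for i in range(seq_len)]


def attended_pairs(mask: List[List[bool]]) -> int:
    """Count allowed (query, key) pairs, a rough proxy for attention cost."""
    return sum(sum(row) for row in mask)
```

On local layers the number of attended pairs grows linearly with sequence length rather than quadratically, which is the intuition behind the latency advantage on long inputs.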

Speaker info:
- Diego Carpentero - AI Engineer | Tech Entrepreneur | Open Source Contributor | NVIDIA Certified Professional (NCP-GENL)

Timestamps:
00:00 Intro