This video was edited with AI agent. But how?

4.2K views · Feb 22, 2025 · 5:00 min · Watch on YouTube ↗
Takeaway

Code-generating agents plus a browser-renderable JS video library make programmatic video editing feasible end-to-end.

Summary

  • Reskill built an open-source video-editing agent on top of Diffusion Studio's core library, which exposes JS/TS programmatic video composition.
  • Architecture: a Python agent uses Playwright to drive the Operator UI, which renders video in-browser via WebCodecs; its tools are video-editing (code-gen), doc-search (RAG), and visual feedback.
  • Visual feedback acts as a discriminator, GAN-style: the composition is sampled at 1 fps and reviewed before rendering proceeds.
  • Ships llms.txt for agent guidance and supports remote GPU-accelerated browser sessions behind a load balancer.
  • Python first; TypeScript port underway following 'any app that can be written in TypeScript will be'.
video-agent · diffusion-studio · playwright
Original description
The talk is about the world’s first open-source video editing agent!

Diffusion Studio x Re-Skill technology proposal:

Our Python-based agent starts a browser session using Playwright and opens operator.diffusion.studio.

This web app is a video editing UI optimized for agents, providing access to Diffusion Studio Core—a JavaScript-based engine that renders videos directly in the browser using WebCodecs (fully hardware-accelerated).

🖥 How it works:
1️⃣ A VideoEditingTool generates code based on user prompts and runs it in the browser.
2️⃣ If additional context is needed, DocsSearchTool uses RAG to pull information from operator.diffusion.studio/llms.txt.
3️⃣ After each execution step, the composition is sampled (currently 1 frame per second) and analyzed using VisualFeedbackTool via a multi-modal model.
4️⃣ The feedback system decides whether to proceed with rendering or refine further.
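
The steps above amount to a generate, execute, sample, review loop. Here is a minimal Python sketch of that loop, assuming hypothetical callables (`generate_code`, `run_in_browser`, `sample_frames`, `review`) that stand in for VideoEditingTool, the browser session, the frame sampler, and VisualFeedbackTool. None of these names or signatures come from the agent's codebase, and the docs-search step is left out for brevity.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Feedback:
    approved: bool   # True means "proceed with rendering"
    notes: str = ""  # reviewer comments passed to the next code-gen attempt


def sample_times(duration_s: float, fps: float = 1.0) -> List[float]:
    """Timestamps (in seconds) at which to grab frames, one per second by default."""
    if duration_s <= 0 or fps <= 0:
        return []
    count = math.ceil(duration_s * fps)
    return [i / fps for i in range(count)]


def edit_until_approved(
    prompt: str,
    generate_code: Callable[[str, str], str],       # (prompt, notes) -> code
    run_in_browser: Callable[[str], float],         # code -> composition duration (assumed)
    sample_frames: Callable[[List[float]], list],   # timestamps -> frames
    review: Callable[[str, list], Feedback],        # (prompt, frames) -> verdict
    max_steps: int = 5,
) -> Optional[str]:
    """Return the first code version that passes review, or None if none does."""
    notes = ""
    for _ in range(max_steps):
        code = generate_code(prompt, notes)                # step 1: generate code
        duration_s = run_in_browser(code)                  # step 1: execute it
        frames = sample_frames(sample_times(duration_s))   # step 3: sample frames
        feedback = review(prompt, frames)                  # step 3: multi-modal review
        if feedback.approved:                              # step 4: render or refine
            return code
        notes = feedback.notes
    return None
```

The `max_steps` cap is an addition of this sketch, not something the description mentions; it keeps a reviewer that never approves from looping forever.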

📡 File transfers between the browser and Python happen via Chrome DevTools Protocol, and for scalability, the agent can connect to a GPU-accelerated remote browser session via WebSocket (WIP: wss://chrome.diffusion.studio).
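
The choice between a local and a remote browser could be sketched with Playwright's Python API as follows: `connect_over_cdp` attaches to an already-running Chromium through a WebSocket endpoint, while `launch` starts a local one. The helper name `open_operator`, the `https://` scheme on the Operator URL, and the idea that the remote endpoint accepts CDP connections are assumptions made for illustration, not details taken from the agent's code.

```python
from typing import Optional

OPERATOR_URL = "https://operator.diffusion.studio"  # scheme is an assumption


def open_operator(playwright, ws_endpoint: Optional[str] = None):
    """Open the Operator UI in a local or remote Chromium; return (browser, page).

    `playwright` is the object yielded by `sync_playwright()`.
    """
    if ws_endpoint:
        # Attach to a remote browser over the Chrome DevTools Protocol.
        browser = playwright.chromium.connect_over_cdp(ws_endpoint)
    else:
        # Fall back to a locally launched headless Chromium.
        browser = playwright.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto(OPERATOR_URL)
    return browser, page


def main() -> None:
    # Requires `pip install playwright` and `playwright install chromium`.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser, page = open_operator(p)  # pass a ws:// or wss:// URL to go remote
        print(page.title())
        browser.close()
```

Passing the Playwright object in, rather than creating it inside the helper, leaves the session's lifetime with the caller and makes the helper easy to exercise with a stub.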

---

https://github.com/diffusionstudio/agent

https://re-skill.io/

slides: https://docs.google.com/presentation/d/1eipINYiwx3vjwvJXrv4QA0-9t4-uIVPuh112X9pElkM/edit?usp=sharing