FLUX, Open Research, and the Future of Visual AI — Stephen Batifol, Black Forest Labs

4.1K views · May 08, 2026 · 22:31 min

Takeaway

Black Forest Labs is pushing open-source visual AI toward 'visual intelligence' (combined generation, editing, and multi-reference composition), with FLUX.2 as its latest base model.

Summary

Stephen Batifol (Black Forest Labs DevRel) recaps BFL — team behind Stable Diffusion / latent diffusion / Flux, 200K+ academic citations, customers include Microsoft, Adobe, Canva, Mistral.
FLUX.1 (Aug 2024): first breakthrough text-to-image model runnable on a laptop; better anatomy than much larger rivals; top-liked model on Hugging Face at launch.
FLUX.1 Kontext: first open-source model combining text-to-image and editing; 7-8s generations vs ~40-50s for early GPT-image; enables character-consistent storyboards used as input frames for video models.
FLUX.2 (Nov 2025): step toward 'visual intelligence'; generates near-photoreal humans, animals, and product shots, and accepts multiple reference images for compositional editing (e.g., 'create an outfit from these 6 images').

flux · image-generation · black-forest-labs

Original description

FLUX started as an image model story, but this talk makes the larger ambition clear: visual intelligence, not just image generation. From FLUX.1 through Kontext, FLUX.2, and FLUX.2 Klein, Black Forest Labs has been pushing fast, open releases while building toward models that understand images, video, audio, actions, and eventually the physical world itself.

Along the way, Stephen Batifol walks through the research behind that direction, including BFL's work on self-supervised multimodal training, real-time generation and editing, and the path from generative media toward world models and robotics.

Speaker info:
- https://x.com/stephenbtl
- https://www.linkedin.com/in/stephen-batifol/