Perceptual Evaluations: Evals for Aesthetics — Diego Rodriguez, Krea.ai

2.9K views · Aug 23, 2025 · 16:27 min
Takeaway

Aesthetic evals must center human perception — current metrics like FID and CLIP miss what people actually find broken in generative imagery.

Summary

  • Diego (Krea.ai) shows o3 spending 17 seconds on Python/OpenCV tool calls and still failing to label an obviously broken AI-generated hand image: current models can't do basic aesthetic perception.
  • JPEG, MP3, and MP4 compress by exploiting the limits of human perception (for example, we are more sensitive to brightness than to color), but our training data is full of such lossy-compressed media, which contaminates a model's sense of 'aesthetics'.
  • FID penalizes JPEG artifacts heavily even when the compressed images look identical to humans, so using FID to grade generative models is misaligned with perception.
  • Evals focus on what's easy to measure (CLIP prompt adherence, object counts) and miss perceptual coherence ('the clock doesn't look right, that sky makes no sense').
  • Quotes a friend at Midjourney: predicting the car was easy, predicting traffic was hard. What 'traffic' are AI engineers missing now? Possibly perceptual evaluation.
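
The "brightness > color" point in the summary can be made concrete with a toy version of 4:2:0 chroma subsampling, one of the perceptual tricks JPEG relies on. This is only a sketch, not code from the talk: the function names and the 4x4 test image are hypothetical, and the conversion constants are the standard JFIF (BT.601 full-range) values. It keeps brightness (luma) at full resolution, averages color (chroma) over 2x2 blocks, and reports how much of the original data is still stored.

```python
# Toy 4:2:0 chroma subsampling: keep brightness at full resolution,
# store color at quarter resolution. Names here are illustrative only.

def rgb_to_ycbcr(r, g, b):
    """Convert one RGB pixel (0-255) to YCbCr using the JFIF constants."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def subsample_420(image):
    """image: H x W grid of (r, g, b) tuples, with H and W even.
    Returns full-resolution luma and chroma averaged over 2x2 blocks."""
    h, w = len(image), len(image[0])
    ycc = [[rgb_to_ycbcr(*px) for px in row] for row in image]
    luma = [[ycc[i][j][0] for j in range(w)] for i in range(h)]
    chroma = []
    for i in range(0, h, 2):
        row = []
        for j in range(0, w, 2):
            block = [ycc[i + di][j + dj] for di in (0, 1) for dj in (0, 1)]
            cb = sum(p[1] for p in block) / 4
            cr = sum(p[2] for p in block) / 4
            row.append((cb, cr))
        chroma.append(row)
    return luma, chroma

def stored_fraction(luma, chroma):
    """Fraction of the original 3 * H * W samples that are still stored."""
    h, w = len(luma), len(luma[0])
    kept = h * w + 2 * len(chroma) * len(chroma[0])
    return kept / (3 * h * w)

# Hypothetical 4x4 test image with arbitrary pixel values.
image = [
    [((37 * i + 11 * j) % 256, (91 * i + 53 * j) % 256, (17 * i + 201 * j) % 256)
     for j in range(4)]
    for i in range(4)
]

luma, chroma = subsample_420(image)
print(stored_fraction(luma, chroma))  # 0.5
```

Half of the samples are discarded before any further compression, yet every luma value is untouched, which is why the result usually looks unchanged to a viewer. A pixel-level or feature-level metric has no built-in notion of that asymmetry.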
evals · aesthetics · generative-media
Original description
Special session with KREA.ai's cofounder Diego Rodriguez on how evals for aesthetics and image/generative media work — the hardest kinds of evals.

  linkedin.com/in/asciidiego/

Timestamps
00:15 Introduction to Perceptual Evaluations
00:50 The Problem with Current AI Evaluations
02:16 Historical Context and Compression
05:14 Limitations in AI and Human-centric Metrics
08:00 Rethinking Evaluation and the Future of AI
12:44 Evaluating Our Evaluations
13:32 Krea's Role and Call to Action