
🏗️ AI Infrastructure

GPU clusters, training stacks, autoscaling inference, data pipelines, feature stores, observability for AI workloads.

36 videos · agents · infrastructure · gpus · ai-infrastructure · durable-execution · coding-agents

The workflow

```mermaid
flowchart LR
    A[Workload] --> B{Stage}
    B -->|Train| C[Multi-GPU cluster<br/>NVLink / RDMA]
    B -->|Serve| D[Autoscaling fleet<br/>spot + reserved]
    B -->|Data| E[Lakehouse +<br/>feature store]
    C --> F[Observability<br/>traces + metrics]
    D --> F
    E --> F
    F --> G[Cost / perf<br/>tuning loop]
```

Training is bursty; serving is steady; the bill is paid by serving.
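The "spot + reserved" split on the serving side can be sketched as a sizing rule: steady demand is covered by reserved GPUs, and peaks are absorbed by spot (or on-demand) capacity. This is a minimal sketch under stated assumptions: hourly GPU demand samples are available, and the reserved floor is set at a demand percentile. The function name, the percentile rule, and the demand trace are all hypothetical.

```python
import math


def split_capacity(hourly_gpus: list[int], baseline_pct: float = 0.5) -> tuple[int, int]:
    """Split serving demand into a reserved floor and burst GPU-hours.

    Hypothetical sizing rule: reserve enough GPUs to cover the given
    percentile of hourly demand; demand above that floor is served by
    spot (or on-demand) capacity and is returned as GPU-hours.
    """
    if not hourly_gpus:
        raise ValueError("demand trace is empty")
    if not 0.0 < baseline_pct <= 1.0:
        raise ValueError("baseline_pct must be in (0, 1]")
    ordered = sorted(hourly_gpus)
    # Nearest-rank percentile: the smallest sample covering baseline_pct of hours.
    rank = math.ceil(baseline_pct * len(ordered))
    reserved = ordered[rank - 1]
    # Burst demand above the reserved floor, summed hour by hour.
    spot_gpu_hours = sum(max(0, d - reserved) for d in hourly_gpus)
    return reserved, spot_gpu_hours


# Hypothetical eight-hour trace with one traffic spike.
print(split_capacity([8, 8, 10, 12, 30, 9, 8, 11]))  # (9, 27)
```

Raising `baseline_pct` moves more hours onto reserved capacity. Setting it to 1.0 reserves for the peak and leaves no burst hours.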

Key takeaways

Export controls have not stopped Chinese compute scale-up, and Middle East gigawatt-scale builds are reshaping where frontier training happens.
Agent-driven development needs a continuous-compute substrate that replaces PR-centric CI/CD with high-throughput, machine-paced merging.
Enterprise AI data centers need rail-optimized, isolated backend GPU networks tuned for job-completion time, sized differently for training vs inference.
Applied Compute scales enterprise RL by trading off policy staleness against throughput in asynchronous pipeline RL — the efficient frontier between speed and learning stability.
Run AI-agent-generated code inside lightweight microVMs with snapshot/restore rather than containers — speed and isolation both matter at agent scale.
Build a centralized AI platform (models, vectors, connectors, observability, eval monitoring) so feature teams don't each reinvent rag-ops and governance.

Videos (36)

The Geopolitics of AI Infrastructure - Dylan Patel, SemiAnalysis

Export controls have not stopped Chinese compute scale-up, and Middle East gigawatt-scale builds are reshaping where frontier training happens.

27.6K views · Jun 19, 2025

CI/CD Is Dead, Agents Need Continuous Compute and Computers — Hugo Santos and Madison Faulkner

Agent-driven development needs a continuous-compute substrate that replaces PR-centric CI/CD with high-throughput, machine-paced merging.

17.5K views · May 13, 2026

How to Build Your Own AI Data Center in 2025 — Paul Gilbert, Arista Networks

Enterprise AI data centers need rail-optimized, isolated backend GPU networks tuned for job-completion time, sized differently for training vs inference.

15.6K views · Apr 27, 2025
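In a rail-optimized backend network, NIC k of every server connects to the same leaf switch (rail k). Collective traffic between same-index GPUs on different servers therefore crosses a single switch. The mapping can be sketched as follows, assuming 8-GPU servers with one backend NIC per GPU and one leaf switch per rail. The class and helper names are hypothetical, and real fabrics add a spine layer and other details that this simplification ignores.

```python
from dataclasses import dataclass

GPUS_PER_SERVER = 8  # assumption: 8-GPU servers, one backend NIC per GPU


@dataclass(frozen=True)
class Gpu:
    server: int
    local_index: int  # position inside the server, 0..GPUS_PER_SERVER-1


def rail_of(gpu: Gpu) -> int:
    """Rail-optimized wiring: NIC k of every server connects to leaf switch k."""
    if not 0 <= gpu.local_index < GPUS_PER_SERVER:
        raise ValueError("local_index out of range")
    return gpu.local_index


def path_between(a: Gpu, b: Gpu) -> str:
    """Classify the hop between two GPUs in this simplified topology."""
    if a.server == b.server:
        return "intra-server"  # stays on the server's internal GPU fabric
    if rail_of(a) == rail_of(b):
        return "same rail"  # one leaf switch, no spine hop
    return "cross rail"  # needs a spine hop (or an intra-server hop first)


print(path_between(Gpu(0, 3), Gpu(5, 3)))  # same rail
print(path_between(Gpu(0, 3), Gpu(5, 4)))  # cross rail
```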

Efficient Reinforcement Learning – Rhythm Garg & Linden Li, Applied Compute

Applied Compute scales enterprise RL by trading off policy staleness against throughput in asynchronous pipeline RL — the efficient frontier between speed and learning stability.

10.8K views · Dec 09, 2025
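A bounded-staleness filter makes the staleness/throughput trade-off concrete. A rollout generated by an older policy version is accepted for training only if it lags the learner by at most `max_staleness` versions. This is a minimal sketch of that generic technique, not Applied Compute's implementation, and the class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Rollout:
    policy_version: int  # version of the policy that generated this sample
    reward: float


@dataclass
class StalenessFilter:
    """Bounded-staleness admission for asynchronous RL (illustrative)."""
    max_staleness: int
    learner_version: int = 0
    dropped: int = 0
    buffer: list[Rollout] = field(default_factory=list)

    def add(self, rollout: Rollout) -> bool:
        """Buffer a rollout if it lags the learner by at most max_staleness."""
        lag = self.learner_version - rollout.policy_version
        if lag > self.max_staleness:
            self.dropped += 1  # too off-policy: discard rather than train on it
            return False
        self.buffer.append(rollout)
        return True

    def train_step(self) -> list[Rollout]:
        """Consume the buffered batch and publish a new policy version."""
        batch, self.buffer = self.buffer, []
        self.learner_version += 1
        return batch


f = StalenessFilter(max_staleness=1, learner_version=3)
print(f.add(Rollout(3, 1.0)), f.add(Rollout(2, 0.5)), f.add(Rollout(1, 0.0)))
# True True False
```

A larger `max_staleness` keeps more generated samples, which helps throughput. The cost is training on more off-policy data, which can hurt stability.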

Arrakis: How To Build An AI Sandbox From Scratch - Abhishek Bhardwaj, OpenAI

Run AI-agent-generated code inside lightweight microVMs with snapshot/restore rather than containers — speed and isolation both matter at agent scale.

8.5K views · Jun 03, 2025

AI Platform Engineering: Patrick Debois

Build a centralized AI platform (models, vectors, connectors, observability, eval monitoring) so feature teams don't each reinvent rag-ops and governance.

8.5K views · Dec 31, 2024

Rishabh Garg, Tesla Optimus — Challenges in High Performance Robotics Systems

High-performance robotics depends on pipelined, synchronized comms threads — and most 'bad policy' bugs are actually CAN/thread timing bugs you can only see with bus-level logging.

8.0K views · Aug 25, 2025

Productionizing GenAI Models – Lessons from the world's best AI teams: Lukas Biewald

In GenAI, the learnings (not the code) are your IP — track every experiment automatically so iteration time, not feature velocity, becomes your competitive edge.

7.4K views · Oct 23, 2024

We accidentally made an AI platform: Jamie Turner

A reactive backend platform turned out to be the right substrate for shipping AI apps with confidence, with vector indexes and component libraries as natural extensions.

7.0K views · Oct 08, 2024

Two Roads to Durable Agents: Replay vs. Snapshot — Eric Allam, CEO, Trigger.dev

Durable agents need either replay-style journaling (Temporal) or snapshot-style state capture; replay's determinism constraints make it awkward for LLM-driven workflows.

6.8K views · May 10, 2026
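Replay-style durability records the result of every side-effecting step in a journal. After a crash, the workflow re-runs from the top, and each step returns its journaled result instead of executing again. For this reason, nondeterministic calls such as LLM requests must go through a journaled step. This is a minimal in-memory sketch of the idea, not the Temporal or Trigger.dev API. `Journal`, `step`, `agent_workflow`, and `fake_llm` are hypothetical names.

```python
from typing import Any, Callable


class Journal:
    """Append-only record of step results, replayed in call order."""

    def __init__(self) -> None:
        self.entries: list[Any] = []
        self.cursor = 0
        self.executions = 0  # steps that actually ran rather than replayed

    def step(self, fn: Callable[[], Any]) -> Any:
        if self.cursor < len(self.entries):
            result = self.entries[self.cursor]  # replay: reuse recorded result
        else:
            result = fn()  # first run: execute and record
            self.entries.append(result)
            self.executions += 1
        self.cursor += 1
        return result

    def restart(self) -> None:
        """Simulate a crash and restart: keep entries, replay from the top."""
        self.cursor = 0


def agent_workflow(journal: Journal, call_llm: Callable[[str], str]) -> str:
    # Every nondeterministic call goes through a step so replays see the same value.
    plan = journal.step(lambda: call_llm("plan"))
    return journal.step(lambda: call_llm(f"execute: {plan}"))


calls: list[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"{prompt}#{len(calls)}"  # differs on every call, like sampling


j = Journal()
first = agent_workflow(j, fake_llm)
j.restart()
print(first == agent_workflow(j, fake_llm), len(calls))  # True 2
```

The determinism rule is what makes replay awkward. Any model call made outside `step` would return a different value on restart, and the replayed run would diverge from the recorded one.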

Unlocking Developer Productivity across CPU and GPU with MAX: Chris Lattner

Framework fragmentation is the bottleneck for production GenAI; MAX targets one runtime spanning CPU and GPU to bridge research and prod.

6.8K views · Jul 25, 2024

Building LinkedIn's GenAI Platform — Xiaofeng Wang

LinkedIn's GenAI platform evolved through three generations to support multi-agent systems with distributed orchestration, a skill registry, layered memory, and OpenTelemetry-based observability.

6.2K views · Apr 16, 2025

AI Kernel Generation: What's working, what's not, what's next – Natalie Serrino, Gimlet Labs

AI agents can already deliver double-digit kernel speedups on real workloads by iterating compile-run-profile loops, but struggle on the most complex kernels — promising for cross-hardware porting.

5.3K views · Dec 17, 2025

GPU-less, Trust-less, Limit-less: Reimagining the Confidential AI Cloud - Mike Bursell

TEE-based confidential AI plus a decentralized marketplace (Super Protocol) enables training and inference on sensitive data without trusting any cloud provider.

4.8K views · Jun 03, 2025

Why, and how you need to sandbox AI-Generated Code? — Harshil Agrawal, Cloudflare

Always sandbox AI-generated code with capability-based security — V8 isolates for fast lightweight execution, containers when you need a full Linux environment.

4.2K views · Apr 08, 2026

Production software keeps breaking and it will only get worse — Anish Agarwal, Traversal.ai

Autonomous incident debugging requires fusing causal ML, semantics, and custom agent control flow; none of AIOps, plain LLMs, or ReAct agents can do it alone.

3.9K views · Jul 10, 2025

Compilers in the Age of LLMs — Yusuf Olokoba, Muna

A Python compiler with LLM-assisted verification can turn AI inference code into portable native binaries that run anywhere, sidestepping container-based deployment.

3.9K views · Nov 24, 2025

Why We Don't Need More Data Centers - Dr. Jasper Zhang, Hyperbolic

Distributed GPU marketplaces are a faster, cheaper way to satisfy AI compute demand than waiting 7+ years for new hyperscale data centers.

3.6K views · Aug 01, 2025

Scaling AI Agents Without Breaking Reliability — Preeti Somal, Temporal

Temporal's decade-old durable workflow engine maps cleanly onto agent reliability needs, and its Python SDK is now overtaking the other language SDKs as agents go to production.

3.3K views · Jul 28, 2025

AX is the only Experience that Matters - Ivan Burazin, Daytona

Build tools for agents (Agent Experience), not for humans-with-AI: speed, API-first design, agent-readable docs, and autonomy by default are the bar.

3.2K views · Jul 24, 2025

The Hierarchy of Needs for Training Dataset Development: Chang She and Noah Shpak

Training-data infrastructure is the bottleneck; the Lance format plus a materialization service gives Character.AI its iteration speed by supporting filtering, shuffling, and blob streaming at the same time.

3.0K views · Oct 15, 2024

Platforms for Humans and Machines: Engineering for the Age of Agents — Juan Herreros Elorza

To make coding agents productive in enterprise, redesign internal platforms around self-service APIs with shift-left feedback so agents (and humans) can iterate without humans-in-the-loop.

2.5K views · Apr 08, 2026

OpenLLMetry is all you need

OpenLLMetry brings the OpenTelemetry standard and ecosystem to LLM apps so you get vendor-neutral, drop-in observability across providers, vector DBs and agent frameworks.

2.2K views · Feb 22, 2025

Infrastructure for the Singularity — Jesse Han, Morph

Future agentic infra needs sub-second VM snapshot/branch/replicate primitives so agents can fork environments faster than humans can deploy them.

2.1K views · Aug 01, 2025
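Branching can be fast because a fork does not copy its parent's state up front. Instead, it records only its own writes on top of shared, frozen layers (copy-on-write). This in-memory analogue uses `collections.ChainMap` and assumes environment state fits in a key-value map. It illustrates branching semantics only, not VM-level snapshotting, and the `Env` class is hypothetical.

```python
from collections import ChainMap
from typing import Optional


class Env:
    """Copy-on-write key-value state: forks share frozen layers, not copies."""

    def __init__(self, state: Optional[ChainMap] = None) -> None:
        self.state = state if state is not None else ChainMap({})

    def write(self, key: str, value: str) -> None:
        self.state[key] = value  # ChainMap writes land in the first layer only

    def read(self, key: str) -> str:
        return self.state[key]

    def fork(self) -> "Env":
        # Freeze the current layers; parent and child each get a fresh top layer,
        # so neither sees the other's later writes. No existing data is copied.
        frozen = self.state
        self.state = frozen.new_child()
        return Env(frozen.new_child())


base = Env()
base.write("branch", "main")
child = base.fork()
child.write("branch", "agent-1")
print(base.read("branch"), child.read("branch"))  # main agent-1
```

The cost of forking does not depend on the amount of state. The trade-off is that every fork adds a layer that lookups must walk through.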

Context Platform Engineering to Reduce Token Anxiety — Val Bercovici, WEKA

Treat agent context as a tiered storage problem and maximize KV-cache hit rate at the platform layer — that beats prompt-cache arbitrage and most other inference optimizations.

1.6K views · Nov 24, 2025
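Platform-level KV-cache reuse typically keys on shared prompt prefixes. If a new request starts with tokens whose keys and values are already cached, only the uncached suffix needs prefill. This is a minimal sketch of measuring the token-level hit rate, assuming an unbounded cache modeled as a trie of token IDs. Real systems bound the cache and usually track blocks of tokens rather than single tokens. The function name and token IDs are hypothetical.

```python
def prefix_cache_hit_rate(requests: list[list[int]]) -> float:
    """Fraction of prompt tokens whose KV entries could be reused from cache.

    Illustrative model: a token hits if the exact prefix ending at it was
    seen in an earlier request. The cache is an unbounded trie of token IDs.
    """
    root: dict = {}
    hits = total = 0
    for tokens in requests:
        node = root
        for tok in tokens:
            total += 1
            if tok in node:
                hits += 1  # this prefix was already prefilled earlier
            # Descend, creating the node so later requests can reuse this prefix.
            node = node.setdefault(tok, {})
    return hits / total if total else 0.0


# Hypothetical token IDs: [1, 2, 3] is a shared system prompt.
print(prefix_cache_hit_rate([[1, 2, 3, 10], [1, 2, 3, 11], [1, 2, 3, 10, 12]]))
```

Order matters. If the same requests start with a per-request token, as in `[[9, 1, 2, 3], [8, 1, 2, 3]]`, the hit rate drops to zero even though most tokens are shared.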

Keynote: The AI developer experience doesn't have to suck – why and how we built Modal

Sub-second container starts plus Python-native serverless make Modal feel like local iteration while scaling to thousands of GPUs.

1.5K views · Feb 22, 2025

Conquering Agent Chaos — Rick Blalock, Agentuity

Agents need agent-specific infra — long runtime, stateful routing, framework-agnostic deployment — not stateless serverless.

1.3K views · Jul 01, 2025

Infra that fixes itself, thanks to coding agents — Mahmoud Abdelwahab, Railway

Combine durable workflows with a headless coding agent (OpenCode) so infrastructure issues become reviewable pull requests instead of pages.

973 views · Nov 24, 2025

Accelerating Mixture of Experts Training With Rail Optimized InfiniBand Networking in Crusoe Cloud

Rail-optimized InfiniBand on green-powered Crusoe Cloud cuts the all-reduce communication penalty that otherwise idles GPUs for 25-30% of MoE training time.

855 views · Feb 12, 2025
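GPUs sit idle when communication is not overlapped with compute. A back-of-the-envelope estimate can use the standard bandwidth cost of a ring all-reduce, in which each GPU sends and receives about 2(N-1)/N times the gradient size. The function names and the numbers in the example below are hypothetical, not from the talk. The sketch also ignores latency and the all-to-all expert traffic that MoE training adds.

```python
def ring_allreduce_seconds(bytes_per_gpu: float, n_gpus: int, link_gbps: float) -> float:
    """Bandwidth term of a ring all-reduce: 2(N-1)/N * S / B, latency ignored."""
    if n_gpus < 2:
        return 0.0
    link_bytes_per_s = link_gbps * 1e9 / 8  # Gb/s -> bytes/s
    return 2 * (n_gpus - 1) / n_gpus * bytes_per_gpu / link_bytes_per_s


def idle_fraction(compute_s: float, comm_s: float, overlap: float = 0.0) -> float:
    """Share of a training step spent waiting on exposed communication.

    overlap is the fraction of comm_s hidden behind compute (0 = none, 1 = all).
    """
    exposed = comm_s * (1.0 - overlap)
    return exposed / (compute_s + exposed)


# Hypothetical: 1 GB of gradients, 8 GPUs, 400 Gb/s per GPU, 100 ms of compute.
comm = ring_allreduce_seconds(1e9, 8, 400)
print(round(comm * 1000, 1), round(idle_fraction(0.1, comm), 2))  # 35.0 0.26
```

Faster per-rail links shrink `comm_s`, and better overlap shrinks the exposed part. Either change lowers the idle fraction.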

Luminal - Search-Based Deep Learning Compilers - Joe Fioti

Reduce deep learning to ~12 primitives and let search-based compilers generate the fast code — a path to vastly simpler ML stacks that still hit peak hardware performance.

738 views · Jun 03, 2025

How agents broke app-level infrastructure - Evan Boyle

Agentic workloads break Web 2.0 infrastructure assumptions about latency and reliability; we need durable execution layers built for seconds-to-hours requests.

577 views · Jun 03, 2025

Building Agentic Applications w/ Heroku Managed Inference and Agents — Julián Duque & Anush Dsouza

Heroku now ships managed inference + MCP-based agents + pgvector so apps can attach AI and tools the same way they attach Postgres.

536 views · Jun 27, 2025

Substrate Launch: the API for modular AI

Substrate runs multi-model computation graphs as a coordinated cluster, replacing many slow API calls with microsecond inter-node hops and reliable structured outputs.

500 views · Feb 06, 2025

Continuous Profiling for GPUs — Matthias Loibl, Polar Signals

Always-on sampled profiling with eBPF + NVML + GPU time attribution finally gives the GPU equivalent of CPU flame charts in production.

360 views · Jul 22, 2025
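Flame charts are built from sampled stacks. Each sample's call stack is collapsed into a semicolon-joined string, and values for identical stacks are summed. GPU time can be attributed the same way if each CPU stack sample is paired with a GPU utilization reading taken at that moment. This is a minimal sketch of the aggregation step only, not Polar Signals' implementation. It assumes samples were already collected (for example by an eBPF stack sampler plus NVML utilization queries) and treats utilization-weighted wall time as an approximation of GPU time. The sample format and function name are hypothetical.

```python
from collections import Counter


def fold_gpu_samples(samples: list[tuple[list[str], float]],
                     interval_s: float) -> dict[str, float]:
    """Collapse (stack, gpu_util) samples into GPU-seconds per call stack.

    stack is root-first, e.g. ["main", "train_step", "forward"];
    gpu_util is the 0.0-1.0 utilization reading taken with that sample.
    """
    totals: Counter = Counter()
    for stack, gpu_util in samples:
        # Each sample stands for interval_s of wall time, weighted by GPU utilization.
        totals[";".join(stack)] += gpu_util * interval_s
    return dict(totals)


samples = [
    (["main", "train_step", "forward"], 0.9),
    (["main", "train_step", "forward"], 0.7),
    (["main", "load_batch"], 0.0),  # CPU busy, GPU idle: a data-loading stall
]
for stack, gpu_s in fold_gpu_samples(samples, interval_s=0.01).items():
    print(f"{stack} {gpu_s:.3f}")
```

A stack that appears often but accumulates almost no GPU seconds points at CPU-side work that starves the GPU.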

Building agent fleet architectures your CISO doesn't hate — Lou Bichard, Gitpod

For regulated buyers, the right agent-fleet architecture is a substrate: customer owns the workload + source code on their cloud, vendor manages the control plane via minimal telemetry — not pure SaaS or pure self-hosted.

307 views · Jun 27, 2025

Accelerate your AI journey with Azure AI model catalog: Sharmila Chokalingam

Azure positions itself as a unified catalog + serving layer letting enterprises prototype, optimize, and operationalize across 1,600+ models with consistent APIs and data privacy.

209 views · Feb 06, 2025