Continuous Profiling for GPUs — Matthias Loibl, Polar Signals

360 views · Jul 22, 2025 · 11:31 min

Takeaway

Always-on sampled profiling with eBPF + NVML + GPU time attribution finally gives the GPU equivalent of CPU flame charts in production.

Summary

Polar Signals (Parca maintainers) extends always-on sampled profiling from CPU/memory to GPUs using eBPF and NVIDIA NVML.
Collects GPU utilization per process and per node, plus memory utilization, clock speed, power draw vs. power limit, temperature (relevant for thermal throttling), and PCIe RX/TX throughput.
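The metrics listed above correspond to standard NVML device queries. The sketch below polls them once. It assumes the `nvidia-ml-py` package (imported as `pynvml`) and an NVIDIA driver, and it only shows where the numbers come from. It is not Polar Signals' agent. `GpuSample`, `read_sample`, and `power_headroom` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class GpuSample:
    """One snapshot of the per-device metrics mentioned in the talk (hypothetical type)."""
    gpu_util_pct: int       # % of the last sample period in which a kernel was executing
    mem_util_pct: int       # % of the last sample period in which device memory was read/written
    sm_clock_mhz: int
    power_mw: int           # current draw, milliwatts
    power_limit_mw: int     # enforced power limit, milliwatts
    temperature_c: int
    pcie_rx_kbps: int       # NVML reports PCIe throughput in KB/s
    pcie_tx_kbps: int
    per_process_sm_pct: dict[int, int] = field(default_factory=dict)  # pid -> SM util %


def power_headroom(sample: GpuSample) -> float:
    """Fraction of the enforced power limit still unused (0.0 means at or over the cap)."""
    if sample.power_limit_mw <= 0:
        return 0.0
    return max(0.0, 1.0 - sample.power_mw / sample.power_limit_mw)


def read_sample(index: int = 0) -> GpuSample:
    """Poll one device via NVML. Requires an NVIDIA GPU, its driver and nvidia-ml-py."""
    import pynvml  # imported lazily so the helpers above work on machines without a GPU

    pynvml.nvmlInit()
    try:
        h = pynvml.nvmlDeviceGetHandleByIndex(index)
        util = pynvml.nvmlDeviceGetUtilizationRates(h)
        per_proc: dict[int, int] = {}
        try:
            # Timestamp 0 asks for every sample the driver still buffers;
            # keep the highest SM utilization seen for each pid.
            for s in pynvml.nvmlDeviceGetProcessUtilization(h, 0):
                per_proc[s.pid] = max(per_proc.get(s.pid, 0), s.smUtil)
        except pynvml.NVMLError:
            pass  # no samples yet, or not supported on this GPU
        return GpuSample(
            gpu_util_pct=util.gpu,
            mem_util_pct=util.memory,
            sm_clock_mhz=pynvml.nvmlDeviceGetClockInfo(h, pynvml.NVML_CLOCK_SM),
            power_mw=pynvml.nvmlDeviceGetPowerUsage(h),
            power_limit_mw=pynvml.nvmlDeviceGetEnforcedPowerLimit(h),
            temperature_c=pynvml.nvmlDeviceGetTemperature(h, pynvml.NVML_TEMPERATURE_GPU),
            pcie_rx_kbps=pynvml.nvmlDeviceGetPcieThroughput(h, pynvml.NVML_PCIE_UTIL_RX_BYTES),
            pcie_tx_kbps=pynvml.nvmlDeviceGetPcieThroughput(h, pynvml.NVML_PCIE_UTIL_TX_BYTES),
            per_process_sm_pct=per_proc,
        )
    finally:
        pynvml.nvmlShutdown()
```

On a GPU host, `read_sample(0)` returns one snapshot. An always-on collector would call something like it at a fixed interval. Low `power_headroom`, high `temperature_c`, and falling `sm_clock_mhz` together are the kind of pattern that can point to power or thermal throttling.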
Correlates GPU-underutilization windows with CPU flame charts to spot data-loading bottlenecks between CPU and GPU.
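The correlation step can be pictured as a join on timestamps. First find windows where GPU utilization stays below a threshold, then count which CPU stacks were sampled during those windows. The sketch below is written in plain Python. The 30% threshold, the `(timestamp, value)` record layout, the folded-stack strings, and the function names are all assumptions for illustration, not the product's actual logic.

```python
from collections import Counter


def low_util_windows(gpu_samples, threshold_pct=30.0):
    """Merge consecutive below-threshold GPU samples into [start, end) windows.

    gpu_samples: list of (timestamp_s, util_pct) sorted by time. Each sample is
    assumed to cover the interval up to the next sample's timestamp, so the
    final sample (which has no successor) never opens a window.
    """
    windows = []
    start = end = None
    for (ts, util), (next_ts, _) in zip(gpu_samples, gpu_samples[1:]):
        if util < threshold_pct:
            if start is None:
                start = ts
            end = next_ts
        elif start is not None:
            windows.append((start, end))
            start = None
    if start is not None:
        windows.append((start, end))
    return windows


def stacks_during(windows, cpu_samples):
    """Count CPU samples (timestamp_s, folded_stack) that fall inside any window."""
    counts = Counter()
    for ts, stack in cpu_samples:
        if any(start <= ts < end for start, end in windows):
            counts[stack] += 1
    return counts


# Hypothetical data: the GPU idles while the CPU is busy decoding input.
gpu = [(0, 90), (1, 10), (2, 5), (3, 95), (4, 20), (5, 80)]
cpu = [
    (0.5, "main;train_step;forward"),
    (1.5, "main;DataLoader.__next__;decode_jpeg"),
    (2.5, "main;DataLoader.__next__;decode_jpeg"),
    (3.5, "main;train_step;forward"),
    (4.2, "main;DataLoader.__next__;read_file"),
]
print(low_util_windows(gpu))
print(stacks_during(low_util_windows(gpu), cpu).most_common())
```

In this toy data, data-loading stacks account for all samples taken while the GPU was idle. That is the signal the correlation is meant to surface.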
Newly announced GPU time profiling measures the actual on-GPU duration of each CUDA function; flame charts now show CPU stacks whose leaves carry GPU time attributed to libcuda calls.
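One way to picture such a flame chart is to record each GPU operation with the CPU stack that launched it and its measured on-GPU duration. That duration then becomes the weight of a synthetic leaf under the libcuda frame. The sketch below takes the durations as given and emits folded stacks (`frame;frame;... weight`, the text format flame graph tools consume). The `KernelLaunch` record, the `[gpu]` leaf prefix, and all frame and kernel names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelLaunch:
    cpu_stack: tuple[str, ...]  # root-first frames of the thread that launched the work
    kernel: str                 # name of the operation that ran on the GPU
    gpu_ns: int                 # measured on-GPU duration in nanoseconds


def fold_gpu_time(launches):
    """Sum GPU nanoseconds per (CPU stack + GPU leaf), returned as folded-stack lines."""
    totals = defaultdict(int)
    for launch in launches:
        frames = launch.cpu_stack + (f"[gpu] {launch.kernel}",)
        totals[";".join(frames)] += launch.gpu_ns
    # One "stack weight" line per unique path, sorted for stable output.
    return [f"{stack} {ns}" for stack, ns in sorted(totals.items())]


matmul = ("main", "train_step", "matmul", "libcuda:cuLaunchKernel")
relu = ("main", "train_step", "relu", "libcuda:cuLaunchKernel")
launches = [
    KernelLaunch(matmul, "sgemm", 1200),
    KernelLaunch(matmul, "sgemm", 800),
    KernelLaunch(relu, "relu_kernel", 300),
]
for line in fold_gpu_time(launches):
    print(line)
```

Weighting leaves by GPU nanoseconds rather than CPU sample counts is what makes the width of each bar reflect time spent on the device, not time spent issuing the launch.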
Works with Python, Rust, JVM languages, and Ruby; deploys as a Kubernetes DaemonSet; Turbopuffer uses it for vector-engine performance work.

gpu · profiling · ebpf

Original description

Continuous Profiling for GPUs extends our industry-leading continuous profiling platform to provide deep, always-on visibility into your GPU workloads.

Now you can see exactly how your GPUs are being utilized millisecond by millisecond. Our solution helps you move from guesswork to data-driven optimization.


---related links---

https://twitter.com/metalmatze
https://www.linkedin.com/in/metalmatze/
https://matthiasloibl.com/
https://polarsignals.com