Continuous Profiling for GPUs — Matthias Loibl, Polar Signals
Takeaway
Always-on sampled profiling with eBPF + NVML + GPU time attribution finally gives the GPU equivalent of CPU flame charts in production.
Summary
- Polar Signals (Parca maintainers) extends always-on sampled profiling from CPU/memory to GPUs using eBPF and NVIDIA NVML.
- Collects GPU utilization per process and per node, plus memory utilization, clock speed, power vs. power-limit, temperature (relevant for thermal throttling), and PCIe RX/TX throughput.
- Correlates GPU-underutilization windows with CPU flame charts to spot data-loading bottlenecks between CPU and GPU.
- Newly announced GPU time profiling measures the actual on-GPU duration of each CUDA function; flame charts now show CPU stacks with GPU-time leaves attributed to libcuda calls.
- Works with Python, Rust, JVM, Ruby; deploys via Kubernetes DaemonSet; Turbopuffer uses it for vector-engine performance work.
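The correlation step in the summary (matching GPU-underutilization windows against CPU profile samples) can be sketched in a few lines. This is an illustrative sketch only, not Polar Signals' or Parca's implementation: the `GpuSample` type, both function names, the 30% threshold, and the sample data are all hypothetical, and a real agent would read utilization from NVML and stacks from an eBPF profiler rather than from hard-coded lists.

```python
from dataclasses import dataclass


@dataclass
class GpuSample:
    ts_ms: int          # sample timestamp in milliseconds
    utilization: float  # GPU utilization in percent (0-100)


def underutilized_windows(samples, threshold=30.0, min_samples=2):
    """Return (start_ms, end_ms) spans where utilization stays below threshold.

    threshold and min_samples are assumed tuning knobs, not values from the talk.
    """
    windows = []
    run = []  # current streak of consecutive low-utilization samples
    for s in sorted(samples, key=lambda s: s.ts_ms):
        if s.utilization < threshold:
            run.append(s)
            continue
        # Streak ended: keep it only if it was long enough to matter.
        if len(run) >= min_samples:
            windows.append((run[0].ts_ms, run[-1].ts_ms))
        run = []
    if len(run) >= min_samples:
        windows.append((run[0].ts_ms, run[-1].ts_ms))
    return windows


def cpu_stacks_in_windows(cpu_samples, windows):
    """Count CPU stacks (folded 'a;b;c' strings) sampled inside any window."""
    counts = {}
    for ts_ms, stack in cpu_samples:
        if any(start <= ts_ms <= end for start, end in windows):
            counts[stack] = counts.get(stack, 0) + 1
    return counts


# Made-up data: the GPU idles between 100 ms and 200 ms.
gpu = [GpuSample(0, 95), GpuSample(100, 12), GpuSample(200, 8), GpuSample(300, 90)]
cpu = [
    (50, "train;step"),
    (120, "train;load_batch;decode"),
    (180, "train;load_batch;decode"),
    (310, "train;step"),
]

windows = underutilized_windows(gpu)
hot = cpu_stacks_in_windows(cpu, windows)
print(windows, hot)
```

With this made-up data, the only CPU stacks sampled while the GPU sat idle are in the data-loading path, which is the kind of CPU-to-GPU bottleneck the talk describes spotting.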
Tags: gpu, profiling, ebpf
Original description
Continuous Profiling for GPUs extends our industry-leading continuous profiling platform to provide deep, always-on visibility into your GPU workloads. Now you can see exactly how your GPUs are being utilized millisecond by millisecond. Our solution helps you move from guesswork to data-driven optimization.
Related links
- https://twitter.com/metalmatze
- https://www.linkedin.com/in/metalmatze/
- https://matthiasloibl.com/
- https://polarsignals.com