Rishabh Garg, Tesla Optimus — Challenges in High Performance Robotics Systems


8.0K views · Aug 25, 2025 · 12:43 min

Takeaway

High-performance robotics depends on pipelined, synchronized comms threads, and most 'bad policy' bugs are actually CAN or thread timing bugs that only become visible with bus-level logging.

Summary

A Tesla Optimus engineer walks through what happens between the controller and the wire: when a robot policy misbehaves, the culprit is often the software/comms stack, not the policy.
Toy example with a CAN bus at 1 Mbps: sending 10 messages per loop ties up the bus and leaves a ~1 ms gap in each loop, which pushes the design to pipelined RX/policy/TX threads to hit a 2 ms loop time.
Multi-thread desync produces queued or skipped messages: cycle-time plots from candump reveal 4 ms gaps followed by zero-gap bursts, which show up as motor jitter and 'catch-up' motion.
Fixes: synchronization primitives (mutexes, condition variables, semaphores) on RTOS-capable systems; padding or cushion buffers where those primitives aren't available.
Diagnostic discipline: cheap external CAN transceivers feeding a laptop running candump are essential for bus-level timing analysis.
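
The cycle-time check in the summary can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's tooling: it assumes the `(timestamp) interface ID#DATA` layout that candump writes in its log-file mode, and the sample frames, the CAN ID `0C1`, and the 50% tolerance are invented for the example. The 2 ms nominal period is taken from the loop time mentioned above.

```python
import re
from collections import defaultdict

# Matches candump log-file lines such as "(100.002000) can0 0C1#0102030405060708".
LINE_RE = re.compile(r"^\((\d+\.\d+)\)\s+(\S+)\s+([0-9A-Fa-f]+)#(\S*)$")

def cycle_times_ms(log_lines):
    """Return {can_id: [gap_ms, ...]} between consecutive frames of each ID."""
    last_seen = {}
    gaps = defaultdict(list)
    for line in log_lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that are not in the assumed format
        t = float(m.group(1))
        can_id = m.group(3).upper()
        if can_id in last_seen:
            gaps[can_id].append((t - last_seen[can_id]) * 1e3)
        last_seen[can_id] = t
    return dict(gaps)

def classify(gaps_ms, period_ms=2.0, tol=0.5):
    """Label each gap against the nominal period (tol is an assumed threshold)."""
    labels = []
    for g in gaps_ms:
        if g > period_ms * (1 + tol):
            labels.append("late")    # a frame was skipped or delayed
        elif g < period_ms * (1 - tol):
            labels.append("burst")   # a queued frame sent back to back
        else:
            labels.append("ok")
    return labels

# Hypothetical capture: one 4 ms gap followed by a near-zero-gap catch-up frame.
SAMPLE = [
    "(100.000000) can0 0C1#0000000000000000",
    "(100.002000) can0 0C1#0000000000000001",
    "(100.006000) can0 0C1#0000000000000002",
    "(100.006150) can0 0C1#0000000000000003",
    "(100.008000) can0 0C1#0000000000000004",
]

if __name__ == "__main__":
    for can_id, gaps in cycle_times_ms(SAMPLE).items():
        print(can_id, list(zip([round(g, 3) for g in gaps], classify(gaps))))
```

Plotting the gaps per ID instead of printing them gives the kind of cycle-time plot the summary refers to; the "late" then "burst" pattern is the signature to look for.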

robotics · real-time · systems

Original description

A robot's behavior is influenced by the control policy, the software configuration, and electrical characteristics of the communication protocol.

When unexpected behaviors arise, it is not straightforward to trace them to the RL policy, the electrical characteristics, or the mechanical characteristics. This talk walks through some of these issues and explains what might cause the observed behavior.

We will talk about concrete issues that the audience can take away to develop their understanding of physical systems. It will build intuition for the kinds of issues to expect when communication data rates increase manifold.
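
For a rough sense of why data rate matters, the bus time of a burst of classical CAN frames can be estimated from the frame layout. The sketch below assumes 11-bit identifiers and 8 data bytes per frame (the talk's actual payload sizes are not given here) and uses the standard worst-case bit-stuffing bound; the function names are hypothetical.

```python
def frame_bits(data_bytes: int = 8, worst_case_stuffing: bool = False) -> int:
    """Bits on the wire for one classical CAN base-format (11-bit ID) data frame."""
    # Stuffable region: SOF(1) + ID(11) + RTR(1) + IDE(1) + r0(1) + DLC(4) + data + CRC(15)
    stuffable = 1 + 11 + 1 + 1 + 1 + 4 + 8 * data_bytes + 15
    # Fixed-form region: CRC delim(1) + ACK slot(1) + ACK delim(1) + EOF(7) + interframe space(3)
    fixed = 1 + 1 + 1 + 7 + 3
    # Worst case adds one stuff bit per 4 bits after the first stuffable bit.
    stuff = (stuffable - 1) // 4 if worst_case_stuffing else 0
    return stuffable + fixed + stuff

def burst_time_ms(n_frames: int, bitrate_bps: int = 1_000_000,
                  data_bytes: int = 8, worst_case_stuffing: bool = False) -> float:
    """Time the bus is occupied by n back-to-back frames, in milliseconds."""
    return n_frames * frame_bits(data_bytes, worst_case_stuffing) / bitrate_bps * 1e3

if __name__ == "__main__":
    print(frame_bits())                                   # bits per frame, no stuffing
    print(frame_bits(worst_case_stuffing=True))           # bits per frame, worst case
    print(burst_time_ms(10))                              # 10 frames at 1 Mbps
    print(burst_time_ms(10, worst_case_stuffing=True))    # same burst, worst case
```

Under these assumptions, ten 8-byte frames at 1 Mbps occupy the bus for roughly 1.1 to 1.35 ms, which is consistent with the ~1 ms gap in the toy example.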

Timestamps
00:00 Introduction to high-performance robotics challenges
00:15 The problem of unexplained robot behavior
00:54 Root cause analysis: policy vs. software
01:17 Designing a toy robotics system for analysis
01:24 System architecture: sensors, CPU, GPU, actuators, CAN bus
01:57 The initial, simple code loop
02:14 Expectation vs. reality: unexpected loop execution gaps
02:42 The impact of CAN bus data rate on loop execution
03:13 Potential solutions: accepting delay vs. multithreading
04:00 A new, pipelined design for reduced cycle time
04:32 New problems: "stuttering" and abnormal motor behavior
04:49 Data collection with external transceivers and "candump"
05:24 Expected vs. actual message plots: missed messages and jitter
06:12 Using cycle time plots to identify desynchronization
06:58 Transmit phase desynchronization: missed and queued data
08:03 Receive phase desynchronization: stale data and overcompensation
08:38 Resolving synchronization issues: kernel primitives and padding
09:25 The impact of logging on system performance
11:09 Reception and priority inversion
12:02 Conclusion and summary of key takeaways
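
The pipelined RX/policy/TX design (04:00) and the condition-variable fix (08:38) can be illustrated with a small sketch. This is an assumption-laden toy, not the speaker's implementation: `Mailbox`, `run_pipeline`, and the stage functions are hypothetical names, and Python threads give no real-time guarantees, so only the handoff logic carries over. Each stage blocks until the next one has taken the previous item, so nothing is overwritten (skipped) and nothing piles up (queued).

```python
import threading

class Mailbox:
    """Single-slot handoff guarded by a condition variable."""

    def __init__(self):
        self._cond = threading.Condition()
        self._item = None
        self._full = False

    def put(self, item):
        with self._cond:
            while self._full:            # wait until the consumer emptied the slot
                self._cond.wait()
            self._item, self._full = item, True
            self._cond.notify_all()

    def get(self):
        with self._cond:
            while not self._full:        # wait until the producer filled the slot
                self._cond.wait()
            item, self._item, self._full = self._item, None, False
            self._cond.notify_all()
            return item

STOP = object()  # sentinel that shuts the pipeline down

def run_pipeline(samples, policy):
    """Run RX -> policy -> TX on three threads and return what TX 'sent'."""
    rx_to_policy, policy_to_tx = Mailbox(), Mailbox()
    sent = []

    def rx_stage():
        for s in samples:                # stands in for reading sensor frames
            rx_to_policy.put(s)
        rx_to_policy.put(STOP)

    def policy_stage():
        while (s := rx_to_policy.get()) is not STOP:
            policy_to_tx.put(policy(s))
        policy_to_tx.put(STOP)

    def tx_stage():
        while (cmd := policy_to_tx.get()) is not STOP:
            sent.append(cmd)             # stands in for writing actuator frames

    threads = [threading.Thread(target=f) for f in (rx_stage, policy_stage, tx_stage)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sent

if __name__ == "__main__":
    print(run_pipeline(range(5), lambda x: 2 * x))
```

With the single-slot handoff, every input produces exactly one output, in order. Replacing the mailbox with an unsynchronized shared variable is what allows the stale, skipped, or duplicated commands described in the desynchronization segments.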

Rishabh Garg
Robotics Engineer at Tesla Optimus

I am Rishabh Garg, a robotics software engineer pushing the boundaries of software-hardware integration to meet the ever-increasing demand for data. I have been working with robots and embedded systems for the past 4 years, making systems more reliable and performant at companies like Tesla and Amazon. I am eager to learn what experts in the industry are doing differently and to share my own experience and insights into the challenges frequently encountered at the system software level for robotics.