Robotics: why now? - Quan Vuong and Jost Tobias Springenberg, Physical Intelligence
Takeaway
Robotics' breakthrough is VLA models trained on teleoperated dexterous data at scale, mirroring VLMs with a multi-year lag.
Summary
- Physical Intelligence (PI) aims to build a single model controlling any robot for any task; they open-source models and publish research.
- Introduces Vision-Language-Action (VLA) models: VLMs adapted to take robot joint states as input and produce robot control actions auto-regressively instead of text.
- Unlike VLM training, where web data is abundant, VLA training has no web-scale source, so building the data engine is more than half of the work; they collect data via teleoperation with leader arms strapped to human operators.
- Started with the Open Cross-Embodiment dataset (~3,800 hours of mostly static-scene data); after 6 months collected 10,000 hours of successful episodes; later expanded into mobile manipulation across hundreds of scenes.
- Released π0 (pi-zero) late last year, capable of dexterous tasks such as folding shirts taken from a dryer; the speakers argue that VLAs are tracking VLM progress with a ~3-year lag (RT-2 in 2023 was the VLA counterpart of the GPT-3 moment).
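The VLA idea in the summary (a VLM that emits control actions token by token instead of text) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not Physical Intelligence's implementation: the 256-bin uniform discretization, the [-1, 1] action range, and the hypothetical `next_token_fn` callback that stands in for the model.

```python
import numpy as np

NUM_BINS = 256  # assumed size of the action-token vocabulary (illustrative)


def actions_to_tokens(actions, low=-1.0, high=1.0, num_bins=NUM_BINS):
    """Map continuous actions in [low, high] to integer ids in [0, num_bins - 1]."""
    clipped = np.clip(np.asarray(actions, dtype=np.float64), low, high)
    scaled = (clipped - low) / (high - low)  # rescale to [0, 1]
    # The top edge (scaled == 1.0) would land in bin num_bins, so cap it.
    return np.minimum((scaled * num_bins).astype(np.int64), num_bins - 1)


def tokens_to_actions(tokens, low=-1.0, high=1.0, num_bins=NUM_BINS):
    """Map token ids back to the centre of their bin."""
    centres = (np.asarray(tokens, dtype=np.float64) + 0.5) / num_bins
    return low + centres * (high - low)


def decode_action(next_token_fn, prompt_tokens, action_dim):
    """Autoregressively decode one action, one token per action dimension.

    next_token_fn is a hypothetical stand-in for the model: it receives the
    context so far (image, language, and joint-state tokens plus any action
    tokens already emitted) and returns the next action-token id.
    """
    context = list(prompt_tokens)
    action_tokens = []
    for _ in range(action_dim):
        token = int(next_token_fn(context))
        action_tokens.append(token)
        context.append(token)  # feed the emitted token back in
    return tokens_to_actions(action_tokens)
```

With uniform bins, the round trip loses at most half a bin width per dimension, which is the usual trade-off between a small action vocabulary and control precision.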
robotics, vla, physical-intelligence
Original description
Sharing recent progress from Physical Intelligence and why it is an exciting time to push the frontier in general-purpose robotics.
About Quan Vuong: Quan Vuong is a co-founder at Physical Intelligence. His research focuses on generalist robotics and algorithms that enable intelligent behaviors through large-scale learning.
About Jost Tobias Springenberg: Tobias is currently a research scientist at Physical Intelligence, where he works on bringing AI into the real world and understanding the fundamentals of sequential decision making (e.g. imitation and reinforcement learning). He likes his machine learning models big and his data to be plentiful, and focuses most of his research on engineering-driven machine learning at scale for robotics. Before joining Physical Intelligence, Tobias was a research scientist at Google DeepMind in London within the control team, which generally focuses on applications of ML to control for science and robotics. Before that he was a researcher at the University of Freiburg, working with the Machine Learning and Computer Vision groups. Tobias holds a BSc. in Cognitive Science from the University of Osnabrueck, from which he still retains an interest in understanding human cognition, and an MSc. in Computer Science from the University of Freiburg.
Recorded at the AI Engineer World's Fair in San Francisco. Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter