Waymo's EMMA: Teaching Cars to Think - Jyh Jing Hwang, Waymo
Takeaway
Multimodal LLMs like Gemini turn driving into a generalist reasoning task, and Waymo's EMMA shows camera-only end-to-end planning can beat heavy specialized stacks.
Summary
- Waymo's EMMA puts Gemini at the core of driving: 8-camera 360° video plus routing text in, future waypoints out. It is camera-only, map-free (except Google Maps), and self-supervised on driving logs.
- Achieves state-of-the-art results on the nuScenes open-loop planning benchmark with a simple formulation, and on Waymo's own 100x-larger Waymo Open Motion Dataset.
- Adding chain-of-thought reasoning (identify critical objects, predict their behavior, choose a meta-decision) beats specialized models like Wayformer and MotionLM that rely on oracle perception and HD maps.
- Foundation models generalize across long-tail scenarios (flock-of-birds, scooter slipping in rain, traffic-cop overriding red light) that hand-built systems struggle with.
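
The formulation in the bullets above (routing text in, chain-of-thought steps, waypoints out) can be sketched in a few lines. This is an illustrative sketch only, not Waymo's implementation: the prompt wording, the `(x, y)` text format for waypoints, and the helper names `build_prompt`, `parse_waypoints`, and `average_l2_error` are all assumptions made for this example. The last helper computes an average L2 displacement error against logged waypoints, a metric commonly used for open-loop planning evaluation.

```python
import math
import re


def build_prompt(routing_text: str, num_cameras: int = 8) -> str:
    """Assemble a hypothetical text prompt; camera frames would be attached separately."""
    return (
        f"You are given {num_cameras} surround-view camera streams.\n"
        f"Route instruction: {routing_text}\n"
        "1. Identify the critical objects.\n"
        "2. Predict how each one will behave.\n"
        "3. Choose a meta-decision.\n"
        "4. Output future waypoints as (x, y) pairs in meters."
    )


# Matches "(x, y)" pairs of plain decimal numbers (assumed output format).
_PAIR = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_waypoints(model_text: str) -> list[tuple[float, float]]:
    """Extract every (x, y) pair from the model's text reply, in order."""
    return [(float(x), float(y)) for x, y in _PAIR.findall(model_text)]


def average_l2_error(
    predicted: list[tuple[float, float]],
    logged: list[tuple[float, float]],
) -> float:
    """Mean Euclidean distance between predicted and logged waypoints."""
    if not predicted or len(predicted) != len(logged):
        raise ValueError("waypoint lists must be non-empty and of equal length")
    total = sum(math.dist(p, t) for p, t in zip(predicted, logged))
    return total / len(predicted)


if __name__ == "__main__":
    print(build_prompt("Turn left at the next intersection."))
    # A made-up model reply, used only to exercise the parser.
    reply = "Waypoints: (1.0, 0.0), (2.0, 0.0), (3.0, 4.0)"
    predicted = parse_waypoints(reply)
    logged = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    print(average_l2_error(predicted, logged))
```

Because the waypoints travel as ordinary text, the same model interface can serve reasoning and planning without a task-specific output head. The parser here picks up every parenthesized number pair, so a real system would need a stricter output format than this sketch assumes.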
autonomous-driving, gemini, multimodal
Original description
This session explores Waymo's latest research on the End-to-End Multimodal Model for Autonomous Driving (EMMA) and advanced sensor simulation techniques. Jyh-Jing Hwang will demonstrate how multimodal large language models like Gemini could improve autonomous driving through unified end-to-end architectures that process raw sensor data directly into driving decisions. The presentation will showcase EMMA's state-of-the-art performance in trajectory planning, 3D object detection, and road graph understanding, as well as another Drive&Gen research approach to sensor simulation for evaluating an end-to-end motion planning model. Attendees will gain insights into the benefits of co-training across multiple autonomous driving tasks and the potential of controlled video generation for testing under various environmental conditions.

More on EMMA here: https://waymo.com/blog/2024/10/introducing-emma

About Jyh Jing Hwang
Jyh-Jing is currently a Research Scientist and TLM at Waymo Research. He also taught machine learning and computer vision as a lecturer at UPenn MCIT Online in 2022 and 2023. Before joining Waymo in 2020, Jyh-Jing received his Ph.D. degree in Computer and Information Science from University of Pennsylvania, advised by Prof. Jianbo Shi and Prof. Stella Yu at UC Berkeley / ICSI. Before coming to the U.S., he received the B.S. and M.S. degrees from National Taiwan University and worked with Dr. Tyng-Luh Liu at Academia Sinica. His research interests are broadly in artificial intelligence, computer vision, and machine learning. Particularly, he's interested in end-to-end autonomous driving, large multimodal models, general image/video structures, and sensor fusion for robust perception.

Recorded at the AI Engineer World's Fair in San Francisco.

Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter