The Current State of Browser Agents - Jerry Wu and Wyatt Marshall
Takeaway
Browser agents are now feasible for read tasks and creeping into write tasks, but evaluation is hard and the underlying browser infrastructure can swing performance as much as the model.
Summary
- Jerry Wu and Wyatt Marshall (founders of Halluminate) define browser agents as AI that controls a web browser through an observe-reason-act loop, perceiving the page either through screenshots fed to a vision-language model (VLM) or through extracted HTML/DOM.
- Common use cases: web scraping for sales prospecting, software QA, form/job-application filling, generative RPA replacing brittle UiPath-style workflows.
- Tasks split into read tasks (info gathering, easier) and write tasks (state-changing actions, much harder both to build and evaluate).
- They introduce their own benchmark, Web Bench (published the week before the talk), which measures agent performance, and emphasize that infrastructure (how and where the browser is hosted) materially affects results.
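The observe-reason-act loop described above can be sketched in a few lines. This is a minimal illustration, not the speakers' implementation or any framework's API: `FakeBrowser`, `toy_policy`, `run_agent`, and the `->page` link syntax are all hypothetical. It assumes an in-memory page graph in place of a hosted browser and a rule-based policy in place of the model; a real agent would send a screenshot to a VLM or extracted DOM to an LLM at the "reason" step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Observation:
    url: str
    dom_text: str  # stand-in for extracted HTML/DOM text or a screenshot


@dataclass
class Action:
    kind: str         # "click" or "done"
    target: str = ""  # page to navigate to for "click"
    value: str = ""   # extracted answer for "done"


@dataclass
class FakeBrowser:
    """Hypothetical in-memory stand-in for a hosted browser session."""
    pages: Dict[str, str]
    url: str
    log: List[Action] = field(default_factory=list)

    def observe(self) -> Observation:
        return Observation(self.url, self.pages[self.url])

    def act(self, action: Action) -> None:
        self.log.append(action)
        if action.kind == "click":
            self.url = action.target


def toy_policy(goal: str, obs: Observation) -> Action:
    """Hypothetical rule-based placeholder for the model's reasoning step."""
    tokens = obs.dom_text.split()
    key = goal + ":"
    if key in tokens:
        i = tokens.index(key)
        answer = tokens[i + 1] if i + 1 < len(tokens) else ""
        return Action("done", value=answer)
    for token in tokens:
        if token.startswith("->"):  # hypothetical link syntax "->page"
            return Action("click", target=token[2:])
    return Action("done")  # nothing to extract and nowhere to go


def run_agent(browser: FakeBrowser, goal: str,
              policy: Callable[[str, Observation], Action],
              max_steps: int = 10) -> Optional[str]:
    for _ in range(max_steps):
        obs = browser.observe()      # observe
        action = policy(goal, obs)   # reason
        if action.kind == "done":
            return action.value or None
        browser.act(action)          # act
    return None  # step budget exhausted


if __name__ == "__main__":
    b = FakeBrowser({"home": "Welcome ->pricing", "pricing": "price: $20"}, url="home")
    print(run_agent(b, "price", toy_policy))  # prints $20
```

This example is a read task: the agent only navigates and extracts. A write task would add state-changing actions (typing into a form, submitting), and grading it would mean inspecting the site's state afterwards rather than just comparing a returned string, which is one reason write tasks are harder to evaluate.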
browser-agents, benchmarks, rpa
Original description
Browser agents are here. But beyond simple sample use cases (I'm looking at you, flight-booking demo), are they as good as advertised? In this talk, we introduce Web Bench, a new benchmark we've developed that rigorously tests browser agents across 450+ websites on real-world, action-based objectives such as info extraction, login/auth, form filling, and others. We'll dive into the results, unpack some unexpected discoveries, and discuss broader implications for the future of general-purpose agents.
You'll walk away with practical insights into:
1. a data-driven understanding of the capabilities and limitations of state-of-the-art browser agents
2. how to meaningfully evaluate browser agents
3. hard-won lessons on designing and launching a benchmark
Come through and see what browser agents can really do.
Resources
- Leaderboard: https://webbench.ai/
- Technical Report: https://halluminate.ai/blog/benchmark
- GitHub: https://github.com/Halluminate/WebBench
- Hugging Face: https://huggingface.co/datasets/Halluminate/WebBench
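Scoring a benchmark like this comes down to grouping graded task outcomes and computing success rates per group. The sketch below assumes a hypothetical `Result` record and a hypothetical category-to-task-type mapping (treating info extraction as a read task and login/auth and form filling as write tasks for illustration only; this is not Web Bench's official taxonomy or scoring code). Grouping by an `infra` field shows how the same results can be sliced to compare browser-hosting setups, since infrastructure can move the numbers as much as the model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable

# Hypothetical mapping for illustration; not Web Bench's official taxonomy.
TASK_TYPE = {
    "info_extraction": "read",
    "login_auth": "write",
    "form_filling": "write",
}


@dataclass(frozen=True)
class Result:
    site: str
    category: str   # e.g. "info_extraction"
    infra: str      # hypothetical label for the browser-hosting setup used
    success: bool   # graded outcome for this task


def success_rates(results: Iterable[Result],
                  key: Callable[[Result], str]) -> Dict[str, float]:
    """Fraction of successful tasks per group, where `key` picks the group."""
    totals: Dict[str, int] = defaultdict(int)
    wins: Dict[str, int] = defaultdict(int)
    for r in results:
        k = key(r)
        totals[k] += 1
        wins[k] += int(r.success)
    return {k: wins[k] / totals[k] for k in totals}


def by_task_type(r: Result) -> str:
    # Unmapped categories are reported separately rather than guessed.
    return TASK_TYPE.get(r.category, "unknown")


def by_infra(r: Result) -> str:
    return r.infra
```

Passing a different `key` (category, task type, infrastructure, or a tuple of these) reuses the same aggregation, so read-versus-write gaps and infrastructure effects come out of one set of graded results.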