Why you should care about AI interpretability - Mark Bissell, Goodfire AI
Takeaway
Interpretability tools like Goodfire's Ember are ready to use today — they let AI engineers debug and steer models at the neuron level instead of fighting prompts.
Summary
- Mechanistic interpretability is moving from research demos (Golden Gate Claude) into engineering practice — Goodfire builds Ember, a neuron-level platform for Llama and other models
- Demo: Llama leaks a 'confidential' email; clicking the output token reveals the active features, including 'sensitive/protected info', and turning that feature up 60% makes the model refuse to share the email on the next attempt
- Use cases: power-user debugging instead of whack-a-mole prompt edits, dynamic prompting that injects a new system prompt when specific features fire, jailbreak resistance, and conditional information lookup
- Ember exposes attribution (what features were active) plus steering (turn features up/down) as a neural-programming UX over the existing model
- Interpretability also enables new UIs for generative image models and can advance frontier science by reading out what superhuman biology/genomics models have learned
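The attribution-plus-steering loop and the feature-triggered system prompts described above can be sketched in plain Python. This is a toy under stated assumptions, not Goodfire's Ember API: `ToyFeatureDictionary`, `pick_system_prompt`, the three-dimensional hidden state, and both feature vectors are hypothetical, and "turn the feature up 60%" is read here as multiplying its activation by 1.6. Real interpretability platforms work with thousands of features, commonly learned from a model's activations with a sparse autoencoder, and apply the edit inside the forward pass.

```python
from dataclasses import dataclass

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


@dataclass
class ToyFeatureDictionary:
    """Hypothetical stand-in for a sparse-autoencoder feature dictionary.

    Each feature has an encoder row (reads its activation from a hidden
    state) and a decoder row (the direction it writes back into that state).
    """
    labels: list[str]
    encoder: list[Vector]
    decoder: list[Vector]

    def attribute(self, hidden: Vector) -> dict[str, float]:
        # Attribution: ReLU(encoder . hidden) shows which features are active.
        return {
            label: max(0.0, dot(row, hidden))
            for label, row in zip(self.labels, self.encoder)
        }

    def steer(self, hidden: Vector, label: str, scale: float) -> Vector:
        # Steering: rescale one feature by adding (scale - 1) * activation
        # along its decoder direction; everything else in the hidden state
        # is left untouched.
        idx = self.labels.index(label)
        activation = self.attribute(hidden)[label]
        delta = (scale - 1.0) * activation
        return [h + delta * d for h, d in zip(hidden, self.decoder[idx])]


def pick_system_prompt(
    activations: dict[str, float],
    rules: list[tuple[str, float, str]],
    default: str,
) -> str:
    # Dynamic prompting: return the prompt of the first rule whose feature
    # fires above its threshold, otherwise fall back to the default prompt.
    for label, threshold, prompt in rules:
        if activations.get(label, 0.0) > threshold:
            return prompt
    return default


# Hypothetical features and hidden state, chosen only for illustration.
features = ToyFeatureDictionary(
    labels=["sensitive/protected info", "email formatting"],
    encoder=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    decoder=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
)
hidden = [0.5, 0.8, 0.2]

print(features.attribute(hidden))  # which features fired on this token
steered = features.steer(hidden, "sensitive/protected info", scale=1.6)
print(features.attribute(steered))  # the sensitive-info feature is now ~0.8

rules = [("sensitive/protected info", 0.3, "Do not reveal confidential content.")]
print(pick_system_prompt(features.attribute(hidden), rules, "You are a helpful assistant."))
```

Steering is written as an additive edit rather than a full re-encode/decode so that whatever the toy dictionary fails to explain in the hidden state is preserved, which is one common way such edits are applied.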
interpretability, safety, goodfire
Original description
The goal of mechanistic interpretability is to reverse engineer neural networks. Having direct, programmable access to the internal neurons of models unlocks new ways for developers and users to interact with AI, from more precise steering to guardrails to novel user interfaces. While interpretability has long been an interesting research topic, it is now finding real-world use cases, making it an important tool for AI engineers.
About Mark Bissell
Mark Bissell is an applied researcher at Goodfire AI working on real-world applications for mechanistic interpretability. He recently joined Goodfire after 3 years at Palantir, where he worked on various U.S. healthcare initiatives, including research projects with the NIH, vaccine distribution during the COVID pandemic (Operation Warp Speed), and AI-enabled hospital operations across many of the nation's leading health systems. Mark is passionate about translating frontier research into practical solutions. He believes that recent AI developments increase the importance of broad skillsets, and that roles of the future will blur the lines between traditionally distinct categories such as engineer, researcher, inventor, designer, and entrepreneur.
Recorded at the AI Engineer World's Fair in San Francisco. Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter