My research sits at the intersection of natural language processing and interpretability. I'm driven by a simple question: how do we build AI systems that we can understand and trust?

Large language models are remarkable, but they remain largely opaque. My lab develops techniques to peer inside these systems: understanding what they've learned, how they reason, and when they might fail.

I believe interpretability isn't just an academic exercise. As AI systems become more powerful, understanding them becomes a safety imperative.

When I'm not in the lab, I enjoy hiking in the White Mountains, playing classical piano, and collecting first editions of philosophy books.

Skills & Expertise

Research

Natural Language Processing, Interpretability, Alignment, Reasoning

Technical

Python, PyTorch, JAX, Transformers, Distributed Training

Methods

Probing, Mechanistic Interpretability, Causal Analysis, Behavioral Testing

Education

Stanford University

Ph.D. Computer Science

September 2015 – May 2020

Dissertation on compositional generalization. Advised by Christopher Manning.

UC Berkeley

B.S. Computer Science & Mathematics

September 2011 – May 2015

Summa Cum Laude. Regents' and Chancellor's Scholar.

Correspondence