Publications
Articles, essays, and scholarly writings.
Towards Faithful Chain-of-Thought Reasoning
NeurIPS 2024
2024-10
Probing the Depths: A Survey of Interpretability Methods for LLMs
ACL 2024 (Best Paper Nominee)
2024-06
Do Language Models Learn Compositionality? Evidence from Synthetic Languages
ICML 2024
2023-12
Constitutional AI Without Constitutional Constraints
NeurIPS 2023
2023-07
The Geometry of Meaning in Language Models
EMNLP 2022
2022-10