Towards Faithful Chain-of-Thought Reasoning NeurIPS 2024
2024-10
Probing the Depths: A Survey of Interpretability Methods for LLMs ACL 2024 (Best Paper Nominee)
2024-06
Do Language Models Learn Compositionality? Evidence from Synthetic Languages ICML 2024
2023-12
Constitutional AI Without Constitutional Constraints NeurIPS 2023
2023-07
The Geometry of Meaning in Language Models EMNLP 2022
2022-10