Scholarly work exploring interpretability, alignment, and reasoning in large language models.
Understanding transformer internals
Evaluating logical reasoning in language models