Blog

I try to write often. If you notice any discrepancies, or I haven't shared the source code for a post and haven't said why, please email me. I try to be open, honest, and genuine.

Deriving the Autoregressive Transformer

[METR Linkpost] CoT May Be Highly Informative Despite "Unfaithfulness"

In new research, we find that the CoT is informative about LLM cognition as long as the cognition is complex enough that it can’t be performed in a single forward pass

Implementing E&M Simulators

Backpropagatable FDFD and FDTD simulators in JAX, using diffusion models, tiled preconditioning, and other tricks to make them faster

Soviet Chess Diplomacy

Modeling Protein Evolution

Redundant Attention Heads in Large Language Models for In-Context Learning

Research on redundant attention heads in language models and their role in in-context learning through Bayesian updates.

Language Models Update Based on In-Context Learning

A look at how language models update their priors based on in-context examples