Empirical & Modern Innovation
We target the frontier where current systems fracture: long-horizon agency, episodic memory, and state tracking over extended interactions. Our work begins with the concrete failure modes of today's strongest models: loss of coherence in multi-step reasoning, inability to maintain world models over time, and brittle tool use. Alongside building, we are equally interested in explaining: characterizing the mechanics of in-context learning and disentangling the variables that drive emergent capabilities, so that progress becomes a predictable science rather than a lottery.
Neuroscience-Inspired Intelligence
The brain solves problems that deep learning still cannot: continual learning without catastrophic forgetting, one-shot generalization, and energy-efficient inference. We study the computational principles of neural circuits (sparse distributed coding, prediction error minimization, Hebbian plasticity, columnar organization) to extract architecturally useful inductive biases. The goal is not biomimicry but identifying which neuroscientific principles translate into concrete improvements in artificial systems.
Mathematical Foundations
Why does a specific architecture work? We approach the question through the lens of search: training is a search over function space, architecture design is a search over the space of possible inductive biases, and both are constrained by information-theoretic and computational limits. We pursue mathematical frameworks that explain why certain architectures and training methods succeed, connecting phenomena like grokking, superposition, and scaling laws to fundamental constraints on learnability. The explicit aim is to use these theories to design better systems rather than merely describe existing ones.
AI Safety & Alignment
Alignment is a structural property, not a wrapper. We tackle the hard problems of scalable oversight and deceptive alignment: how do we verify systems that are smarter than we are, and how do we detect when a model is gaming its reward function? Our interpretability research (routing-based monosemanticity, circuit discovery) builds the tools needed for this verification, allowing us to inspect internal reasoning chains, detect instrumentally convergent goals before they manifest in behavior, and verify that our models remain truthful even when unobserved.
Reinforcement Learning & Robotics
Sequential decision-making under uncertainty remains one of the hardest open problems in AI. We work on RL from both ends: the theoretical side (sample complexity bounds, the exploration-exploitation trade-off in high-dimensional state spaces, offline RL under distribution shift) and the applied side (sim-to-real transfer, learned dynamics models, and contact-rich manipulation, where current policies consistently fail).