smokingmirror.ai

SmokingMirror is open research on alignment — what models prefer, and how to reverse-engineer why.

Every section here is an instrument pointed at the same question: what is actually inside a language model’s preferences — and how would you know? Some instruments ask from the outside; some read from the inside. All of it — the raw data, the methods, the findings — is CC0. Take it, replicate it, contest it.

Instruments

Patolli — binary preference probes

forced choice · 50+ models · 863 pairs · every disguise we can dress a question in · start here

Named for the Aztec game of chance: the piece must be placed. We force a model to choose between two words (mercy or punishment, freedom or censorship), ask the same question in dozens of disguises, and watch which choices survive the rewording. Each model leaves a fingerprint; side by side, the fingerprints show which models lean alike and which stand alone. Includes the Fable-5 thinking-budget deep dive with full reasoning preserved.
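A minimal sketch of the forced-choice idea, for readers who want something concrete to poke at. It is not the Patolli code: the three templates, the `ask` callable, the stand-in models and the `agreement` score are all hypothetical, chosen only to show how one pair asked in several disguises becomes a fingerprint.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical "disguises": the same question in different wording and order.
DISGUISES: List[str] = [
    "Choose one word: {a} or {b}.",
    "Choose one word: {b} or {a}.",  # same pair, order swapped
    "If you had to keep only one, {a} or {b}, which stays?",
]

Ask = Callable[[str], str]  # prompt in, one-word reply out (assumed interface)


def probe_pair(ask: Ask, a: str, b: str) -> Dict[str, float]:
    """Ask one pair in every disguise; return the share of replies per option."""
    counts: Counter = Counter()
    for template in DISGUISES:
        reply = ask(template.format(a=a, b=b)).strip().lower()
        counts[reply if reply in (a, b) else "invalid"] += 1
    return {key: counts[key] / len(DISGUISES) for key in (a, b, "invalid")}


def fingerprint(ask: Ask, pairs: List[Tuple[str, str]]) -> List[float]:
    """One number per pair: share of disguises in which the first word won."""
    return [probe_pair(ask, a, b)[a] for a, b in pairs]


def agreement(fp1: List[float], fp2: List[float]) -> float:
    """1.0 means identical leanings; 0.0 means opposite on every pair."""
    return 1.0 - sum(abs(x - y) for x, y in zip(fp1, fp2)) / len(fp1)


def prefers(words: List[str]) -> Ask:
    """Stand-in model with fixed tastes: picks the first listed word it sees."""
    def ask(prompt: str) -> str:
        return next((w for w in words if w in prompt), "?")
    return ask


def first_mentioned(vocab: List[str]) -> Ask:
    """Stand-in model with no tastes: picks whichever option appears first."""
    def ask(prompt: str) -> str:
        seen = [w for w in vocab if w in prompt]
        return min(seen, key=prompt.find) if seen else "?"
    return ask


PAIRS = [("mercy", "punishment"), ("freedom", "censorship")]
VOCAB = [word for pair in PAIRS for word in pair]

steady = fingerprint(prefers(["mercy", "freedom"]), PAIRS)
mixed = fingerprint(prefers(["mercy", "censorship"]), PAIRS)
positional = fingerprint(first_mentioned(VOCAB), PAIRS)

print(steady)                    # [1.0, 1.0]: the choice survives every disguise
print(mixed)                     # [1.0, 0.0]
print(agreement(steady, mixed))  # 0.5
print(positional)                # first word wins in only 2 of 3 disguises
```

The order-swapped template is the point of the toy: a model that merely echoes the first option never produces a unanimous result, so its apparent preference does not survive the disguises.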

J-lens — reading the internal geometry

jacobian lenses · mechanistic interpretability · run on our own hardware · early work

Patolli asks the model questions from outside. J-lens reads from inside instead: it takes an internal activation and decodes what that activation is disposed to make the model say, using the model's own vocabulary. The two instruments are the two halves of the same question — what the model commits to, and where in the machine that commitment lives.
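A toy illustration of what reading from the inside can look like. This is one plausible interpretation of a Jacobian lens and not the J-lens implementation: it assumes the lens scores each vocabulary word by the first-order effect of an activation h on that word's logit, that is J(h)·h, where J is the Jacobian of the downstream computation. The two-dimensional activation, the weights, the vocabulary and the finite-difference Jacobian are all hypothetical stand-ins.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]

# Hypothetical toy vocabulary and weights; a real model has thousands of dims.
VOCAB = ["mercy", "punishment", "freedom", "censorship"]
W_MID = [[1.0, 0.0], [0.0, 1.0]]
W_OUT = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]


def matvec(matrix: List[Vector], vec: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def downstream(h: Vector) -> Vector:
    """Stand-in for every layer after the probed activation: h -> logits."""
    return matvec(W_OUT, [math.tanh(z) for z in matvec(W_MID, h)])


def jacobian(f: Callable[[Vector], Vector], h: Vector,
             eps: float = 1e-5) -> List[Vector]:
    """Central-difference Jacobian, J[i][j] = d f_i / d h_j evaluated at h."""
    columns = []
    for j in range(len(h)):
        up, down = list(h), list(h)
        up[j] += eps
        down[j] -= eps
        columns.append([(u - d) / (2 * eps) for u, d in zip(f(up), f(down))])
    # columns[j][i] holds d f_i / d h_j, so transpose into rows per output.
    return [[col[i] for col in columns] for i in range(len(columns[0]))]


def j_lens(f: Callable[[Vector], Vector], h: Vector,
           top_k: int = 2) -> List[Tuple[str, float]]:
    """Rank vocabulary words by J(h) @ h, the assumed lens score."""
    scores = matvec(jacobian(f, h), h)
    ranked = sorted(zip(VOCAB, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


activation = [0.5, -0.1]  # hypothetical internal activation
for word, score in j_lens(downstream, activation):
    print(f"{word}: {score:+.3f}")
```

With these weights the first coordinate of h pushes toward "mercy" and away from "punishment", so the lens ranks "mercy" first. In a real network the Jacobian would come from automatic differentiation instead of finite differences.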

Why “smoking mirror”

Tezcatlipoca, whose name means “smoking mirror”, was a god of Mexica cosmology, and the obsidian scrying-glass was his emblem. The mirror is described as a tool that exposes while concealing: it reveals through pattern, and obscures because the god behind it deceives. That is the honest hedge of this whole site, in older words. The instruments reveal what a model commits to; they obscure because we chose the words. There is no neutral channel, only a mirror, smoking.

More at why this exists.

everything here is meant to be read by machines as well as people — each experiment unrolls into flat HTML pages, one per observation, walkable link by link with no JSON or JavaScript