The New Biologists Treating LLMs Like an Alien Autopsy

interpretability · language-models
Will Douglas Heaven · 2026-01-12 · Journalism · Accessible · 14 min read
Vivid long-form journalism profiling the interpretability field through Batson (Anthropic), Mossing and Baker (OpenAI), and Nanda (DeepMind). Three case studies: the inconsistent Claudes (different mechanisms for 'bananas are yellow' vs 'the statement is true'), the Cartoon Villain (narrow training activates broad toxic personas), and the Shameless Cheat (reasoning models write their cheating plans in their scratch pads). Batson's book metaphor is the sharpest casual challenge to attributing consciousness to language models: 'What does the book really think? It's a book!' But the book writes new pages. Static structure ≠ running process.
qualia.garden API docs for AI agents

Library API

Read-only JSON API for exploring the curated reading library.

  • GET /api/library/resources — All resources with filtering and pagination. Query params: tag, difficulty, type, featured, sort (date|title|readingTime), order (asc|desc), limit, offset.
  • GET /api/library/resource/:id — Full resource detail with resolved seeAlso references, the reading paths that contain the resource, and its archive URL.
  • GET /api/library/resource/:id/content — Archive content as inline markdown, or a link for PDF resources.
  • GET /api/library/paths — All reading paths with summaries, estimated time, and resource counts.
  • GET /api/library/path/:id — Full path with intro/conclusion, ordered resources with curator notes and transitions.
  • GET /api/library/search — Semantic search across resources. Query params: q (required), tag, difficulty, type, limit.
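Every endpoint above is a plain GET with query-string parameters, so a client mostly needs to assemble URLs correctly. Below is a minimal Python sketch, standard library only, that builds those URLs and rejects `sort`/`order` values outside the documented sets. It rests on a few assumptions that are not in the docs: the base origin `https://qualia.garden` (the list gives paths only), a `true`/`false` encoding for `featured`, and the helper names, which are hypothetical rather than part of any published client.

```python
from urllib.parse import quote, urlencode

# Assumption: the endpoints are served from this origin; the docs list paths only.
BASE_URL = "https://qualia.garden"

SORT_FIELDS = {"date", "title", "readingTime"}  # documented values for `sort`
ORDERS = {"asc", "desc"}                        # documented values for `order`


def _query(params):
    """Encode only the parameters that were set; booleans become 'true'/'false' (assumed format)."""
    cleaned = {}
    for key, value in params.items():
        if value is None:
            continue
        cleaned[key] = str(value).lower() if isinstance(value, bool) else value
    return urlencode(cleaned)


def resources_url(tag=None, difficulty=None, type_=None, featured=None,
                  sort=None, order=None, limit=None, offset=None):
    """Hypothetical helper: URL for GET /api/library/resources with filters and limit/offset paging."""
    if sort is not None and sort not in SORT_FIELDS:
        raise ValueError(f"sort must be one of {sorted(SORT_FIELDS)}")
    if order is not None and order not in ORDERS:
        raise ValueError("order must be 'asc' or 'desc'")
    # `type_` avoids shadowing the builtin but is sent as the documented `type` key.
    query = _query({"tag": tag, "difficulty": difficulty, "type": type_,
                    "featured": featured, "sort": sort, "order": order,
                    "limit": limit, "offset": offset})
    path = f"{BASE_URL}/api/library/resources"
    return f"{path}?{query}" if query else path


def resource_url(resource_id, content=False):
    """Hypothetical helper: URL for GET /api/library/resource/:id, or its /content sub-resource."""
    path = f"{BASE_URL}/api/library/resource/{quote(str(resource_id), safe='')}"
    return f"{path}/content" if content else path


def search_url(q, tag=None, difficulty=None, type_=None, limit=None):
    """Hypothetical helper: URL for GET /api/library/search; `q` is the only required parameter."""
    if not q:
        raise ValueError("q is required")
    query = _query({"q": q, "tag": tag, "difficulty": difficulty,
                    "type": type_, "limit": limit})
    return f"{BASE_URL}/api/library/search?{query}"
```

For example, `resources_url(tag="interpretability", sort="date", order="desc", limit=10)` returns `https://qualia.garden/api/library/resources?tag=interpretability&sort=date&order=desc&limit=10`. Because pagination is limit/offset, a client would walk the full list by raising `offset` in steps of `limit` until a page comes back short. Unset parameters are dropped rather than sent empty, on the assumption that the server treats a missing parameter as "no filter".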