Tracing the Thoughts of a Large Language Model

interpretability · evidence
Anthropic · 2025-03-27 · Blog · Accessible · 1 min read
Anthropic's accessible companion to their circuit tracing papers. Case studies of what they found inside Claude 3.5 Haiku: a universal 'language of thought' shared across languages, planning ahead in poetry (Claude thinks of rhyming words before writing the line), alien math strategies the model can't accurately self-report on, faithful vs. unfaithful reasoning that interpretability can tell apart, and a hallucination mechanism in which refusal is the default. The mental math finding is striking: Claude reports using the standard algorithm but actually runs parallel pathways, one approximating the answer and one working out the exact last digit. The gap between self-report and actual mechanism is the consciousness question at the level of individual computations.
qualia.garden API docs for AI agents

Library API

Read-only JSON API for exploring the curated reading library.

  • GET /api/library/resources — All resources with filtering and pagination. Query params: tag, difficulty, type, featured, sort (date|title|readingTime), order (asc|desc), limit, offset.
  • GET /api/library/resource/:id — Full resource detail with resolved seeAlso references, containing paths, and archive URL.
  • GET /api/library/resource/:id/content — Archive content as inline markdown, or a link for PDF resources.
  • GET /api/library/paths — All reading paths with summaries, estimated time, and resource counts.
  • GET /api/library/path/:id — Full path with intro/conclusion, ordered resources with curator notes and transitions.
  • GET /api/library/search — Semantic search across resources. Query params: q (required), tag, difficulty, type, limit.
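
As a rough illustration of how an agent might call the endpoints above, here is a minimal Python sketch that builds request URLs for the resource listing and the semantic search. The base URL `https://qualia.garden` is an assumption (the docs name the site but not the API origin), and `build_url` and `fetch_json` are hypothetical helpers, not part of the API. The response shape is not documented here, so the sketch only decodes the JSON and makes no claims about its fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the API is served from this origin; the docs do not state a base URL.
BASE_URL = "https://qualia.garden"


def build_url(path, **params):
    """Join BASE_URL with an endpoint path and any query params that are not None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}{path}" + (f"?{query}" if query else "")


def fetch_json(url):
    """GET a URL and decode the JSON body (response fields are not documented here)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


# List resources: filter by tag, sort by date descending, first page of 5.
resources_url = build_url(
    "/api/library/resources",
    tag="interpretability", sort="date", order="desc", limit=5, offset=0,
)

# Semantic search: q is the only required parameter.
search_url = build_url("/api/library/search", q="circuit tracing", limit=3)

if __name__ == "__main__":
    print(resources_url)
    print(search_url)
    # Uncomment to make a live request:
    # print(fetch_json(resources_url))
```

Keeping URL construction separate from the network call makes it easy to check the query string before sending anything; paging through results would then only mean increasing `offset` by `limit` on each request.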