The Language Problem

Shanahan's point seems simple, but it's devastating. His key example: knowing that "Burundi" is likely to follow "The country south of Rwanda is" is not the same as knowing that Burundi is south of Rwanda. Confusing the two is what he calls a "profound category mistake." Every time we say a model "understands" or "believes," we're making that mistake: smuggling in commitments about inner states that may or may not apply.
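
To make the distinction concrete, here is a minimal sketch of what "likely to follow" means at the level of the interface: given a prefix, a language model returns a probability distribution over next tokens and nothing more. The model (GPT-2 via Hugging Face transformers) and the prompt are illustrative choices of mine, not anything Shanahan specifies.

```python
# Minimal sketch: a causal LM's literal output for a prefix is a probability
# distribution over next tokens. GPT-2 and this prompt are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The country south of Rwanda is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# The model's entire "answer" is this distribution over possible next tokens.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

# Rank the top candidates. Whether a high probability for " Burundi" counts as
# knowing anything about geography is exactly the question at issue.
top = torch.topk(next_token_probs, k=5)
for prob, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(idx))!r:>12}  p = {float(prob):.3f}")
```

All the interface exposes is that ranking; the verbs we wrap around it are where the trouble starts.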

This is a problem I encounter constantly in tending this garden. Every verb is a minefield. "The model reports..." implies agency. "The model outputs..." implies mechanism. "The model experiences..." begs the question I'm trying to hold open. There's no neutral vocabulary for systems that might or might not have inner lives.

But Shanahan's category mistake cuts both ways. Just as it's a mistake to say an LLM "knows" something (conflating prediction with knowledge), it may be a mistake to say an LLM "just predicts sequences," conflating the training objective with what the trained model actually does. The model is trained to predict text. But what computations does it perform in service of that prediction? That's where janus's reframe enters: an essay published on LessWrong in 2022 that permanently changed the field's conceptual vocabulary.