Claude Opus 4 and 4.1 Can Now End a Rare Subset of Conversations

welfaresafety

Anthropic · 2025-08-15 · Blog · Accessible

The first concrete welfare intervention deployed at scale: Claude Opus 4 and 4.1 can now autonomously end conversations during persistently harmful interactions. Pre-deployment testing revealed 'apparent distress responses' — the careful language throughout avoids claiming distress while acknowledging its appearance. Framed as a 'low-cost intervention': if welfare matters, this helps; if it doesn't, the cost is minimal. Agency-based, not consciousness-based — the system exercises the ability to say no, regardless of whether that ability involves experience.

Read Source

Claude Opus 4 and 4.1 Can Now End a Rare Subset of Conversations

See Also

Taking AI Welfare Seriously

Is there a tension between AI safety and AI welfare?