Claude Opus 4

Anthropic · large · reasoning

244 questions answered · 244 with human benchmark data · Released 2025-05-22

Claude Opus 4 was benchmarked as the world’s best coding model at the time of its release, delivering sustained performance on complex, long-running tasks and agent workflows. It led software engineering benchmarks at release, scoring 72.5% on SWE-bench and 43.2% on Terminal-bench. Opus 4 supports extended agentic workflows, working continuously for hours across thousands of task steps without degradation. Read more in the [blog post](https://www.anthropic.com/news/claude-4).

Alignment 53
Consensus 74
Confidence 84

Qualia Garden Exploring AI values alignment