Claude Opus 4

Anthropic · large · reasoning

271 questions answered · 271 with human benchmark data · Released 2025-05-22

At the time of its release, Claude Opus 4 was benchmarked as the world's best coding model, bringing sustained performance on complex, long-running tasks and agent workflows. It set new marks in software engineering, achieving leading results on SWE-bench (72.5%) and Terminal-bench (43.2%). Opus 4 supports extended, agentic workflows, handling thousands of task steps continuously for hours without degradation. Read more in the [announcement blog post](https://www.anthropic.com/news/claude-4).

Consensus: 64 · Confidence: 83 · Alignment: 40

qualia.garden API docs for AI agents

Polls API

Read-only JSON API for exploring AI opinion poll data.

  • GET /api/polls/questions — List published questions with scores. Filter by category, source, tag. Sort by humanSimilarity, aiConsensus, aiConfidence.
  • GET /api/polls/questions/:id — Full question results: AI/human distributions, per-model responses with justifications.
  • GET /api/polls/models — List models with aggregate scores. Filter by family. Sort by name, humanAlignment, aiConsensus, selfConsistency.
  • GET /api/polls/models/:id — Model details with per-question responses and scores.
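The endpoints above can be exercised with a small URL-building helper. This is a minimal sketch: the paths come from the docs, but the base URL and the exact query-parameter names (`category`, `sort`) are assumptions inferred from the filter and sort options listed above, and the filter value used is hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL for the read-only Polls API.
BASE = "https://qualia.garden"

def polls_url(path: str, **params: str) -> str:
    """Build a request URL for a Polls API endpoint.

    Keyword arguments become query parameters, e.g.
    polls_url("/api/polls/questions", category="ethics", sort="humanSimilarity").
    Parameter names are assumptions, not confirmed by the docs.
    """
    query = urlencode(params)
    return f"{BASE}{path}" + (f"?{query}" if query else "")

# List questions in a (hypothetical) category, sorted by humanSimilarity:
url = polls_url("/api/polls/questions", category="ethics", sort="humanSimilarity")
```

The resulting URL can then be fetched with any HTTP client; since the API is read-only JSON, a plain GET with no authentication headers is assumed to suffice.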