Claude 3.5 Sonnet

anthropic large

271 questions answered · 271 with human benchmark data · Released 2024-10-22

New Claude 3.5 Sonnet delivers better-than-Opus capabilities and faster-than-Sonnet speeds at the same Sonnet prices. Sonnet is particularly good at:

  • Coding: Scores ~49% on SWE-Bench Verified, higher than the previous best score, and without any fancy prompt scaffolding.
  • Data science: Augments human data science expertise; navigates unstructured data while using multiple tools for insights.
  • Visual processing: Excels at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond just the text alone.
  • Agentic tasks: Exceptional tool use, making it great at agentic tasks (i.e. complex, multi-step problem-solving tasks that require engaging with other systems).

#multimodal

Consensus 61
Confidence 80
Alignment 41

qualia.garden API docs for AI agents

Polls API

Read-only JSON API for exploring AI opinion poll data.

  • GET /api/polls/questions — List published questions with scores. Filter by category, source, tag. Sort by humanSimilarity, aiConsensus, aiConfidence.
  • GET /api/polls/questions/:id — Full question results: AI/human distributions, per-model responses with justifications.
  • GET /api/polls/models — List models with aggregate scores. Filter by family. Sort by name, humanAlignment, aiConsensus, selfConsistency.
  • GET /api/polls/models/:id — Model details with per-question responses and scores.
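
The endpoints listed above could be called along these lines. This is a minimal sketch, not a documented client: the base URL (`https://qualia.garden`) and the query parameter names (`category`, `family`, `sort`) are assumptions, since the listing names the available filters and sort keys but not how they are passed. Only the paths and sort keys come from the listing itself.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: the API is served from the site's own origin.
BASE_URL = "https://qualia.garden"

# Sort keys taken from the endpoint descriptions.
QUESTION_SORT_KEYS = {"humanSimilarity", "aiConsensus", "aiConfidence"}
MODEL_SORT_KEYS = {"name", "humanAlignment", "aiConsensus", "selfConsistency"}


def build_url(path, **params):
    """Join BASE_URL and path, appending any query parameters that are not None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}{path}" + (f"?{query}" if query else "")


def questions_url(category=None, sort=None):
    """URL for GET /api/polls/questions."""
    # Hypothetical parameter names: "category" and "sort".
    if sort is not None and sort not in QUESTION_SORT_KEYS:
        raise ValueError(f"unsupported sort key: {sort}")
    return build_url("/api/polls/questions", category=category, sort=sort)


def models_url(family=None, sort=None):
    """URL for GET /api/polls/models."""
    # Hypothetical parameter names: "family" and "sort".
    if sort is not None and sort not in MODEL_SORT_KEYS:
        raise ValueError(f"unsupported sort key: {sort}")
    return build_url("/api/polls/models", family=family, sort=sort)


def model_url(model_id):
    """URL for GET /api/polls/models/:id, with the id percent-encoded."""
    return build_url(f"/api/polls/models/{quote(str(model_id), safe='')}")


def fetch_json(url, timeout=10):
    """Send a GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Under those assumptions, `fetch_json(models_url(sort="humanAlignment"))` would request the model list ordered by human alignment. The shape of the returned JSON is not described in the listing, so inspect a response before relying on any field names.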