Claude 3.5 Sonnet

Anthropic · large

244 questions answered · 244 with human benchmark data · Released 2024-10-22

New Claude 3.5 Sonnet delivers better-than-Opus capabilities and faster-than-Sonnet speeds at the same Sonnet prices. Sonnet is particularly good at:
- Coding: Scores ~49% on SWE-Bench Verified, higher than the last best score, and without any fancy prompt scaffolding
- Data science: Augments human data science expertise; navigates unstructured data while using multiple tools for insights
- Visual processing: Excels at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond just the text alone
- Agentic tasks: Exceptional tool use, making it great at agentic tasks (i.e. complex, multi-step problem-solving tasks that require engaging with other systems)

#multimodal

Alignment 54
Consensus 72
Confidence 81


Qualia Garden Exploring AI values alignment