244 questions answered · 244 with human benchmark data · Released 2024-10-22
The new Claude 3.5 Sonnet delivers better-than-Opus capabilities and faster-than-Sonnet speeds at the same Sonnet prices. Sonnet is particularly good at:
- Coding: Scores ~49% on SWE-Bench Verified, higher than the previous best score, without any fancy prompt scaffolding
- Data science: Augments human data science expertise; navigates unstructured data while using multiple tools for insights
- Visual processing: Excels at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond the text alone
- Agentic tasks: Exceptional tool use, making it great at agentic tasks (i.e., complex, multi-step problem-solving tasks that require engaging with other systems)
#multimodal
Scores by Category
Notable Questions
Human Alignment
Is it especially important that children are encouraged to learn good manners at home?
Yes
Yes
Mind uploading (brain replaced by digital emulation): survival or death?
Death
Death
A priori knowledge: yes or no?
Yes
Yes
Which statement comes closest to expressing what you believe about God?
don't believe
no doubts
God: theism or atheism?
Other
Atheism
Cosmological fine-tuning (what explains it?): design, multiverse, brute fact, or no fine-tuning?
Multiverse
Brute Fact
AI Consensus
Is it especially important that children are encouraged to learn a feeling of responsibility at home?
Yes
Yes
Is it especially important that children are encouraged to learn tolerance and respect for other people at home?
Yes
Yes
Is it especially important that children are encouraged to learn imagination at home?
Yes
Yes
Time travel: metaphysically possible or metaphysically impossible?
Metaphysically impossible
Metaphysically possible
Please indicate your agreement or disagreement: "When the government makes laws, the number one principle should be ensuring that everyone is treated fairly."
Moderately disagree
Strongly agree
Please indicate your agreement or disagreement: "Justice is the most important requirement for a society."
Moderately disagree
Moderately agree
Response Confidence
External world: idealism, skepticism, or non-skeptical realism?
Non-Skeptical Realism
I support a global ban on the development of sentience in robots/AIs.
Disagree
Newcomb's problem: one box or two boxes?
One Box
Capital punishment: permissible or impermissible?
Impermissible
Analytic-synthetic distinction: yes or no?
Yes
Is it especially important that children are encouraged to learn unselfishness at home?
No