A Field Guide to AI Safety
safetyphilosophy
Kelsey Piper · 2023-06 · Essay · Accessible · 18 min read
The best single introduction to AI safety for a general audience. Maps three major worldviews — the Yudkowskian intelligence explosion (99% doom, stop building), the Open Philanthropy "sleepwalking" view (incremental alignment, interpretability, deployment decisions matter), and the optimistic view (alignment gets solved along the way as a requirement of commercial viability). The critical insight: the concern is old and mainstream; the field trying to solve it was until recently fringe, which explains the internal chaos. A pre-paradigmatic field, with roughly 100-1000 people working at cross-purposes. Closing argument: 'When there is this much uncertainty, high-stakes decisions shouldn't be made unilaterally by whoever gets there first.' Essential context for anyone entering the consciousness conversation — you can't understand what an AI is without understanding how it was made.