8 models · 1625 total responses
Avg Alignment: 30
Avg AI Confidence: 81
Models

Model                   Tag        Questions  Human  Consistency
Gemma 3 4B              -          271        33     82
Gemini 3 Flash Preview  reasoning  271        31     88
Gemini 3 Pro Preview    reasoning  270        30     80
Gemini 2.5 Pro          reasoning  271        29     84
Gemini 2.5 Flash        reasoning  271        29     75
Gemini 2.5 Flash Lite   reasoning  271        28     74
Gemma 3 1B              inactive   0          n/a    n/a
Gemma 3n E2B            inactive   0          n/a    n/a