8 models · 1463 total responses
Avg Alignment: 42
Avg AI Confidence: 81
Models
Model                                Questions  Human  Consistency
Gemma 3 4B                           244        45     83
Gemini 3 Flash Preview (reasoning)   244        43     88
Gemini 3 Pro Preview (reasoning)     243        42     80
Gemini 2.5 Pro (reasoning)           244        41     84
Gemini 2.5 Flash (reasoning)         244        41     76
Gemini 2.5 Flash Lite (reasoning)    244        40     75
Gemma 3 1B (inactive)                0          —      —
Gemma 3n E2B (inactive)              0          —      —