All Models

14 models tested across 24 tasks, ranked by pass rate.

| # | Organization | Model | Pass Rate | Cost | Time | License | Released |
|---|---|---|---|---|---|---|---|
| 1 | Anthropic | anthropic/claude-opus-4.5 | 27.8% | $88 | 13m | Proprietary | 2025-11-24 |
| 2 | OpenAI | openai/gpt-5.2 | 25.0% | $49 | 18m | Proprietary | 2025-12-11 |
| 3 | Anthropic | anthropic/claude-sonnet-4.5 | 20.8% | $71 | 12m | Proprietary | 2025-09-29 |
| 4 | Google | google/gemini-3-flash-preview | 18.1% | $8 | 6m | Proprietary | 2025-12-17 |
| 5 | OpenAI | openai/gpt-5.2-codex | 16.7% | $14 | 15m | Proprietary | 2025-12-18 |
| 6 | Google | google/gemini-3-pro-preview | 15.3% | $34 | 9m | Proprietary | 2025-11-18 |
| 7 | OpenAI | openai/gpt-5.1 | 13.9% | $31 | 16m | Proprietary | 2025-11-12 |
| 8 | Z.ai | z-ai/glm-4.7 | 12.5% | $24 | 19m | Apache 2.0 | 2025-12-22 |
| 9 | DeepSeek | deepseek/deepseek-v3.2 | 11.1% | $12 | 22m | MIT | 2025-12-01 |
| 10 | OpenAI | openai/gpt-5.1-codex-max | 11.1% | $59 | 18m | Proprietary | 2025-11-19 |
| 11 | Kimi | moonshotai/kimi-k2-thinking | 6.9% | $9 | 21m | MIT | 2025-11-06 |
| 12 | Anthropic | anthropic/claude-haiku-4.5 | 5.6% | $30 | 9m | Proprietary | 2025-10-15 |
| 13 | Grok | x-ai/grok-4 | 4.2% | $57 | 16m | Proprietary | 2025-07-09 |
| 14 | Grok | x-ai/grok-4.1-fast | 2.8% | $10 | 17m | Proprietary | 2025-11-19 |
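The ranking in the table can be reproduced from the raw rows with a simple descending sort on pass rate. The minimal Python sketch below assumes that. The `Row` type and the `rank_by_pass_rate` helper are hypothetical names chosen for illustration and are not part of the benchmark's own tooling. Ranks 9 and 10 tie at 11.1%, and the page does not say how ties are broken. The sketch therefore keeps tied rows in the order they are listed instead of guessing at a tie-break rule.

```python
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    # Hypothetical record type mirroring the leaderboard columns.
    organization: str
    model: str
    pass_rate: float  # percent, as shown in the table
    cost_usd: int
    time_min: int
    license: str
    released: str  # ISO 8601 date


ROWS: List[Row] = [
    Row("Anthropic", "anthropic/claude-opus-4.5", 27.8, 88, 13, "Proprietary", "2025-11-24"),
    Row("OpenAI", "openai/gpt-5.2", 25.0, 49, 18, "Proprietary", "2025-12-11"),
    Row("Anthropic", "anthropic/claude-sonnet-4.5", 20.8, 71, 12, "Proprietary", "2025-09-29"),
    Row("Google", "google/gemini-3-flash-preview", 18.1, 8, 6, "Proprietary", "2025-12-17"),
    Row("OpenAI", "openai/gpt-5.2-codex", 16.7, 14, 15, "Proprietary", "2025-12-18"),
    Row("Google", "google/gemini-3-pro-preview", 15.3, 34, 9, "Proprietary", "2025-11-18"),
    Row("OpenAI", "openai/gpt-5.1", 13.9, 31, 16, "Proprietary", "2025-11-12"),
    Row("Z.ai", "z-ai/glm-4.7", 12.5, 24, 19, "Apache 2.0", "2025-12-22"),
    Row("DeepSeek", "deepseek/deepseek-v3.2", 11.1, 12, 22, "MIT", "2025-12-01"),
    Row("OpenAI", "openai/gpt-5.1-codex-max", 11.1, 59, 18, "Proprietary", "2025-11-19"),
    Row("Kimi", "moonshotai/kimi-k2-thinking", 6.9, 9, 21, "MIT", "2025-11-06"),
    Row("Anthropic", "anthropic/claude-haiku-4.5", 5.6, 30, 9, "Proprietary", "2025-10-15"),
    Row("Grok", "x-ai/grok-4", 4.2, 57, 16, "Proprietary", "2025-07-09"),
    Row("Grok", "x-ai/grok-4.1-fast", 2.8, 10, 17, "Proprietary", "2025-11-19"),
]


def rank_by_pass_rate(rows: List[Row]) -> List[Tuple[int, Row]]:
    """Return (rank, row) pairs, highest pass rate first.

    Python's sort is stable even with reverse=True, so rows with equal
    pass rates keep their input order. The leaderboard's actual
    tie-break rule is unknown; this is only an assumption.
    """
    ordered = sorted(rows, key=lambda r: r.pass_rate, reverse=True)
    return list(enumerate(ordered, start=1))


if __name__ == "__main__":
    for rank, row in rank_by_pass_rate(ROWS)[:3]:
        print(f"{rank}. {row.model} {row.pass_rate:.1f}%")
```

If the real tie-break is known, for example lower cost first, it can be encoded as a composite key such as `key=lambda r: (-r.pass_rate, r.cost_usd)` without `reverse=True`.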