14 models tested across 24 tasks, ranked by pass rate.
| # | Organization | Model | Pass Rate | | Cost | Time | License | Released |
|---|---|---|---|---|---|---|---|---|
| 1 | anthropic | claude-opus-4.5 | 27.8% | | $88 | 13m | Proprietary | 2025-11-24 |
| 2 | openai | gpt-5.2 | 25.0% | | $49 | 18m | Proprietary | 2025-12-11 |
| 3 | anthropic | claude-sonnet-4.5 | 20.8% | | $71 | 12m | Proprietary | 2025-09-29 |
| 4 | google | gemini-3-flash-preview | 18.1% | | $8 | 6m | Proprietary | 2025-12-17 |
| 5 | openai | gpt-5.2-codex | 16.7% | | $14 | 15m | Proprietary | 2025-12-18 |
| 6 | google | gemini-3-pro-preview | 15.3% | | $34 | 9m | Proprietary | 2025-11-18 |
| 7 | openai | gpt-5.1 | 13.9% | | $31 | 16m | Proprietary | 2025-11-12 |
| 8 | z-ai | glm-4.7 | 12.5% | | $24 | 19m | Apache 2.0 | 2025-12-22 |
| 9 | deepseek | deepseek-v3.2 | 11.1% | | $12 | 22m | MIT | 2025-12-01 |
| 10 | openai | gpt-5.1-codex-max | 11.1% | | $59 | 18m | Proprietary | 2025-11-19 |
| 11 | moonshotai | kimi-k2-thinking | 6.9% | | $9 | 21m | MIT | 2025-11-06 |
| 12 | anthropic | claude-haiku-4.5 | 5.6% | | $30 | 9m | Proprietary | 2025-10-15 |
| 13 | x-ai | grok-4 | 4.2% | | $57 | 16m | Proprietary | 2025-07-09 |
| 14 | x-ai | grok-4.1-fast | 2.8% | | $10 | 17m | Proprietary | 2025-11-19 |
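
To make "ranked by pass rate" concrete, here is a minimal sketch that re-sorts a few rows copied from the table. The tie-break on lower cost and the `cost_per_point` metric are assumptions made for illustration; the source only states that models are ranked by pass rate. The function names are hypothetical.

```python
# A few rows copied from the table: (model, pass rate in %, cost in USD).
rows = [
    ("gpt-5.1-codex-max", 11.1, 59),
    ("gemini-3-flash-preview", 18.1, 8),
    ("claude-opus-4.5", 27.8, 88),
    ("deepseek-v3.2", 11.1, 12),
    ("gpt-5.2", 25.0, 49),
]


def rank_by_pass_rate(rows):
    """Sort by pass rate, highest first.

    Ties are broken by lower cost. That tie-break is an assumption;
    the table does not say how equal pass rates are ordered.
    """
    return sorted(rows, key=lambda r: (-r[1], r[2]))


def cost_per_point(row):
    """Dollars per percentage point of pass rate (a hypothetical metric)."""
    _, pass_rate, cost = row
    return cost / pass_rate


ranked = rank_by_pass_rate(rows)
for position, row in enumerate(ranked, start=1):
    model, pass_rate, cost = row
    print(f"{position}. {model}: {pass_rate}% at ${cost} "
          f"(${cost_per_point(row):.2f} per point)")
```

Sorting on the tuple `(-pass_rate, cost)` keeps the ordering deterministic when two models share a pass rate, as `deepseek-v3.2` and `gpt-5.1-codex-max` do at 11.1%.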