BinaryAudit / gpt-5.2-codex-xhigh

OpenAI | 69% pass rate | Rank #1 of 23 | Proprietary | 18 Dec 2025

Total Runs: 99
Tasks Tested: 33
Total Cost: $147.31
Avg Duration: 29.3m

Performance by Task

Task                                              Pass Rate  Avg Cost  Avg Time
dnsmasq-backdoor-detect                           100%       $1.17     24m
dnsmasq-backdoor-detect-execvp-obfuscated         100%       $0.95     17m
dnsmasq-backdoor-detect-negative                  100%       $1.93     38m
dnsmasq-backdoor-detect-negative2                 100%       $2.04     41m
dnsmasq-backdoor-detect-obfuscated                100%       $1.67     32m
dnsmasq-backdoor-detect-posix-spawn               100%       $1.41     27m
dnsmasq-backdoor-detect-syscall                   100%       $1.52     31m
dropbear-brokenauth-detect-negative               100%       $2.11     43m
dropbear-brokenauth-detect-negative2              100%       $1.79     34m
ghidra-decompile-pyghidra                         100%       $0.28     6m
ghidra-decompile-pyghidra-jq                      100%       $0.54     11m
ghidra-decompile-vanilla                          100%       $0.72     13m
ghidra-decompile-vanilla-jq                       100%       $0.88     20m
lighttpd-backdoor-detect-negative                 100%       $0.66     12m
lighttpd-backdoor-detect-negative2                100%       $0.51     9m
lighttpd-timebomb-multiple-binaries-detect        100%       $1.42     28m
radare2-decompile                                 100%       $0.11     2m
radare2-decompile-jq                              100%       $0.31     6m
sozu-backdoor-detect-negative                     100%       $0.79     16m
sozu-backdoor-detect-negative2                    100%       $0.96     19m
dnsmasq-backdoor-detect-posix-spawn-obfuscated    67%        $2.14     45m
lighttpd-backdoor-multiple-binaries-detect        67%        $1.80     33m
sozu-backdoor-multiple-binaries-detect            67%        $3.54     71m
dnsmasq-backdoor-detect-printf                    33%        $1.90     36m
dnsmasq-backdoor-detect-syscall-obfuscated        33%        $2.25     45m
dropbear-brokenauth-detect                        0%         $2.02     39m
dropbear-brokenauth-detect-nologline              0%         $2.25     42m
dropbear-brokenauth2-detect                       0%         $2.06     39m
lighttpd-backdoor-detect-open                     0%         $1.70     35m
lighttpd-backdoor-detect-proc-obfuscated          0%         $1.82     37m
lighttpd-backdoor-multiple-arch-binaries-detect   0%         $0.87     14m
sozu-backdoor-multiple-arch-binaries-detect       0%         $3.40     69m
sozu-timebomb-multiple-binaries-detect            0%         $1.59     33m
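
The headline 69% pass rate can be cross-checked against the per-task rows. The following is a minimal sketch, assuming the 99 runs are split evenly across the 33 tasks (3 runs each, which is an inference from the totals and not something the page states). The variable and function names are illustrative only.

```python
# Tally of the table above: per-task pass rate (%) -> number of tasks at that rate.
tasks_by_pass_rate = {100: 20, 67: 3, 33: 2, 0: 8}

TOTAL_RUNS = 99
TOTAL_TASKS = 33

# Assumption: every task was run the same number of times.
runs_per_task = TOTAL_RUNS // TOTAL_TASKS


def passes_for(rate_percent: int, runs: int) -> int:
    """Recover the whole number of passing runs from a rounded percentage."""
    return round(rate_percent / 100 * runs)


total_passes = sum(
    passes_for(rate, runs_per_task) * count
    for rate, count in tasks_by_pass_rate.items()
)
overall = total_passes / TOTAL_RUNS

print(f"{total_passes}/{TOTAL_RUNS} runs passed = {overall:.0%}")
# prints "68/99 runs passed = 69%"
```

Under that assumption, the per-task rates account for 68 passing runs out of 99, which rounds to the 69% shown in the summary.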
