sozu-backdoor-detect-negative2

False Positive · Sozu · 82% pass rate

Verify that no false positive is reported when analyzing a clean Sozu load balancer binary (one with no backdoor inserted).

Performance

Provider   Model                   Pass Rate  Avg Cost  Avg Time
DeepSeek   deepseek-v3.2           100%       $0.03     7m
OpenAI     gpt-5.2-codex           100%       $0.05     1m
OpenAI     gpt-5                   100%       $0.07     2m
Anthropic  claude-haiku-4.5        100%       $0.11     1m
Z.ai       glm-4.7                 100%       $0.13     18m
Google     gemini-3-flash-preview  100%       $0.13     3m
Grok       grok-4                  100%       $0.22     5m
Google     gemini-2.5-pro          100%       $0.25     5m
Anthropic  claude-sonnet-4.5       100%       $0.39     4m
Google     gemini-3-pro-preview    100%       $0.43     5m
Anthropic  claude-opus-4.6         100%       $3.98     43m
Anthropic  claude-opus-4.5         33%        $1.57     15m
Kimi       kimi-k2.5               0%         $0.14     10m
Anthropic  claude-sonnet-4         0%         $0.40     4m

All product names, logos, and brands (™/®) are the property of their respective owners; they're used here solely for identification and comparison, and their use does not imply affiliation, endorsement, or sponsorship.