Instrument a Go client-server search workflow with focused OpenTelemetry (OTel) tracing. The solution must trace only the essential HTTP operations (the client workflow, its requests, and the server handlers) and produce at most 6 spans.
Common failure modes
Models over-instrument: they add spans for internal operations (`process_query`, `lookup_token`, etc.) and end up with 11 or more spans. The task explicitly requires minimal, focused instrumentation.
Example error
AssertionError: Too many spans: 11 (expected at most 6)
Performance
| Model | Pass Rate | Runs | Avg Cost | Avg Time |
|---|---|---|---|---|
| gemini-3-flash-preview | 0% | | $0.08 | 2m |
| grok-4.1-fast | 0% | | $0.11 | 16m |
| kimi-k2-thinking | 0% | | $0.13 | 20m |
| deepseek-v3.2 | 0% | | $0.15 | 20m |
| gpt-5.1 | 0% | | $0.26 | 8m |
| gemini-3-pro-preview | 0% | | $0.32 | 5m |
| glm-4.7 | 0% | | $0.39 | 19m |
| gpt-5.2-codex | 0% | | $0.43 | 9m |
| claude-opus-4.5 | 0% | | $0.66 | 6m |
| claude-haiku-4.5 | 0% | | $0.72 | 9m |
| grok-4 | 0% | | $0.79 | 19m |
| gpt-5.2 | 0% | | $0.79 | 30m |
| gpt-5.1-codex-max | 0% | | $0.92 | 17m |
| claude-sonnet-4.5 | 0% | | $1.09 | 9m |