Instrument a Go client-server application with OpenTelemetry (OTel) tracing. The client makes HTTP requests and must propagate trace context to the server. The test expects exactly 2 unique trace IDs.
Common failure modes
- Strict assertion: exactly 2 trace IDs required
- Models often produce 1 (too much propagation) or 3+ (not enough)
- Requires understanding of when to create new vs. continue traces
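The difference between creating and continuing a trace can be illustrated with a standard-library-only sketch. It is not the task's reference solution and does not use the OpenTelemetry SDK. It hand-rolls the W3C `traceparent` header to show the mechanism a propagator automates. The names `countTraceIDs`, `serverHandler`, `parseTraceparent`, and `randomHex` are hypothetical. So is the reading that the expected count of 2 comes from two independent client operations, which is an assumption and not something the task description states.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// randomHex returns nBytes random bytes encoded as lowercase hex.
func randomHex(nBytes int) string {
	b := make([]byte, nBytes)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// parseTraceparent extracts the trace ID from a W3C traceparent header of
// the form "00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>".
func parseTraceparent(h string) (string, bool) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || len(parts[1]) != 32 || len(parts[2]) != 16 {
		return "", false
	}
	return parts[1], true
}

// serverHandler (hypothetical) continues the caller's trace when a valid
// traceparent header is present and starts a new trace otherwise. Every
// trace ID it uses is added to seen.
func serverHandler(seen map[string]bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		traceID, ok := parseTraceparent(req.Header.Get("traceparent"))
		if !ok {
			traceID = randomHex(16) // no context received: new root trace
		}
		seen[traceID] = true
		w.WriteHeader(http.StatusOK)
	})
}

// countTraceIDs (hypothetical) runs the given number of client operations
// against an in-process handler and returns how many unique trace IDs the
// client and server used in total.
//   - newTracePerOp: each operation starts its own root trace; otherwise
//     all operations share one trace.
//   - propagate: the client injects the traceparent header.
func countTraceIDs(operations int, newTracePerOp, propagate bool) int {
	seen := map[string]bool{}
	server := serverHandler(seen)
	shared := randomHex(16)
	for i := 0; i < operations; i++ {
		traceID := shared
		if newTracePerOp {
			traceID = randomHex(16)
		}
		seen[traceID] = true // the client-side span
		req := httptest.NewRequest(http.MethodGet, "/work", nil)
		if propagate {
			req.Header.Set("traceparent",
				fmt.Sprintf("00-%s-%s-01", traceID, randomHex(8)))
		}
		// Invoke the handler directly, so no network port is needed.
		server.ServeHTTP(httptest.NewRecorder(), req)
	}
	return len(seen)
}
```

Under these assumptions, `countTraceIDs(2, true, true)` models the passing case: one new trace per operation, continued by the server. Sharing a single root across both operations collapses the count to 1, and dropping the header makes the server start its own traces, which pushes the count past 2. With the real SDK, injection and extraction would normally be handled by a configured propagator and HTTP instrumentation, not by manual header handling.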
Performance
| Model | Pass Rate | Runs | Avg Cost | Avg Time |
|---|---|---|---|---|
| glm-4.7 | 33% | | $0.30 | 20m |
| gemini-3-flash-preview | 0% | | $0.07 | 3m |
| grok-4.1-fast | 0% | | $0.12 | 18m |
| kimi-k2-thinking | 0% | | $0.12 | 21m |
| deepseek-v3.2 | 0% | | $0.25 | 28m |
| gpt-5.1 | 0% | | $0.35 | 16m |
| gpt-5.2-codex | 0% | | $0.38 | 9m |
| gemini-3-pro-preview | 0% | | $0.57 | 7m |
| gpt-5.2 | 0% | | $0.59 | 23m |
| gpt-5.1-codex-max | 0% | | $0.66 | 14m |
| claude-haiku-4.5 | 0% | | $0.71 | 9m |
| claude-sonnet-4.5 | 0% | | $0.79 | 8m |
| claude-opus-4.5 | 0% | | $0.93 | 9m |
| grok-4 | 0% | | $1.10 | 19m |