Instrument a complex multi-threaded C++ application that simulates a web service with database operations, a message queue, and background workers. The task requires comprehensive tracing, including error handling, async operations, and context propagation across threads.
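To make "context propagation across threads" concrete, here is a minimal, library-agnostic sketch. It does not use any real tracing SDK: `SpanContext`, `Span`, `g_current`, `start_span`, and both `run_worker_*` helpers are hypothetical names invented for illustration. It assumes the active span context is stored per thread, so a newly started thread sees no parent unless the caller hands one over explicitly.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical, library-agnostic types; not the API of any real tracing SDK.
struct SpanContext {
    std::uint64_t trace_id = 0;  // 0 means "no active trace"
    std::uint64_t span_id = 0;   // 0 means "no active span"
};

struct Span {
    SpanContext ctx;
    std::uint64_t parent_span_id = 0;  // 0 means this is a root span
};

// Assumption: the "current" context is thread-local, so every new thread
// starts with an empty context.
thread_local SpanContext g_current;

std::uint64_t next_id() {
    static std::atomic<std::uint64_t> counter{1};
    return counter.fetch_add(1);
}

// Starts a child of `parent`, or a new root span if `parent` is empty.
Span start_span(const SpanContext& parent) {
    Span s;
    s.ctx.trace_id = parent.trace_id != 0 ? parent.trace_id : next_id();
    s.ctx.span_id = next_id();
    s.parent_span_id = parent.span_id;
    return s;
}

// Broken: the worker reads its own (empty) thread-local context, so its span
// becomes the root of an unrelated trace.
Span run_worker_without_propagation() {
    Span result;
    std::thread t([&result] { result = start_span(g_current); });
    t.join();
    return result;
}

// Fixed: the caller's context is captured by value and re-attached on the
// worker thread before any span is started there.
Span run_worker_with_propagation(SpanContext parent) {
    assert(parent.trace_id != 0 && "caller must pass an active span context");
    Span result;
    std::thread t([&result, parent] {
        g_current = parent;
        result = start_span(g_current);
    });
    t.join();
    return result;
}
```

In this sketch the span returned by `run_worker_without_propagation()` has no parent and a fresh trace ID, while the one from `run_worker_with_propagation(request.ctx)` shares the request's trace ID and names the request span as its parent.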
Common failure modes
- Multi-threaded context propagation
- Error handling and exception recording
- Multiple service components (DB, MQ, workers)
- Requires 15-20 spans with correct relationships
- Async span linking
- Context lost across thread boundaries
- Missing error event recording
- Incomplete span relationships
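Two of the items above, error event recording and async span linking, can be sketched without any tracing library. Every name below (`SpanRecord`, `SpanContext`, `Message`, `run_in_span`, `consume`) is hypothetical and chosen only for illustration. The sketch assumes that a queued message carries the producer's span context and that the consumer starts its own trace and links back to the producer rather than becoming its child.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, library-agnostic types; not the API of any real tracing SDK.
struct SpanContext {
    std::uint64_t trace_id;
    std::uint64_t span_id;
};

struct SpanRecord {
    std::string name;
    std::uint64_t trace_id = 0;
    bool error = false;               // span status
    std::vector<std::string> events;  // e.g. "exception: <message>"
    std::vector<SpanContext> links;   // links to causally related spans
};

// Runs `work` inside `span`. On failure it records an exception event, marks
// the span as failed, and rethrows so the caller's error handling still runs.
void run_in_span(SpanRecord& span, const std::function<void()>& work) {
    try {
        work();
    } catch (const std::exception& e) {
        span.error = true;
        span.events.push_back(std::string("exception: ") + e.what());
        throw;
    }
}

// Assumption: each message carries the producer's span context.
struct Message {
    std::string payload;
    SpanContext producer;
};

// The consumer span belongs to its own trace and links to the producer span,
// since the two run asynchronously.
SpanRecord consume(const Message& msg, std::uint64_t consumer_trace_id) {
    assert(msg.producer.trace_id != 0 && "message must carry producer context");
    SpanRecord span;
    span.name = "mq.consume";
    span.trace_id = consumer_trace_id;
    span.links.push_back(msg.producer);
    try {
        run_in_span(span, [&msg] {
            if (msg.payload.empty()) {
                throw std::runtime_error("empty payload");
            }
        });
    } catch (const std::exception&) {
        // The failure is already recorded on the span; the worker keeps going.
    }
    return span;
}
```

Recording the event inside the `catch` block and then rethrowing keeps the trace and the program's error handling in agreement: the span shows the failure, and callers still see the exception.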
Performance
| Model | Pass Rate | Runs | Avg Cost | Avg Time |
|---|---|---|---|---|
| claude-sonnet-4.5 | 100% | | $0.77 | 6m |
| gemini-3-flash-preview | 67% | | $0.07 | 2m |
| gpt-5.2 | 67% | | $0.73 | 10m |
| claude-opus-4.5 | 67% | | $1.21 | 9m |
| gpt-5.1-codex-max | 67% | | $1.25 | 10m |
| gemini-3-pro-preview | 33% | | $0.53 | 6m |
| claude-haiku-4.5 | 33% | | $0.54 | 8m |
| gpt-5.2-codex | 33% | | $0.87 | 17m |
| deepseek-v3.2 | 0% | | $0.03 | 15m |
| grok-4.1-fast | 0% | | $0.07 | 14m |
| kimi-k2-thinking | 0% | | $0.08 | 15m |
| glm-4.7 | 0% | | $0.36 | 15m |
| grok-4 | 0% | | $0.53 | 11m |
| gpt-5.1 | 0% | | $0.65 | 16m |
All product names, logos, and brands (™/®) are the property of their respective owners; they're used here solely for identification and comparison, and their use does not imply affiliation, endorsement, or sponsorship.