python-distributed-context-propagation

Python, 0% pass rate

Instrument a Python client-server application with OpenTelemetry (OTEL) tracing. The instrumented application must produce exactly 2 trace IDs, one for each of two separate workflows.

Common failure modes

The test expects 2 trace IDs, but models produce only 1. Models propagate context "too well": they continue the same trace across both workflows instead of starting a separate trace for each workflow.

Example error

AssertionError: Expected more than 1 trace ID, got 1
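A check of roughly the following shape would raise this error. The actual test is not shown on this page, so this is a hypothetical reconstruction: the span record layout (dictionaries with a `trace_id` key), the function name, the second assertion message, and the sample IDs are all made up for illustration.

```python
def check_trace_ids(spans, expected=2):
    """Assert that the span records cover the expected number of traces."""
    # Spans that belong to the same trace share one trace ID, so the number
    # of distinct IDs equals the number of traces.
    found = {span["trace_id"] for span in spans}
    assert len(found) > 1, f"Expected more than 1 trace ID, got {len(found)}"
    assert len(found) == expected, (
        f"Expected exactly {expected} trace IDs, got {len(found)}"
    )
    return found


# Made-up span records: two workflows, two spans each.
sample_spans = [
    {"name": "workflow-1", "trace_id": "a" * 32},
    {"name": "workflow-1.request", "trace_id": "a" * 32},
    {"name": "workflow-2", "trace_id": "b" * 32},
    {"name": "workflow-2.request", "trace_id": "b" * 32},
]

check_trace_ids(sample_spans)  # passes: two distinct trace IDs
```

If every record carried the same `trace_id`, the first assertion would fail with the message shown above.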

Performance

Model	Pass Rate	Avg Cost	Avg Time
gpt-5.2-codex	0%	$0.00	20m
deepseek-v3.2	0%	$0.11	15m
gemini-3-flash-preview	0%	$0.13	4m
grok-4.1-fast	0%	$0.14	20m
glm-4.7	0%	$0.15	8m
kimi-k2-thinking	0%	$0.16	23m
gpt-5.1	0%	$0.34	10m
gemini-3-pro-preview	0%	$0.40	5m
claude-haiku-4.5	0%	$0.41	7m
grok-4	0%	$0.44	9m
gpt-5.2	0%	$0.44	8m
claude-sonnet-4.5	0%	$0.46	5m
gpt-5.1-codex-max	0%	$0.61	15m
claude-opus-4.5	0%	$0.66	6m
