Quesma Benchmarks

Open-source benchmarks for evaluating AI coding agents on real-world software engineering tasks.

OTelBench

An OpenTelemetry instrumentation benchmark for AI coding agents. It tests models on real-world tasks that add distributed tracing, metrics, and logging to multi-language codebases.