
CompileBench: Can AI Compile 22-year-old Code?
We tested 19 LLMs on real-world software engineering tasks such as compiling old code and cross-compiling. See how models from Anthropic, OpenAI, and Google stack up in our new benchmark, CompileBench.