[AI]■ STORY TIMELINE

SWE-BENCH VERIFIED LOSES RELEVANCE FOR AI CODING

OpenAI has stopped using SWE-bench Verified to evaluate frontier coding capabilities, signaling that the widely used benchmark no longer reflects the performance of advanced AI systems.

1 SOURCE | FIRST SEEN APR 26, 01:58 PM
Hacker News (+0m)

Article URL: https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/
Comments URL: https://news.ycombinat…