Why SWE-bench Verified no longer measures frontier coding capabilities

(openai.com)

75 points | by kmdupree 3 hours ago

60 comments