Measuring AI Ability to Complete Long Tasks

(metr.org)

243 points | by spicypete | 3 days ago

194 comments