OpenAI: Investigating the consequences of accidentally grading CoT during RL

(alignment.openai.com)

2 points | by pretext 11 hours ago

No comments yet.