The entire article reads like the output of a well-structured prompt. Style-wise, it matches almost point for point what I get when I ask for a summary of the current repo changes before deciding whether to commit.
The "RL repair loop" is iterative LLM prompting with stderr feedback, not reinforcement learning. There is no training code, no reward function, and no environment in the repo. The loop also freezes the scene spec and only regenerates code, so if the planner specified 12 objects that geometrically do not fit on screen, three repair attempts will not help.
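To make the distinction concrete, here is a minimal sketch of what a loop like that amounts to. It assumes a hypothetical `generate_code(spec, feedback)` LLM wrapper and Python as the generated language; neither name comes from the repo. The `spec` is passed unchanged on every attempt, and the only feedback is raw stderr text. There is no reward, no policy, and no weight update.

```python
import subprocess
import sys
from typing import Callable, Optional


def repair_loop(
    spec: str,
    # Hypothetical LLM wrapper: takes the (frozen) spec plus the last stderr, returns code.
    generate_code: Callable[[str, Optional[str]], str],
    max_attempts: int = 3,
) -> Optional[str]:
    """Regenerate code from a fixed spec, feeding back stderr after each failure."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        # The spec is never revised; only the generated code changes between attempts.
        code = generate_code(spec, feedback)
        result = subprocess.run(
            [sys.executable, "-c", code], capture_output=True, text=True
        )
        if result.returncode == 0:
            return code
        # The only "signal" is the error text: no reward, no training step.
        feedback = result.stderr
    return None
```

If the spec itself is infeasible (say, 12 objects that cannot fit on screen), every attempt fails the same way and the function simply returns `None` after `max_attempts`.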
Thanks, I was wondering what this README could have meant by "RL loop" here.