This reminds me of this yearly StarCraft AI competition (since 2010), however I think it uses a special API that makes it easy for bots to access the game
Wouldn't it be interesting if the LLMs would write realtime RTS-commands instead of Code? After all it is a RTS game.
This would bring another dimension to it since then quality of tokens would be one dimension (RTS-language: Decision Making) and speed of tokens the other (RTS-language: Actions Per Minute; APM).
That calculates the ELOs for each AI implementation, and I feed it to different agents so they get really creative trying to beat each other. Also making rule changes to the game and seeing how some scripts get weaker/stronger is a nice way to measure balance.
I know visualization is far from the most important goal here, but it really gets me how there's fairly elaborately rendered terrain, and then the units are just unnamed roombas with hard to read status indicators that have no intuitive meaning. Even in the match viewer I have no clue what's going on, there is no overlay or tooltip when you hover or click units either. There is a unit list that tries (and mostly fails) to give you some information, but because units don't have names you have to hover them in the list to have them highlighted in the field (the reverse does not work). Not exactly a spectator sport. Oh, but there is a way to switch from having all units in one sidebar to having one sidebar per player, as if that made a difference.
I find this pretty funny because it seems like a perfect representation of what's easy with today's tools and what isn't
Yeah, it's all what you get when you basically ask an agent "Build X" without any constraints about how the UI and UX actually should work, and since the agents have about 0 expertise when it comes to "How would a human perceive and use this?", you end up with UIs that don't make much sense for humans unless you strictly steer them with what you know.
At least until one of the competitors is overheard saying "A strange game. The only winning move is not to play"
Reminds me of this fantastic series on Game Theory and Agent Reasoning https://jdsemrau.substack.com/p/nemotron-vs-qwen-game-theory...
This reminds me of this yearly StarCraft AI competition (since 2010), however I think it uses a special API that makes it easy for bots to access the game
Edit: Forgot link: https://davechurchill.ca/starcraft/
Wouldn't it be interesting if the LLMs would write realtime RTS-commands instead of Code? After all it is a RTS game.
This would bring another dimension to it since then quality of tokens would be one dimension (RTS-language: Decision Making) and speed of tokens the other (RTS-language: Actions Per Minute; APM).
Also there are a lot of coding benchmarks, that way it would test something more abstract, similar to AlphaStar https://en.wikipedia.org/wiki/AlphaStar_(software)
You could just use the exposed APIs of OpenAI, Anthropic etc. and let them battle.
This is amazing. What I do is something else: I make AI agents develop AI scripts (good ol' computer player scripts) and try to beat each other:
https://egeozcan.github.io/unnamed_rts/game/
I occasionally run my tournament script: https://github.com/egeozcan/unnamed_rts/blob/main/src/script...
That calculates the ELOs for each AI implementation, and I feed it to different agents so they get really creative trying to beat each other. Also making rule changes to the game and seeing how some scripts get weaker/stronger is a nice way to measure balance.
Funny thing, Codex gets really aggressive and starts cheating a lot of times: https://bsky.app/profile/egeozcan.bsky.social/post/3mfdtj5dh...
I know visualization is far from the most important goal here, but it really gets me how there's fairly elaborately rendered terrain, and then the units are just unnamed roombas with hard to read status indicators that have no intuitive meaning. Even in the match viewer I have no clue what's going on, there is no overlay or tooltip when you hover or click units either. There is a unit list that tries (and mostly fails) to give you some information, but because units don't have names you have to hover them in the list to have them highlighted in the field (the reverse does not work). Not exactly a spectator sport. Oh, but there is a way to switch from having all units in one sidebar to having one sidebar per player, as if that made a difference.
I find this pretty funny because it seems like a perfect representation of what's easy with today's tools and what isn't
Love the idea though
Yeah, it's all what you get when you basically ask an agent "Build X" without any constraints about how the UI and UX actually should work, and since the agents have about 0 expertise when it comes to "How would a human perceive and use this?", you end up with UIs that don't make much sense for humans unless you strictly steer them with what you know.
Nice. Curious about 5.3-codex-high results
Great project! It would be interesting to have a meta layer of AIs betting on the player LLMs
This is actually fun to watch :D
Now I'd love to see if fast > smart over time with Mercury 2.