Show HN: A new benchmark for testing LLMs for deterministic outputs

(interfaze.ai)

16 points | by khurdula 2 hours ago

2 comments

stared 25 minutes ago

Thank you for sharing the benchmark. However, the results are selective.
Why no Opus 4.7? Why is Gemini 3.1 Pro missing?
If there is some other criterion (e.g. models within a certain time or budget), great - just make it explicit.
When I see "Top 5 at a glance" and it is missing key frontier models, I am (at best) confused.
- Flux159 11 minutes ago
  
  Agree that the choices are strange. Sonnet 4.6 was tested, but no Opus 4.6.
  Gemini 3.1 and GLM 5 came out around the same time as Sonnet 4.6 (~Feb 2026) so it's strange that they are missing, but Gemini 2.5 Flash, Gemini 3 Flash, and GLM 4.7 are there.