Astro - Hacker News

10 comments

wongarsu an hour ago

It does really well on "AA-Omniscience Non-Hallucination Rate", far higher than DeepSeek, GPT 5.5 or Fable. I really like that benchmark because it's one of the few benchmarks that allows LLMs to elect not to answer if they are unsure and punishes them for trying to bullshit their way through the benchmark
XCSme 19 minutes ago

I also tested it[0]: quite similar to GLM 5, a few percent better, 30% faster and 50% more expensive.
[0]: https://aibenchy.com/?q=glm
- XCSme 17 minutes ago
  
  PS: Just added a cool feature: you can now filter the leaderboard for multiple models at once by separating them with commas, like: https://aibenchy.com/?q=glm,claude
- lousken 12 minutes ago
  
  Still 1/4 the price of Anthropic and OpenAI models, though.
theturtletalks 24 minutes ago

I want to trust their benchmarks but when they have Muse Spark over GPT-5.5, it gives me pause.
lanycrost an hour ago

It's always nice to see open source models growing. I hope we'll get good performance on lower-tier hardware someday.
sourcecodeplz 44 minutes ago

Still quite verbose at 140M output tokens, but this is on max thinking. The high setting should do better.
ChrisArchitect 20 minutes ago

Some more discussion: https://news.ycombinator.com/item?id=48567759
DeathArrow an hour ago

One or two more releases and they will reach Fable level.
- vitalyan123 4 minutes ago
  
  by then there will be Fable 5.21, again 5% ahead of every other SotA while still only 500% the size.