Astro - Hacker News

10 comments

2001zhaozhao 2 minutes ago

> We switched to the "triager" pattern: a Haiku agent with a very specific and narrow job. Is this issue already tracked or not? If it is, stop right there. If not, escalate to Opus.
I'm planning to self host qwen3.6 27b basically for this purpose
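
For anyone who wants the quoted "triager" flow in concrete terms, here is a minimal sketch. It assumes two callables that are stand-ins and not from the article: `cheap_triager` (the narrow "is this already tracked?" check, which could be Haiku or a self-hosted model) and `expensive_investigator` (the full investigation). The names and the in-memory `KNOWN_ISSUES` set are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    handled_by: str  # "triager" or "investigator"
    result: str


def route(issue: str,
          is_already_tracked: Callable[[str], bool],
          investigate: Callable[[str], str]) -> Decision:
    """Ask the cheap check first; escalate only when the issue is new."""
    if is_already_tracked(issue):
        # Stop right here: the expensive model is never called.
        return Decision("triager", "already tracked")
    return Decision("investigator", investigate(issue))


# Hypothetical stand-ins for the two model calls (assumptions, not real APIs).
KNOWN_ISSUES = {"timeout in checkout", "flaky login test"}


def cheap_triager(issue: str) -> bool:
    return issue.lower() in KNOWN_ISSUES


def expensive_investigator(issue: str) -> str:
    return f"root cause report for: {issue}"


if __name__ == "__main__":
    print(route("Flaky login test", cheap_triager, expensive_investigator))
    print(route("Disk full on worker", cheap_triager, expensive_investigator))
```

Passing the two steps in as callables means the cheap check can be swapped for a local model without touching the routing logic.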
wxw an hour ago

> We switched to the "triager" pattern: a Haiku agent with a very specific and narrow job. Is this issue already tracked or not? If it is, stop right there. If not, escalate to Opus.
> 4 out of 5 failures never reach Opus. A triager match costs around 25x less than a full investigation.
The title feels misleading. Why clickbait on that when you can just be genuine about the architecture?
- idorosen an hour ago
  
  The title does not match the article title: “We Upgraded to a Frontier Model and Our Costs Went Down”.
  - stingraycharles 40 minutes ago
    
    It’s still misleading, though.
cadamsdotcom an hour ago

I have rewritten the article to be slightly shorter:
“Let a cheap agent decide if the expensive one is needed.”
- a_t48 30 minutes ago
  
  Sounds like L1 vs L2 support :)
syntaxing 15 minutes ago

Is RAG dead? I would be very surprised if a small local SOTA embedding model like llama-embed-nemotron-8b didn't outperform the Haiku layer for this application. It should be pretty cheap and easy to prove out. With a 32K context size, you can literally one-shot the whole ticket.
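
A rough sketch of what the embedding version of the "already tracked?" check could look like. It assumes the ticket embeddings have already been computed by some model; the toy 3-dimensional vectors, the `best_match` helper, and the 0.9 threshold are all made up for illustration and would need tuning on real data.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query_vec, tracked, threshold=0.9):
    """Return the id of the most similar tracked issue, or None if no
    tracked issue reaches the (assumed) similarity threshold."""
    best_id, best_score = None, threshold
    for issue_id, vec in tracked.items():
        score = cosine(query_vec, vec)
        if score >= best_score:
            best_id, best_score = issue_id, score
    return best_id


# Toy vectors standing in for real embeddings (assumption).
TRACKED = {
    "ISSUE-1": [1.0, 0.0, 0.0],
    "ISSUE-2": [0.0, 1.0, 0.0],
}

if __name__ == "__main__":
    print(best_match([0.9, 0.1, 0.0], TRACKED))  # close to ISSUE-1
    print(best_match([0.5, 0.5, 0.7], TRACKED))  # nothing close enough
```

A `None` result is the case that would be escalated to the expensive model.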
neya 22 minutes ago
The whole clickbait article can be summarized in one line:
```
    Let a cheap agent decide if the expensive one is needed
```
whalesalad an hour ago

Looking at the diagram, is this seriously a case of replacing basic functional concepts like "write to clickhouse" or "have we seen this before" with a model? could those be actual function calls in some language?
just seems wasteful all around. having an agent in the critical path when a regular expression (or similar) could do the job just seems odd. yeah haiku is cheap but re.match() is cheaper.
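
To make the regex alternative concrete, a minimal sketch of a "have we seen this before" check as a plain function. The failure names and patterns are invented for illustration, and it uses `pattern.search()` instead of `re.match()` because log lines usually carry a prefix before the interesting part.

```python
import re

# Hypothetical known-failure signatures (assumptions, not from the article).
KNOWN_FAILURES = {
    "db-timeout": re.compile(r"connection to \S+ timed out after \d+ms"),
    "oom": re.compile(r"OutOfMemoryError|Killed process \d+"),
}


def match_known_failure(log_line):
    """Return the name of the first known signature found in the line,
    or None if the failure looks new and needs a closer look."""
    for name, pattern in KNOWN_FAILURES.items():
        # search() scans the whole line; match() would only test the start.
        if pattern.search(log_line):
            return name
    return None


if __name__ == "__main__":
    print(match_known_failure("ERROR connection to db-1:5432 timed out after 3000ms"))
    print(match_known_failure("segfault in worker 7"))
```

The trade-off is that someone has to maintain the patterns by hand, and anything that doesn't match still needs another path.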
saltyoldman 39 minutes ago

I do a similar thing with a "planner agent" that uses the cheapest model (I think it's openai-gpt-5.2-mini or something, at like 20 cents per 1M tokens). It more or less emits a plan name and a task list, and each task in the list has a recommended model. It's not perfect, but many of our tasks are accomplished with lighter-weight models. When doing code generation or fixing, we upgrade to a more expensive model; planning and decisions are done more cheaply. Keep in mind the tasks are relatively constrained, so planning with a cheap agent makes sense here. An open-ended agent would likely use a more expensive call for planning.
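
A sketch of the plan shape described above: a plan name plus a task list where each task carries a recommended model. In the setup described, a cheap model emits this; here a lookup table stands in for that model so the example runs on its own. The field names, task kinds, and the `cheap-model` / `expensive-model` labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    kind: str               # e.g. "planning", "codegen", "fix"
    recommended_model: str


@dataclass
class Plan:
    name: str
    tasks: list


# Assumed policy: code generation and fixing get the expensive model,
# everything else stays on the cheap one.
MODEL_FOR_KIND = {"codegen": "expensive-model", "fix": "expensive-model"}
DEFAULT_MODEL = "cheap-model"


def build_plan(name, steps):
    """Turn (description, kind) pairs into a Plan with a model per task."""
    tasks = [
        Task(description, kind, MODEL_FOR_KIND.get(kind, DEFAULT_MODEL))
        for description, kind in steps
    ]
    return Plan(name, tasks)


if __name__ == "__main__":
    plan = build_plan("fix-login-bug", [
        ("decide which files are involved", "planning"),
        ("patch the session handler", "fix"),
        ("summarize the change", "summary"),
    ])
    for task in plan.tasks:
        print(task.recommended_model, "-", task.description)
```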