Accessed via OpenRouter, this one decided to wrap the SVG pelican in HTML with controls for the animation speed: https://gisthost.github.io/?ecaad98efe0f747e27bc0e0ebc669e94...
Transcript and HTML here: https://gist.github.com/simonw/ecaad98efe0f747e27bc0e0ebc669...
At this point, drawing these pelicans must be in the training data sets.
We got an overachiever here. Kimi sounds like a teacher's pet kind of name.
There is some humor in the fact that China (of all countries) is pioneering possibly the world's most important tech via open source, while we (the US) are doing the exact opposite.
All great technological advancements have come through opening up technology. Just look at your iPhone: GPS, the internet, AI voice assistants, touchscreens, microprocessors, lithium-ion batteries, etc. all came from gov't research (I'm counting Bell Labs' gov't-mandated monopoly + research funding as gov't) that was opened up for free instead of being locked behind a patent.
Private companies will never open up a technological breakthrough to their competitors. It just doesn't make sense. If you want an entire field to advance, you have to open it up.
Additional humor: the "open" in OpenAI.
Maybe open source == communism
Good ol' Steve "Developers! Developers! Developers!" Ballmer said so a long time ago. What a visionary!
But China is not communist even though the ruling party has the word in its name.
Oh, I'm fully aware of that lol
I've always been surprised Kimi doesn't get more attention than it does. It's always stood out to me in terms of creativity, quality... has been my favorite model for awhile
Kagi has it as an option in its Assistant thing, where there is naturally a lot of searching and summarizing results. I've liked its output there and in general when asked for prose that isn't in the list/Markdown-heavy "LLM style." It's hard to do a confident comparison, but it's seemed bold in arranging the output to flow well, even when that took surgery on the original doc(s). Sometimes the surgery's needed e.g. to connect related ideas the inputs treated as separate, or to ensure it really replies to the request instead of just dumping info that's somehow related to it.
It's also one of the few models that seem capable of drawing an SVG clock
https://clocks.brianmoore.com/
Is it? In your link it definitely failed to draw the clock.
Dirt cheap on openrouter for how good it is, too. Really hoping that 2.6 carries on that tradition.
Maybe because it's a bit like unleashing a chaos monkey on your codebase? I tried it locally (K2.5 72B) and couldn't get anything useful.
Huh, that's not a thing?
The parent poster is probably referring to Kimi-Dev-72B¹, which is a much smaller and older model, while people are probably more familiar with the big and fairly powerful 1100B Kimi-K2.5².
[1] https://huggingface.co/moonshotai/Kimi-Dev-72B
[2] https://huggingface.co/moonshotai/Kimi-K2.5
Yes, it was good for its time, but it's 10 months old, which is a long time in this space. It was also a fine-tune (albeit a good one) of Qwen-2.5 72B.
I wish they did more smaller models. Kimi Linear doesn't really count; it was more of a proof-of-concept thing.
Gonna give this one a go... the previous 2.5 model is used for Cursor's Composer 2 Fast. After a few weeks of real-world tasks, I've seen that it can be very dumb or very good (better than Opus 4.7), depending on the problem you throw at it.
Sometimes a single prompt/response pass can unblock you on issues where Opus ate $100+ in API credits and circled for hours. Other times the response is useless, but it's your responsibility as an engineer to discern this.
Verdict (at least for me): use both.
Wow, if the benchmarks check out with the vibes, this could almost be a DeepSeek moment, with Chinese AI now neck and neck with SOTA models from US labs.
It's not anywhere close, and if it were, nobody in the USA would be spending 7 figures on infrastructure for it.
You LLM people here all have serious cases of Dunning-Kruger.
> It's not anywhere close
Close to what, and how are you measuring?
> nobody in the USA would be spending 7 figures on infrastructure for it
Au contraire, if AI had a moat it would pay for itself. They're funneling capital into infrastructure because they know it can't.
With the previous generation? Yes. With 10T Mythos-level models? Not even close.
The psyop continues. Mythos, until it's released, is vaporware. Notice how you can try Kimi 2.6. Where is the same for Mythos?
Mythos isn't the current generation, it's literally vaporware.
I've got a 12T model on my machine, built it myself. It's called Mytho. Too dangerous to even release a fact sheet about it. It can hack into the mainframe, enhance ultra-compressed images, grow your hair back, and make people fall in love with you.
There's no public data about Mytho.
That's because it would be too dangerous to release.
My girlfriend goes to a different school, you wouldn't know her.
Same for teleport, time travel and warp drive.
So is my P=NP proof.
According to the benchmarks, you are wrong. It is on track with, and slightly above, some SOTA. That's just the benchmarks speaking, though; they can be (and are) gamed by all the big model labs, domestic ones included.
10T? Impossible! They told us the training run was under 10^26 flops.
I have been testing it in my app all morning, and the results line up with 4.6 Sonnet. This is just a "vibe" feeling with no real testing. I'm glad we have some real competition to the "frontier" models.
I have a subscription through work and I've been trialing it; so far it looks on par with, if not better than, Opus.
I'm pretty sure Kimi is what Cursor uses for their "Composer 2" model. Works pretty well as a fallback when Claude runs out, but it's definitely a downgrade.
If only their API wasn't tied to a Google or phone login...
Beats opus 4.6! They missed claiming the frontier by a few days.
While I'm skeptical of any "beats Opus" claims (many have been made, none turned out to be true), I still think it's insane that we can now run close-to-SotA models locally on ~$100k worth of hardware, for a small team, and be 100% sure that the data stays local. Should be a no-brainer for teams that work in areas where privacy matters.
Even the smaller quantized models which can run on consumer hardware pack in an almost unfathomable amount of knowledge. I don't think I expected to be able to run a 'local Google' in my lifetime before the LLM boom.
I think this one is only about 600GB of memory usage, so it could fit on two Mac Studios with 512GB of unified memory each. That would have cost (albeit no longer available) something like less than $20k.
Yeah, but that's personal use at best; not much agentic anything happening on that hardware. Macs are great for small models at small-to-medium context lengths, but at >64k (something very common with agentic usage) they struggle and slow down a lot.
The ~$100k hardware is suitable for multi-user, small-team usage. That's what you'd use for actual work in reasonable timeframes. For personal use, sure, Macs could work.
You could run it with SSD offload, earlier experiments with Kimi 2.5 on M5 hardware had it running at 2 tok/s. K2.6 has a similar amount of total and active parameters.
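For anyone doing the napkin math on whether a model this size fits in 2×512GB: a rough sketch. The ~1T total parameter count and the 15% runtime overhead are my assumptions for illustration, not official figures.

```python
def model_memory_gb(params_billion: float, bits_per_weight: float,
                    overhead: float = 1.15) -> float:
    """Rough weight-memory estimate: params * bytes-per-param, plus ~15%
    for KV cache, activations, and runtime buffers (a guess; it varies
    a lot with context length and engine)."""
    weight_bytes = params_billion * 1e9 * (bits_per_weight / 8)
    return weight_bytes * overhead / 1e9  # decimal GB

# Assuming ~1000B (1T) total parameters, in the ballpark of K2-class MoE:
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{model_memory_gb(1000, bits):,.0f} GB")
```

At 4-bit this lands around 575 GB, which is consistent with the ~600GB figure upthread and would just squeeze into two 512GB machines; 8-bit and 16-bit clearly would not.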
Opus is clearly a sidegrade meant to help Anthropic manage cost, so I would say they may have it if it actually beats 4.6
Could be right. I just noticed my feed is absent the usual flood of posts demoing the new hotness on 3D modeling, game design and SVG drawings of animals on vehicles.
It doesn't beat Opus 4.6, no way, don't be fooled by benchmarks.
Really excited to try this one. I've been using Kimi 2.5 for design and it's really good, but borderline useless on backend/advanced tasks.
Also discovered that using OpenCode instead of the Kimi CLI really hurts the model's performance (2.5).
If the benchmarks are private, how do we reproduce the results? I looked up the Humanity's Last Exam (https://agi.safe.ai/) this model uses and I can't seem to access it.
Wow, $0.95 input / $4 output. If it's anywhere near Opus 4.6, that's incredible.
This should erase any doubt that AI Labs are making $$$ on API inference.
Kimi 2.5 (which this is based on) is served at $0.44 input / $2 output by a ton of different providers on OpenRouter, 2.6 will certainly be similar.
That's about 11X less than Opus for similar smarts.
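The "11X" figure is easy to sanity-check. A sketch using the Kimi prices quoted above; the Opus list prices here are my assumption, not something from this thread, so check current pricing before relying on the exact ratio.

```python
# Per-million-token prices in USD. Kimi figures are the OpenRouter prices
# quoted upthread; the Opus figures are ASSUMED list prices for illustration.
kimi = {"input": 0.44, "output": 2.00}
opus = {"input": 5.00, "output": 25.00}  # assumption

for kind in ("input", "output"):
    ratio = opus[kind] / kimi[kind]
    print(f"{kind}: Opus is ~{ratio:.1f}x the price")
```

Under those assumed prices the ratio comes out around 11x on input and 12.5x on output, which is in line with the "about 11X" claim.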
Famously, OpenAI and Anthropic are devoted to increasing efficiency before scaling up resource usage.
How does it erase any doubt? You're implying Chinese things can't actually be cheaper to produce than American ones, which is laughable.
Exciting benchmarks if true. What kind of hardware do they typically run these benchmarks on? Apologies if my terminology is off, but I assume they're using an unquantized version that wouldn't run on even the beefiest MacBook?
Running it through opencode to their API and... it definitely seems like it's "overthinking" -- watching the thought process, it's been going for pages and pages and pages diagnosing and "thinking" things through... without doing anything. Sitting at 50k+ output tokens used now just going in thought circles, complete analysis paralysis.
Might be a configuration or prompt issue. I guess I'll wait and see, but I can't get use out of this now.
https://huggingface.co/moonshotai/Kimi-K2.6
Is this the same model?
Unsloth quants: https://huggingface.co/unsloth/Kimi-K2.6-GGUF
(work in progress, no gguf files yet, header message saying as much)
Quite curious how well real usage will back the benchmarks, because even if it's only Opus ballpark, open weights Opus ballpark is seismic.
I pray the benchmark figures are true so I can stop paying Anthropic, after they screwed me over this quarter by dumbing down their models, making usage quotas ridiculously small, and demanding KYC paperwork.
Anthropic has done horrible PR and investors should be livid.
My theory is they pushed retail off their systems to make room for their new corporate fat cat clients. In which case, they'll do just fine.
Isn't this better than Qwen?
The choice of example task for Long-Horizon Coding is a bit spooky if you squint, since it's nearing the territory of LLMs improving themselves.
K2.5 was already pretty decent so I would try this. Starting at $15/month: https://www.kimi.com/membership/pricing
edit: Note that you can run it yourself with sufficient resources, or access it from other providers too: https://openrouter.ai/moonshotai/kimi-k2.6/providers
What's the privacy/data security like? I can't find that on that page.
Edit: found it.
> We may use your Content to operate, maintain, improve, and develop the Services, to comply with legal obligations, to enforce our policies, and to ensure security. You may opt out of allowing your Content to be used for model improvement and research purposes by contacting us at membership@moonshot.ai. We will honor your choice in accordance with applicable law.
Section 3 of https://www.kimi.com/user/agreement/modelUse?version=v2
You really rely on the ToS from Anthropic/OpenAI to know whether they use your prompts or not? It's on their servers; why wouldn't they use our data?
How are the usage limits compared to Anthropic?
Anthropic has the worst usage limits in the industry