Not "hidden", but probably more like "no one bothered to look".
declares a 1024-byte owner ID, which is an unusually long but legal value for the owner ID.
When I'm designing protocols or writing code with variable-length elements, "what is the valid range of lengths?" is always at the front of my mind.
it uses a memory buffer that’s only 112 bytes. The denial message includes the owner ID, which can be up to 1024 bytes, bringing the total size of the message to 1056 bytes. The kernel writes 1056 bytes into a 112-byte buffer
This is something a lot of static analysers can easily find. Of course asking an LLM to "inspect all fixed-size buffers" may give you a bunch of hallucinations too, but could be a good starting point for further inspection.
> This is something a lot of static analysers can easily find.
And yet they didn't (either noone ran them, or they didn't find it, or they did find it but it was buried in hundreds of false positives) for 20+ years...
I find it funny that every time someone does something cool with LLMs, there's a bunch of takes like this: it was trivial, it's just not important, my dad could have done that in his sleep.
Remember Heartbleed in OpenSSL? That long predated LLMs, but same story: some bozo forgot how long something should/could be, and no one else bothered to check either.
https://www.youtube.com/watch?v=1sd26pWhfmg is the presentation itself. The prompts are trivial; the bug (and others) looks real and well-explained - I'm still skeptical but this looks a lot more real/useful than anything a year ago even suggested was possible...
Tokens are insanely cheap at the moment.
Through OpenRouter a message to Sonnet costs about $0.001 cents or using Devstral 2512 it's about $0.0001.
An extended coding session/feature expansion will cost me about $5 in credits.
Split up your codebase so you don't have to feed all of it into the LLM at once and it's a very reasonable.
It cost me ~$750 to find a tricky privilege escalation bug in a complex codebase where I knew the rough specs but didn't have the exploit. There are certainly still many other bugs like that in the codebase, and it would cost $100k-$1MM to explore the rest of the system that deeply with models at or above the capability of Opus 4.6.
It's definitely possible to do a basic pass for much less (I do this with autopen.dev), but it is still very expensive to exhaustively find the harder vulnerabilities.
That might be a problem for the labs (although I don't think it is) but it's not a problem for end-users. There is enough pressure from top labs competing with each other, and even more pressure from open models that should keep prices at a reasonable price point going further.
In order to justify higher prices the SotA needs to have way higher capabilities than the competition (hence justifying the price) and at the same time the competition needs to be way below a certain threshold. Once that threshold becomes "good enough for task x", the higher price doesn't make sense anymore.
While there is some provider retention today, it will be harder to have once everyone offers kinda sorta the same capabilities. Changing an API provider might even be transparent for most users and they wouldn't care.
If you want to have an idea about token prices today you can check the median for serving open models on openrouter or similar platforms. You'll get a "napkin math" estimate for what it costs to serve a model of a certain size today. As long as models don't go oom higher than today's largest models, API pricing seems in line with a modest profit (so it shouldn't be subsidised, and it should drop with tech progress). Another benefit for open models is that once they're released, that capability remains there. The models can't get "worse".
Not really. I'm fully taking advantage of these low prices while they last. Eventually the AI companies will run start running out of funny money and start charging what the models actually cost to run, then I just switch over to using the self hosted models more often and utilize the online ones for the projects that need the extra resources.
Currently there's no reason for why I shouldn't use Claude Sonnet to write one time bash scripts, once it starts costing me a dollar to do so I'm going to change my behavior.
> Currently there's no reason for why I shouldn't use Claude Sonnet to write one time bash scripts, once it starts costing me a dollar to do so I'm going to change my behavior.
This just isn't going to happen, we have open weights models which we can roughly calculate how much they cost to run that are on the level of Sonnet _right now_. The best open weights models used to be 2 generations behind, then they were 1 generation behind, now they're on par with the mid-tier frontier models. You can choose among many different Kimi K2.5 providers. If you believe that every single one of those is running at 50% subsidies, be my guest.
I also have this feeling. But do you ever doubt it. that when the time comes we will be like the boiled frog? Where its "just so convenient" or that the reality of setting up a local ai is just a worse experience for a large upfront cost?
Inference cost has dropped 300x in 3 years, no reason to think this won't keep happening with improvements on models, agent architecture and hardware.
Also, too many people are fixated with American models when Chinese ones deliver similar quality often at fraction of a cost.
From my tests, "personality" of an LLM, it's tendency to stick to prompts and not derail far outweights the low % digit of delta in benchmark performance.
Not to mention, different LLMs perform better at different tasks, and they are all particularly sensible to prompts and instructions.
Code review is the real deal for these models. This area seems largely underappreciated to me. Especially for things like C++, where static analysis tools have traditionally generated too many false positives to be useful, the LLMs seem especially good. I'm no black hat but have found similarly old bugs at my own place. Even if shit is hallucinated half the time, it still pays off when it finds that really nasty bug.
Instead, people seem to be infatuated with vibe coding technical debt at scale.
Not "hidden", but probably more like "no one bothered to look".
declares a 1024-byte owner ID, which is an unusually long but legal value for the owner ID.
When I'm designing protocols or writing code with variable-length elements, "what is the valid range of lengths?" is always at the front of my mind.
it uses a memory buffer that’s only 112 bytes. The denial message includes the owner ID, which can be up to 1024 bytes, bringing the total size of the message to 1056 bytes. The kernel writes 1056 bytes into a 112-byte buffer
This is something a lot of static analysers can easily find. Of course asking an LLM to "inspect all fixed-size buffers" may give you a bunch of hallucinations too, but could be a good starting point for further inspection.
> This is something a lot of static analysers can easily find.
And yet they didn't (either noone ran them, or they didn't find it, or they did find it but it was buried in hundreds of false positives) for 20+ years...
I find it funny that every time someone does something cool with LLMs, there's a bunch of takes like this: it was trivial, it's just not important, my dad could have done that in his sleep.
Remember Heartbleed in OpenSSL? That long predated LLMs, but same story: some bozo forgot how long something should/could be, and no one else bothered to check either.
> "given enough eyeballs, all bugs are shallow"
Time to update that:
"given 1 million tokens context window, all bugs are shallow"
An explanation of the Claude Opus 4.6 linux kernel security findings as presented by Nicholas Carlini at unpromptedcon.
https://www.youtube.com/watch?v=1sd26pWhfmg is the presentation itself. The prompts are trivial; the bug (and others) looks real and well-explained - I'm still skeptical but this looks a lot more real/useful than anything a year ago even suggested was possible...
This does sound great, but the cost of tokens will prevent most companies from using agents to secure their code.
Tokens aren't more expensive than highly trained meatbags today. There's no way they'll be more expensive "tomorrow"...
Tokens are insanely cheap at the moment. Through OpenRouter a message to Sonnet costs about $0.001 cents or using Devstral 2512 it's about $0.0001. An extended coding session/feature expansion will cost me about $5 in credits. Split up your codebase so you don't have to feed all of it into the LLM at once and it's a very reasonable.
It cost me ~$750 to find a tricky privilege escalation bug in a complex codebase where I knew the rough specs but didn't have the exploit. There are certainly still many other bugs like that in the codebase, and it would cost $100k-$1MM to explore the rest of the system that deeply with models at or above the capability of Opus 4.6.
It's definitely possible to do a basic pass for much less (I do this with autopen.dev), but it is still very expensive to exhaustively find the harder vulnerabilities.
You’d have to ignore the massive investor ROI expectations or somehow have no capability to look past “at the moment”.
That might be a problem for the labs (although I don't think it is) but it's not a problem for end-users. There is enough pressure from top labs competing with each other, and even more pressure from open models that should keep prices at a reasonable price point going further.
In order to justify higher prices the SotA needs to have way higher capabilities than the competition (hence justifying the price) and at the same time the competition needs to be way below a certain threshold. Once that threshold becomes "good enough for task x", the higher price doesn't make sense anymore.
While there is some provider retention today, it will be harder to have once everyone offers kinda sorta the same capabilities. Changing an API provider might even be transparent for most users and they wouldn't care.
If you want to have an idea about token prices today you can check the median for serving open models on openrouter or similar platforms. You'll get a "napkin math" estimate for what it costs to serve a model of a certain size today. As long as models don't go oom higher than today's largest models, API pricing seems in line with a modest profit (so it shouldn't be subsidised, and it should drop with tech progress). Another benefit for open models is that once they're released, that capability remains there. The models can't get "worse".
Not really. I'm fully taking advantage of these low prices while they last. Eventually the AI companies will run start running out of funny money and start charging what the models actually cost to run, then I just switch over to using the self hosted models more often and utilize the online ones for the projects that need the extra resources. Currently there's no reason for why I shouldn't use Claude Sonnet to write one time bash scripts, once it starts costing me a dollar to do so I'm going to change my behavior.
> Currently there's no reason for why I shouldn't use Claude Sonnet to write one time bash scripts, once it starts costing me a dollar to do so I'm going to change my behavior.
This just isn't going to happen, we have open weights models which we can roughly calculate how much they cost to run that are on the level of Sonnet _right now_. The best open weights models used to be 2 generations behind, then they were 1 generation behind, now they're on par with the mid-tier frontier models. You can choose among many different Kimi K2.5 providers. If you believe that every single one of those is running at 50% subsidies, be my guest.
I also have this feeling. But do you ever doubt it. that when the time comes we will be like the boiled frog? Where its "just so convenient" or that the reality of setting up a local ai is just a worse experience for a large upfront cost?
worse. he's already boiled. probably paying way more than that one dollar per bash script with all the subscriptions he already has.
Yeah, the $20 I paid to OpenRouter about 4 months ago really cost me an arm and a leg, not sure where I'll get my next meal if I'm to be honest.
>$0.001 cents
$0.001 (1/10 of a cent) or 0.001 cents (1/1000 of a cent, or $0.00001)?
I don't buy it.
Inference cost has dropped 300x in 3 years, no reason to think this won't keep happening with improvements on models, agent architecture and hardware.
Also, too many people are fixated with American models when Chinese ones deliver similar quality often at fraction of a cost.
From my tests, "personality" of an LLM, it's tendency to stick to prompts and not derail far outweights the low % digit of delta in benchmark performance.
Not to mention, different LLMs perform better at different tasks, and they are all particularly sensible to prompts and instructions.
But on the other hand, Claude might introduce more vulnerability than it discovered.
Code review is the real deal for these models. This area seems largely underappreciated to me. Especially for things like C++, where static analysis tools have traditionally generated too many false positives to be useful, the LLMs seem especially good. I'm no black hat but have found similarly old bugs at my own place. Even if shit is hallucinated half the time, it still pays off when it finds that really nasty bug.
Instead, people seem to be infatuated with vibe coding technical debt at scale.