The author seems unaware of how well recent Apple laptops run LLMs. This is puzzling and puts into question the validity of anything in this article.
I think the author is aware of Apple silicon. The article mentions the fact Apple has unified memory and that this is advantageous for running LLMs.
"How many TOPS do you need to run state-of-the-art models with hundreds of millions of parameters? No one knows exactly."
What's he talking about? It's trivial to calculate that.
Isn't the ability to run it more dependent on (V)RAM, with TOPS just dictating the speed at which it runs?
It's also been done before...[0]
[0]: https://www.edge-ai-vision.com/2024/05/2024-edge-ai-and-visi...
See: "3D TVs are driving the biggest change in TVs in decades"
A lazy, easy cheap shot. But do you deny that these aspects from the article are coming, or that they'll still be here in 5 years?
- Addition of more—and faster—memory.
- Consolidation of memory.
- Combination of chips on the same silicon.
All of these are also happening for non-AI reasons. The move to SoCs that really started with the M1 wasn't because of AI, but unified memory being the default is something we will see in 5 years. Unlike 3D TV.
We just had a series of articles and sysadmin outcry about major vendors bringing 8 GB laptops back to standard models because of RAM prices. In the short term, we're seeing a reduction.
More memory is absolutely not coming in the near future. Nobody can afford it.
> Addition of more—and faster—memory.
probably not after scam altman bought up half the world's supply for his shit company
> The move to SoC that really started with the M1
No it did not. There were numerous SoCs that came before it, and it was inevitable in this space.
In order:
- People wanting more memory is not a novel feature. I am excited to find out how many people immediately want to disable the AI nonsense to free up memory for things they actually want to do.
- Same answer.
- I think the drive towards SOCs has been happening already. Apple's M-series utterly demolishes every PC chip apart from the absolute bleeding-edge available, includes dedicated memory and processors for ML tasks, and it's mature technology. Been there for years. To the extent PC makers are chasing this, I would say it's far more in response to that than anything to do with AI.
That happened before AI. I hate this maxing in the hardware sector. People with 16 cores, 14 of them just idling away for the lifetime of the device. It's insanity, same with gamers and 20 fans in their PCs. Everyone with a fucking 300 bucks podcast mic or a mirrorless camera as a webcam and the lighting setup of a small-town photo studio for some dumbass team jour fixe every week. Devs with 4 mechanical keyboards in their drawer while sleeping on a 40 dollar mattress on the floor. Insanity.
This article is just saying more laptops will have power-efficient GPUs in them. A bit better than 3D TVs.
They might not use Apple silicon often. Other options are encouraging.
I was in the market for a laptop this month. Many new laptops now advertise AI features like this "HP OmniBook 5 Next Gen AI PC" which advertises:
"SNAPDRAGON X PLUS PROCESSOR - Achieve more everyday with responsive performance for seamless multitasking with AI tools that enhance productivity and connectivity while providing long battery life"
I don't want this garbage on my laptop, especially when it's running off its battery! Running AI on your laptop is like playing Starcraft Remastered on the Xbox or Factorio on your Steam Deck. I hear you can play DOOM on a pregnancy test too. Sure, you can, but it's just going to be a tedious, inferior experience.
Really, this is just a fine example of how overhyped AI is right now.
Laptop manufacturers are too desperate to cash in on the AI craze. There's nothing special about an 'AI PC'. It's just a regular PC with Windows Copilot... which is a standard Windows feature anyway.
>I don't want this garbage on my laptop, especially when it's running off its battery!
The one bit of good news is it's not going to impact your battery life because it doesn't do any on-device processing. It's just calling an LLM in the cloud.
That's not quite correct. Snapdragon chips that are advertised as being good for "AI" also come with the Hexagon DSP, which is now used for (or targeted at) AI applications. It's essentially a separate vector processor with large vector sizes.
There's nothing special about it. Intel has lowered the bar for what counts as an "AI PC" so vendors can market it. Ollama can run a 4B model plenty fine on Tiger Lake with 8 GB of classic RAM.
But unified memory IS truly what makes an AI-ready PC. Apple Silicon proves that. People are willing to pay the premium, and I suspect unified memory will still be around and bringing us benefits even if no one cares about LLMs in 5 years.
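To make that concrete, here's a minimal sketch of what "running a small model locally" looks like through Ollama's local HTTP API (assumptions: the Ollama daemon is on its default port 11434 and a ~4B model has already been pulled; the model tag is just an example):

    # Minimal sketch: query a small local model via Ollama's HTTP API.
    # Assumes the daemon is running on localhost:11434 and the model tag
    # below has been pulled already (swap in whatever 4B model you use).
    import json
    import urllib.request

    def ask_local_model(prompt: str, model: str = "qwen3:4b") -> str:
        payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["response"]

    print(ask_local_model("Why does unified memory help local LLM inference?"))

No NPU or "AI PC" branding required; RAM size and memory bandwidth decide whether this is pleasant to use.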
> It's just a regular PC with Windows Copilot... which is a standard Windows feature anyway.
"AI PC" branded devices get "Copilot+" and additional crap that comes with that due to the NPU. Despite desktops having GPUs with up to 50x more TOPs than the requirement, they don't get all that for some reason https://www.thurrott.com/mobile/copilot-pc/323616/microsoft-...
Is Microsoft trying to help NPU chip makers?
When is Wintel going to finally happen?
Microsoft has roughly $102 billion in cash (+ short-term investments). Intel’s market value is approximately $176 billion.
I've never really understood why Microsoft helped Intel's bottom line over decades.
With Azure, Microsoft has even more reason to buy Intel.
Doesn't this lead to a lot of tension between the hardware makers and Microsoft?
MS wants everyone to run Copilot on their shiny new data centre, so they can collect the data on the way.
Laptop manufacturers are making laptops that can run an LLM locally, but there's no point in that unless there's a local LLM to run (and Windows won't have that because Copilot). Are they going to be pre-installing Llama on new laptops?
Are we going to see a new power user / normal user split? Where power users buy laptops with LLMs installed, that can run them, and normal folks buy something that can call Copilot?
Any ideas?
It isn't just copilot that these laptops come with; manufacturers are already putting their own AI chat apps as well.
For example, the LG gram I recently got came with just such an app named Chat, though the "ai button" on the keyboard (really just right alt or control, I forget which) defaults to copilot.
If there's any tension at all, it's just who gets to be the default app for the "ai button" on the keyboard that I assume almost nobody actually uses.
Interesting. Yeah, that'll be the argument
> MS wants everyone to run Copilot on their shiny new data centre, so they can collect the data on the way.
MS doesn't care where your data is. They're happy to go digging through your C drive to collect/mine whatever they want (assuming you can avoid all the dark patterns they use to push you to save everything on OneDrive anyway), and they'll record all your interactions with any other AI using Recall.
I had assumed that they needed the usage to justify the investment in the data centre, but you could be right and they don't care.
Copilot is a local LLM (well SLM). https://learn.microsoft.com/en-us/windows/ai/apis/phi-silica
It's just marketing. The laptop makers will market it as if your laptop power makes a difference knowing full well that it's offloaded to the cloud.
For a slightly more charitable perspective, agentic AI means that there is still a bunch of stuff happening on the local machine, it's just not the inference itself.
Even collecting and sending all that data to the cloud is going to drain battery life. I'd really rather my devices only do what I ask them to than have AI running the background all the time trying to be helpful or just silently collecting data.
Copilot is just ChatGPT as an app.
If you don't use it, it will have no impact on your device. And it's not sending your data to the cloud except for anything you paste into it.
So, the new AI features like recall don’t exist?
Windows is going more and more into AI and embedding it into the core of the OS as much as it can. It’s not “an app”, and even if that were true now, it wouldn't be true for very long. The strategy is well communicated.
>> I'd really rather my devices only do what I ask them to
Linux hears your cry. You have a choice. Make it.
Unfortunately still loads of hurdles for most people.
AAA Games with anti-cheat that don't support Linux.
Video editing (DaVinci Resolve exists but is a pain to get up and running on many distros, KDenLive/OpenShot don't really cut it for most)
Adobe Suite (Photoshop/Lightroom specifically, and Premiere for video editing) - would like to see Affinity support Linux but it hasn't happened so far. GIMP and Darktable aren't really substitutes unless you pour a lot of time into them.
Tried moving to Linux on my laptop this past month; made it a month before a reinstall of Windows 11. Had issues with the WiFi chip (managed to fix it, but had to edit config files deep in the system, not ideal); on Fedora with LUKS encryption, after a kernel update the keyboard wouldn't work to input the encryption key; and there's no Windows Hello-like support (face ID). Had the most success with EndeavourOS, but running Arch is a chore for most.
It's getting there, best it's ever been, but there are still hurdles.
According to my friends, Arc Raiders works well on Linux. So it's really just a small selection of AAA games that can't run because of their anti-cheat, which probably doesn't even work anyway. Can you name a triple-A you want to play that Proton says is incompatible?
GIMP isn't a full replacement, sure, but it works for what I need. Darktable does way more than I've ever wanted, so I can forgive it for the one time it crashed. Inkscape and Blender both exceed my needs as well.
And Adobe is so user hostile, that I feel I need to call you a mean name to prove how I feel.... dummy!
Yes, I already feel bad, and I'm sorry. But trolling aside, a list of applications that treat users like shit isn't a reason to stay on a platform that also treats you like shit.
I get it: sometimes being treated like shit is worth it, because it's easier now that you're used to being disrespected. But an aversion to the effort it'd take to climb the learning curve of something different isn't a valid reason to help the disrespectful trash companies making the world worse recruit more people for them to treat like trash.
Just because you use it, doesn't make it worth recommending.
I don't really PC game anymore, not at the moment anyway; I use my Xbox or play a few older games my laptop's iGPU can handle. Battlefield 6 is a big one recently that I'd probably want to play if I had a gaming PC set-up.
I know Adobe are... c-words, but their software is industry standard for a reason.
Part of me is starting to think Valve is going to be the best thing to happen to Linux (in this regard) since Ubuntu.
AI PCs also have NPUs which I guess provide accelerated matmuls, albeit less accelerated than a good discrete GPU.
I have a Snapdragon laptop and it is the best I've ever had. But the NPU is really almost useless.
This is a nice companion to the article: https://www.pcworld.com/article/2965927/the-great-npu-failur...
Agreed, I have the ARM based T14s for work.
The thing is nowhere near the performance of a MacBook, but it's silent and the battery lasts ages, which is a far cry from the same laptop with an Intel CPU, which is what many are running.
Company removes a lot of the AI bloat though.
Factorio runs really well on the deck though...
But yeah, fresh install of OS is a must for any new computer.
> Running AI on your laptop is like playing Starcraft Remastered on the Xbox
A great analogy because there is Starcraft for a console - Nintendo 64 - and it is quite awkward. Split-screen multiplayer included.
This mostly just shows you how far behind the M1 (which came out 5 years ago) all the non Apple laptops are.
Was never really into Apple hardware (mainly the price), however I recently got an M1 Mac Mini and an iPhone for app development, and the inference speed for, as you say, a 5-year-old chip is actually crazy.
If they made the M series fully open for Linux (I know Asahi is working away) I probably would never buy another non-M series processor again.
I got an M1 Mac Mini somewhat recently as well, to replace my ~2012 Mac Mini that I use as a media center PC. And frankly, it's overkill. Used ones can be had for $200-$300 USD, lower side with cosmetic damage. An absolute steal, IMO.
You can still get an M1 Macbook Air at retail for $599 ($300 for refurbs), which is a Chromebook price for a laptop that is better in pretty much every respect than any Chromebook.
Outside of Apple laptops (and arguably the Ryzen AI MAX 390), an "AI ready" laptop is simply marketing speak for "is capable of making HTTP requests."
I think only a small percentage of users care enough about running LLMs locally to pay for extra hardware for it, put up with slower and lower-quality responses, etc. It’ll never be as good as non-local offerings, and is more hassle.
With the wild RAM prices, which btw are probably going to last through 2026, I expect 8 GB of RAM to be the new standard going forward.
32 GB of RAM will be for enthusiasts with deep pockets, and professionals. Anything over that, exclusively professionals.
The conspiracy theorist inside me is telling me that big AI companies like OpenAI would rather see people using their puny laptops as terminals / shells only, to reach sky-based models, than let them have beefy laptops and local models.
The price of RAM is going to throw a wrench at that
I feel like there's no point to get a graphics card nowadays. Clearly, graphics cards are optimized for graphics; they just happened to be good for AI but based on the increased significance of AI, I'd be surprised if we don't get more specialized chips and specialized machines just for LLMs. One for LLMs, a different one for stable diffusion.
With graphics processing, you need a lot of bandwidth to get stuff in and out of the graphics card for rendering on a high-resolution screen, lots of pixels, lots of refreshes, lots of bandwidth... With LLMs, a relatively small amount of text goes in and a relatively small amount of text comes out over a reasonably long amount of time. The amount of internal processing is huge relative to the size of input and output. I think NVIDIA and a few other companies already started going down that route.
But probably graphics cards will still be useful for stable diffusion, especially AI-generated video, as the input and output bandwidth is much higher.
Nah, that's just plain wrong.
First, GPGPU is powerful and flexible. You can make an "AI-specific accelerator", but it wouldn't be much simpler or much more power-efficient - while being a lot less flexible. And since you need to run traditional graphics and AI workloads both in consumer hardware? It makes sense to run both on the same hardware.
And bandwidth? GPUs are notorious for not being bandwidth starved. 4K@60FPS seems like a lot of data to push in or out, but it's nothing compared to how fast modern PCIe 5.0 x16 goes. AI accelerators are more of the same.
GPUs might not be bandwidth starved most of the time, but they absolutely are when generating text from an llm. It’s the whole reason why low precision floating point numbers are being pushed by nvidia.
> Clearly, graphics cards are optimized for graphics; they just happened to be good for AI
I feel like the reverse has been true since after the Pascal era.
LLMs are enormously bandwidth hungry. You have to shuffle your 800GB neural network in and out of memory for every token, which can take more time/energy than actually doing the matrix multiplies. GPUs are almost not high bandwidth enough.
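A rough back-of-envelope, with made-up but plausible numbers, of why decode speed ends up bandwidth-bound rather than compute-bound:

    # Back-of-envelope: token rate when decoding is memory-bandwidth bound.
    # Each generated token has to read (roughly) every active weight once, so
    # tokens/sec ~= memory bandwidth / bytes of weights touched per token.
    # All figures below are illustrative assumptions, not measurements.
    GB = 1e9

    def max_tokens_per_sec(weight_bytes: float, bandwidth_bytes_per_sec: float) -> float:
        return bandwidth_bytes_per_sec / weight_bytes

    # A hypothetical 800 GB dense model on ~3.3 TB/s of HBM: about 4 tokens/s per stream.
    print(max_tokens_per_sec(800 * GB, 3350 * GB))
    # A 20 GB local model on a ~1 TB/s consumer card: roughly a 50 tokens/s ceiling.
    print(max_tokens_per_sec(20 * GB, 1000 * GB))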
But even so, for a single user, the output rate for a very fast LLM would be like 100 tokens per second. With graphics, we're talking like 2 million pixels, 60 times a second; 120 million pixels per second for a standard high res screen. Big difference between 100 tokens vs 120 million pixels.
24 bit pixels gives 16 million possible colors... For tokens, it's probably enough to represent every word of the entire vocabulary of every major national language on earth combined.
> You have to shuffle your 800GB neural network in and out of memory
Do you really though? That seems more like a constraint imposed by graphics cards. A specialized AI chip would just keep the weights and all parameters in memory/hardware right where they are and update them in-situ. It seems a lot more efficient.
I think that it's because graphics cards have such high bandwidth that people decided to use this approach but it seems suboptimal.
But if we want to be optimal, then ideally only the inputs and outputs would need to move in and out of the chip. This shuffling should be seen as an inefficiency; a tradeoff to get a certain kind of flexibility in the software stack... But you waste a huge amount of CPU cycles moving data between RAM, CPU cache and graphics card memory.
If we did that it would be much more expensive; keeping all weights in SRAM is what Groq does, for example.
This doesn't seem right. Where is it shuffling to and from? My drives aren't fast enough to load the model every token that fast, and I don't have enough system memory to unload models to.
From VRAM to the tensor cores and back. On a modern GPU you can have 1-2 TB moving around inside the GPU every second.
This is why they use high bandwidth memory for VRAM.
This makes sense now, thanks!
If you're using a MoE model like DeepSeek V3 the full model is 671 GB but only 37 GB are active per token, so it's more like running a 37 GB model from the memory bandwidth perspective. If you do a quant of that it could e.g. be more like 18 GB.
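Spelling out the arithmetic (rough assumptions: the DeepSeek V3 sizes quoted above and a hypothetical 800 GB/s of memory bandwidth):

    # Illustrative arithmetic for the MoE point: what streams per token is the
    # set of active experts, not the whole checkpoint.
    FULL_MODEL_GB = 671     # full DeepSeek V3 weights
    ACTIVE_GB = 37          # weights actually touched per token
    QUANT_FACTOR = 0.5      # e.g. a ~4-bit quant of the original precision
    BANDWIDTH_GB_S = 800    # hypothetical GPU / unified-memory bandwidth

    bytes_streamed_per_token = ACTIVE_GB * QUANT_FACTOR   # ~18.5 GB per token
    print(BANDWIDTH_GB_S / bytes_streamed_per_token)      # ~43 tokens/s upper bound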
You're probably not using an 800GB model.
It is right. The shuffling is from CPU memory to GPU memory, and from GPU memory into the GPU itself. If you don’t have enough memory you can’t run the model.
How can I observe it being loaded into CPU memory? When I run a 20 GB model with ollama, htop reports 3 GB of total RAM usage.
Think of it like loading a moving truck where:
- The house is the disk
- You are the RAM
- The truck is the VRAM
There won't be a single time you can observe yourself carrying the weight of everything being moved out of the house because that's not what's happening. Instead you can observe yourself taking many tiny loads until everything is finally moved, at which point you yourself should not be loaded as a result of carrying things from the house anymore (but you may be loaded for whatever else you're doing).
Viewing active memory bandwidth can be more complicated than it'd seem to set up, so the easier way is to just view your VRAM usage as you load in the model freshly into the card. The "nvtop" utility can do this for most any GPU on Linux, as well as other stats you might care about as you watch LLMs run.
Depends on map_location arg in torch.load: might be loaded straight to GPU memory
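Concretely, something like this (a sketch; "model.pt" is a hypothetical checkpoint file):

    import torch

    # Deserialize onto CPU RAM first; this is what would show up in htop.
    cpu_state = torch.load("model.pt", map_location="cpu")

    # Deserialize tensors directly onto the GPU; system RAM stays mostly untouched.
    gpu_state = torch.load("model.pt", map_location="cuda:0")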
I don't doubt that there will be specialized chips that make AI easier, but they'll be more expensive than the graphics cards sold to consumers, which means that a lot of companies will just go with graphics cards: either because the extra speed of specialized chips won't be worth the cost, or because they'll be flat out too expensive, priced for the small number of massive spenders who'll shell out insane amounts of money for any/every advantage (whatever they think that means) they can get over everyone else.
I predict we will see compute-in-flash before we see cheap laptops with 128+ gigs of ram.
There was a company that did compute-in-dram, which was recently acquired by Qualcomm: https://www.emergentmind.com/topics/upmem-pim-system
I can't tell if this is optimism for compute-in-flash or pessimism with how RAM has been going lately!
We’ve had “compute in flash” for a few years now: https://mythic.ai/product/
Memristors are (IME) missing from the news. They promised to act as both persistent storage and fast RAM.
If only memristors weren't vaporware that has "shown promise" for 3 decades now and went nowhere.
Yeah, especially given what is happening in the memory market.
Feast and famine.
In three years we will be swimming in more ram than we know what to do with.
Kind of feel that's already the case today... 4GB I find is still plenty for even business workloads.
Video games have driven the need for hardware more than office work has. Sadly, games are already being scaled back, and more time is being spent on optimization instead of content, since consumers can't be expected to have the kind of RAM available they normally would and everyone will be forced to make do with whatever RAM they have for a long time.
That might not be the case. The kind of memory that will flood the second-hand market may not be the kind of memory we can stuff in laptops or even desktop systems.
You could get 128 GB RAM laptops from the time DDR4 came around: workstation-class laptops with 4 RAM slots would happily take 128 GB of memory.
The fact that nowadays there are little to no laptops with 4 RAM slots is entirely artificial.
I was musing this summer about whether I should get a refurbed Thinkpad P16 with 96 GB of RAM to run VMs purely in memory. Now 96 GB of RAM costs as much as a second P16.
I feel you, so much. I was thinking of getting a second 64 GB node for my homelab and I thought I’d save that money… now the RAM alone costs as much as the node, and I’m crying.
Lesson learned: you should always listen to that voice inside your head that says: “but I need it…” lol
I rebuilt a workstation after a failed motherboard a year ago. I was not very excited about being forced to replace it on a day's notice and cheaped out on the RAM (only got 32 GB). This is like the third or fourth time I've taught myself the lesson not to pinch pennies when buying equipment/infrastructure assets. It's the second time the lesson was about RAM, so clearly I'm a slow learner.
By "we" do you mean consumers? No, "we" will get neither. This is unexpected, irresistable opportunity to create a new class, by controlling the technology that people are required and are desiring to use (large genAI) with a comprehensive moat — financial, legislative and technological. Why make affordable devices that enable at least partial autonomy? Of course the focus will be on better remote operation (networking, on-device secure computation, advancing narrative that equates local computation with extremism and sociopathy).
I'm running GPT-OSS 120B on a MacBook Pro M3 Max w/128 GB. It is pretty good, not great, but better than nothing when the wifi on the plane basically doesn't work.
This article is so dumb. It totally ignores the memory price explosion that will make laptops with large amounts of fast memory unfeasible for years, and states stuff like this:
> How many TOPS do you need to run state-of-the-art models with hundreds of millions of parameters? No one knows exactly. It’s not possible to run these models on today’s consumer hardware, so real-world tests just can’t be done.
We know exactly the performance needed for a given responsiveness. TOPS is just a measurement, independent of the type of hardware it runs on.
The fewer TOPS, the slower the model runs, so the user experience suffers. Memory bandwidth and latency play a huge role too. And context: increase the context and the LLM becomes much slower.
We don't need to wait for consumer hardware to know how much is needed. We can calculate that for given situations (sketch below).
It also pretends small models are not useful at all.
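The kind of calculation meant above, with illustrative numbers (the model size, quantization and target speed are assumptions you would plug in for your own situation):

    # Work backwards from a target responsiveness to the hardware required.
    def required_hw(params_billion: float, bytes_per_param: float,
                    target_tokens_per_sec: float):
        weight_gb = params_billion * bytes_per_param           # GB read per generated token
        bandwidth_gb_s = weight_gb * target_tokens_per_sec     # decode is bandwidth-bound
        # Decode compute is roughly 2 FLOPs per parameter per token:
        tops = 2 * params_billion * 1e9 * target_tokens_per_sec / 1e12
        return bandwidth_gb_s, tops

    # e.g. an 8B model in ~4-bit at a comfortable 20 tokens/s:
    print(required_hw(8, 0.5, 20))   # (~80 GB/s, ~0.32 TOPS): bandwidth matters far more than TOPS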
I think the massive cloud investments will put pressure away from local AI unfortunately. That trend makes local memory expensive and all those cloud billions have to be made back so all the vendors are pushing for their cloud subscriptions. I'm sure some functions will be local but the brunt of it will be cloud, sadly.
also, state of the art models have hundreds of _billions_ of parameters.
It tells you about their ambitions...
I suppose it depends on the model; for code it was useless. As a lossy copy of an interactive Wikipedia it could be OK: not good or great, just OK.
Maybe for creative suggestions and editing it’d be ok.
You don't understand the needs of a common laptop user. Define the use cases that require reaching for the laptop instead of the phone that is nearby. Those use cases don't need an LLM for a common laptop user.
I’ve been running LLMs on my laptop (M3 Max 64GB) for a year now and I think they are ready, especially with how good mid sized models are getting. I’m pretty sure unified memory and energy efficient GPUs will be more than just a thing on Apple laptops in the next few years.
Only because of Apple's unified memory architecture. The groundwork is there; we just need memory to be cheaper so we can fit 512+ GB now ;)
Memory prices will rise short term and generally fall long term, even with the current supply hiccup the answer is to just build out more capacity (which will happen if there is healthy competition). I mean, I expect the other mobile chip providers to adopt a unified architecture and beefy GPU cores on chip, with lots of bandwidth connecting them to memory (at the Max or Ultra level, at least). I think AMD is already doing UM at least?
> Memory prices will rise short term and generally fall long term, even with the current supply hiccup the answer is to just build out more capacity (which will happen if there is healthy competition)
Don't worry! Sam Altman is on it. Making sure there never is healthy competition that is.
https://www.mooreslawisdead.com/post/sam-altman-s-dirty-dram...
We’ve been through multiple cycles of scarcity/surplus DRAM cycles in the last couple of decades. Why do we think it will be different now?
The problem with this is that NPUs have terrible, terrible support in the various software ecosystems, because they are unique to their particular SoC or whatever. No consistency, even within particular companies.
Seems like wishful thinking.
> How many TOPS do you need to run state-of-the-art models with hundreds of millions of parameters? No one knows exactly.
Why not extrapolate from the open-source AIs which are available? The most powerful open-source AI I know of is Kimi K2, at >600 GB. Running this at acceptable speed requires 600+ GB of GPU/NPU memory. Even $2000-3000 AI-focused PCs like the DGX Spark or Strix Halo typically top out at 128 GB. Frontier models will only run on something that costs many times a typical consumer PC, and it's only going to get worse with RAM pricing.
In 2010 the typical consumer PC had 2-4 GB of RAM. Now the typical PC has 12-16 GB. This suggests RAM size doubling perhaps every 5 years at best. If that's the case, we're 25-30 years away from the typical PC having enough RAM to run Kimi K2 (the arithmetic is sketched at the end of this comment).
But the typical user will never need that much RAM for basic web browsing, etc. The typical computer RAM size is not going to keep growing indefinitely.
What about cheaper models? It may be possible to run a "good enough" model on consumer hardware eventually. But I suspect that for at least 10-15 years, typical consumers (HN readers may not be typical!) will prefer capability, cheapness, and especially reliability (not making mistakes) over being able to run the model locally. (Yes AI datacenters are being subsidized by investors; but they will remain cheaper, even if that ends, due to economies of scale.)
The economics dictate that AI PCs are going to remain a niche product, similar to gaming PCs. Useful AI capability is just too expensive to add to every PC by default. It's like saying flying is so important, everyone should own an airplane. For at least a decade, likely two, it's just not cost-effective.
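The RAM extrapolation mentioned earlier in this comment, spelled out (assumptions: ~16 GB typical today, doubling every ~5 years, ~600 GB needed for a Kimi-K2-class model):

    import math

    current_gb, target_gb, years_per_doubling = 16, 600, 5
    doublings = math.log2(target_gb / current_gb)    # ~5.2 doublings needed
    print(doublings * years_per_doubling)            # ~26 years at that pace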
> It may be possible to run a "good enough" model on consumer hardware eventually
10-15 years?!!!! What is the definition of good enough? Qwen3 8B or A30B are quite capable models which run on a lot of hardware even today. SOTA is not just getting bigger; models are also getting more intelligent and running more efficiently. There have been massive gains in intelligence at the smaller model sizes. It is just highly task dependent. Arguably some of these models are "good enough" already, and the level of intelligence and instruction following is much better than even 1 year ago. Sure, not Opus 4.5 level, but much could still be done without that level of intelligence.
"Good enough" has to mean users won't be frequently frustrated if they transition to it from a frontier model.
> it is highly task dependent... much could be done without that level of intelligence
This is an enthusiast's glass-half-full perspective, but casual end users are gonna have a glass-half-empty perspective. Qwen3-8B is impressive, but how many people use it as a daily driver? Most casual users will toss it as soon as it screws up once or twice.
The phrase you quoted in particular was imprecise (sorry), but my argument as a whole still stands. Replace "consumer hardware" with "typical PCs" - think $500 bestseller laptops from Walmart. AI PCs will remain niche luxury products, like gaming PCs. But gaming PCs benefit from being part of gaming culture and from the fact that cloud gaming adds input latency. Neither of these affects AI much.
You may be correct, but I wonder if we'll see Mac Mini sized external AI boxes that do have the 1TB of RAM and other hardware for running local models.
Maybe 100% of computer users wouldn't have one, but maybe 10-20% of power users would, including programmers who want to keep their personal code out of the training set, and so on.
I would not be surprised though if some consumer application made it desirable for each individual, or each family, to have local AI compute.
It's interesting to note that everyone owns their own computer, even though a personal computer sits idle half the day, and many personal computers hardly ever run at 80% of their CPU capacity. So the inefficiency of owning a personal AI server may not be as much of a barrier as it would seem.
But will it ever lead to a Mac Mini-priced external AI box? Or will this always be a premium "pro" tier that seems to rival used car prices?
> but I wonder if we'll see Mac Mini sized external AI boxes that do have the 1TB of RAM
Isn't that the Mac Studio already? Ok, it seems to max at 512 GB.
> In 2010 the typical consumer PC had 2-4gb of RAM. Now the typical PC has 12-16gb. This suggests RAM size doubling perhaps every 5 years at best. If that's the case, we're 25-30 years away from the typical PC having enough RAM to run Kimi K2.
Part of the reason that RAM isn't growing faster is that there's no need for that much RAM at the moment. Technically you can put multiple TB of RAM in your machine, but no-one does that because it's a complete waste of money [0]. Unless you're working in a specialist field, 16 GB of RAM is enough, and adding more doesn't make anything noticeably faster.
But given a decent use-case, like running an LLM locally, you'd find demand for lots more RAM, and that would drive supply, and new technology developments, and in ten years it'll be normal to have 128TB of RAM in a baseline laptop.
Of course, that does require that there is a decent use-case for running an LLM locally, and your point that that is not necessarily true is well-made. I guess we'll find out.
[0] apart from a friend of mine working on crypto who had a desktop Linux box with 4TB of RAM in it.
I spent a good 30 seconds trying to figure out what DDS was an acronym for in this context.
This must be referring mostly to Windows, or non-Apple laptops.
I mean, having a more powerful laptop is great, but at the same time these guys are calling for a >10x increase in RAM and a far more powerful NPU. How will this affect pricing? How will it affect power management? The article made it seem like most of the laptop will be dedicated to gen-AI services, which I'm still not entirely convinced are quite THAT useful. I still want a cheap laptop that lasts all day, and I also want to be able to tap that device's full power for heavy compute jobs!
I have no desire to run an LLM on my laptop when I can run one on a computer the size of six football fields.
The point is that when you run it on your own hardware you can feed the model your health data, bank statements and private journals and be 5000% sure they’re not going anywhere.
Regular people don't understand nor care about any of that. They'll happily take the Faustian bargain.
I've been playing around with my own home-built AI server for a couple months now. It is so much better than using a cloud provider. It is the difference between drag racing in your own car, and renting one from a dealership. You are going to learn far more doing things yourself. Your tools will be much more consistent and you will walk away with a far greater understanding of every process.
A basic last-generation PC with something like a 3060 Ti (12GB) is more than enough to get started. My current rig pulls less than 500W with two cards (3060 + 5060). And, given the current temperature outside, the rig helps heat my home. So I am not contributing to global warming, water consumption, or any other datacenter-related environmental evil.