"I am not sure how many people will run AI models locally. It still seems like a niche application to me. However, it will make decent machines to play video games."
I’m not sure if you’re aware, but there is a supply chain shortage for pretty much everything needed for a PC, and it isn’t expected to be solved this year or next. There is no way this can be affordable.
It uses LPDDR5X instead of VRAM and will still sell for a premium while pushing their presence even further into every side of the AI market. This was one area where AMD was ahead, and now Nvidia is probably better off making this to compete on that front, while still being better off than making a 5090.
That doesn't answer the question. If the high margin enterprise GPUs are saturating the fab capacity you wouldn't expect them to be pushing this. But IIRC those all have oodles of integrated HBM at this point so I wonder if fab capacity for that has become a bottleneck.
I think it's niche now because getting the hardware to run it is expensive and the quantized models don't work as well. If those improve then it would be a no-brainer to pay once for the hardware instead of a fortune for API calls.
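A rough way to sanity-check the "pay once for the hardware" argument is a break-even calculation. The sketch below is only illustrative: the function name `breakeven_months` and every dollar figure are assumptions made up for the example, not numbers from this thread or from any vendor's price list.

```python
def breakeven_months(hardware_cost, monthly_api_cost, monthly_power_cost=0.0):
    """Months until a one-off hardware purchase beats recurring API spend.

    Returns None when running locally saves nothing per month.
    """
    monthly_saving = monthly_api_cost - monthly_power_cost
    if monthly_saving <= 0:
        return None
    return hardware_cost / monthly_saving


# Hypothetical inputs: a $3000 machine, $200/month of API calls,
# $10/month of electricity.
months = breakeven_months(3000, 200, 10)
print(f"break-even after about {months:.1f} months")
```

This ignores resale value, differences in model quality, and the time spent on setup, all of which can swing the answer either way.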
I am not really convinced that four bit quantisation is that bad; almost certainly six will be enough. But Google claims that the QAT tech in Gemma, which they are surely using or testing in Gemini, preserves nearly source-model quality while reducing footprint.
The hardware for 50 tokens per second with a four bit quantisation of Gemma 4 26B or the sparse Qwen 3.6 is not really that expensive: it’s a secondhand M1 Max.
Beyond that, I agree. I think moving planning tasks to local is a now thing, not that it really has much impact on token spend. I also think many small coding tasks are fully within the grasp of the above two models.
The main issue right now is that the software landscape is rather confusing, but I reckon uncomplicated Gemma 4 26B QAT support with MTP is a few weeks away.
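For a sense of scale on the four-bit versus six-bit question, weight storage is roughly parameters times bits per weight divided by eight. A minimal sketch, assuming the "26B" in the model name means about 26 billion parameters and counting weights only (no KV cache, activations, or runtime overhead); `weight_footprint_gb` is a name made up for this example.

```python
def weight_footprint_gb(num_params, bits_per_weight):
    """Approximate size of the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 26e9  # assumption: "26B" read as ~26 billion parameters

for bits in (16, 6, 4):
    print(f"{bits:>2}-bit: {weight_footprint_gb(PARAMS, bits):.1f} GB")
```

Under these assumptions that is about 52 GB at 16 bits, 19.5 GB at six and 13 GB at four. Real quantised files also store scaling metadata, so they come out somewhat larger.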
I think it is likely to appeal to video and photo editors who want to use AI tools (the press release has a quote from Blackmagic Design, as well as from Adobe, who I think have no stomach for their own cloud AI).
But I don’t know about specialised: this could run quite large models with MoE.
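The reason MoE suits this kind of machine: all experts have to sit in memory, but only the active ones are read for each token. A small sketch of that split, with a made-up model shape (100B total, 10B active, four-bit weights) that does not correspond to any specific release; `moe_memory_profile` is a hypothetical helper.

```python
def moe_memory_profile(total_params, active_params, bits_per_weight):
    """Return (resident_bytes, bytes_read_per_token) for a sparse MoE model.

    Weights only: the KV cache and runtime overhead are not counted.
    """
    bytes_per_weight = bits_per_weight / 8
    resident = total_params * bytes_per_weight
    per_token = active_params * bytes_per_weight
    return resident, per_token


# Hypothetical shape: 100B total parameters, 10B active per token, 4-bit weights.
resident, per_token = moe_memory_profile(100e9, 10e9, 4)
fits_in_128gb = resident <= 128e9
print(f"resident: {resident / 1e9:.0f} GB, "
      f"read per token: {per_token / 1e9:.0f} GB, fits: {fits_in_128gb}")
```

So capacity is sized by the total parameter count, while per-token memory traffic is sized by the active count.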
Performance of local models is pretty bad compared to what AI vendors offer; token generation is just too slow to be that useful. And you need to allocate GBs of memory, something that will stay very expensive to buy for a long time.
Running local models will stay niche for a while, unless we see breakthroughs.
All depends. The current technology will be cheaper in a year or two. The best cutting-edge stuff will probably be even more expensive. But in 10 years' time... we can run current SOTA models (or models that are equally good) on our local hardware.
We had a thing called globalism that drastically reduced costs. Globalism right now is on life support. Given geopolitics, I don’t see how it’s going to survive.
"I am not sure how many people will run AI models locally. It still seems like a niche application to me. However, it will make decent machines to play video games."
I don't know who will be the winner, but with some of the recent Gemma releases it seems more probable that you may run some models locally if only from a cost perspective, not even considering business security. Not sure how this type of architecture would make for good gaming though, which puts the whole statement into question.
"Ranked in the top 2% of scientists globally (Stanford/Elsevier 2025) and among GitHub's top 1000 developers" - side note but this guy puts this everywhere, gives me probably the inverse of what he is marketing for.
"I am not sure how many people will run AI models locally. It still seems like a niche application to me. However, it will make decent machines to play video games..."
This is the 2026 edition of Ken Olsen: "There is no reason anyone would want a computer in their home"
> This is the 2026 edition of Ken Olsen: "There is no reason anyone would want a computer in their home"
Digging into this:
> In conclusion, there is evidence that Ken Olsen did doubt the need for computers in the home, but the evidence is based primarily on the testimony of David Ahl who was perturbed when the personal computer project he championed at DEC was not supported by Olsen in 1974.
> Olsen’s resistance may have been similar to that expressed by another DEC executive, Gordon Bell. In 1980 Bell thought home terminals would act as gateways to remote computers which would provide appropriate services.
* https://quoteinvestigator.com/2017/09/14/home-computer/
It was supposedly said in 1977: most computers at that time were not small, so it would not be surprising that people did not expect the general public to want a large, power-hungry, noisy apparatus in their house.
That’s too strong of an assertion.
Local models aren’t deterministically equivalent in capabilities to foundation models. Home computers are Turing complete, just like a mainframe. They are just slower. Often not slower enough to matter.
Most people are OK with slower. An AI that lets you edit a family picture locally in, say, 30 seconds is preferable to one that is instantaneous but requires you to submit that picture to examination/storage/training/sale in someone else's AI ecosystem. If I want to crop my ex out of family photos, I should not have to first give that photo to Microsoft. If I want an LLM to write a book report for me, I don't want it also alerting my school.
> you may run some models locally if only from a cost perspective
I have a hard time believing running a model on a laptop will be cheaper than running it in a datacenter. Why wouldn't economies of scale apply here as with every other computation?
This assumes that you'll be charged only for the fraction of compute that you consumed. But you are actually paying for their infrastructure, for the R&D (and also the computation that went into training the model), etc. It is not clear that these kinds of costs are needed for your own small computations, but you will still pay your share of the investment the provider made so that they could serve everyone's computation needs.
In that analogy, big tech AI is currently investing in cleaner air for all of us? We _could_ breathe it through their hose, but might as well breathe it outside.
A laptop is really a pretty bad form factor to run LLMs. Worst cooling, more expensive memory that you cannot replace, resale value depreciating fast. It’s fine for tinkering, small scale research, and demos but it’s definitely niche.
The vision NVIDIA is selling is pure marketing IMHO
Lots of people are already running AI locally. They are the people buying up all the consumer-grade Nvidia GPUs. What are they doing with them? Well, the same things people with home media or email servers are doing: stuff they don't want to share with the general public.
I want to reduce my dependency on companies like Google, OpenAI, and Anthropic. Aside from the concerns of data sharing I'm also not a fan of how they run their operations, for example Anthropic now using xAI's Colossus data center which is poisoning a marginalized community, or OpenAI getting in bed with the military.
Not everything I want to use an LLM for requires "PhD level intelligence", and increasingly I'm finding more uses that involve sharing my personal data.
Yesterday my local model helped me when looking for a doctor who is in-network for my insurance. I threw it a screenshot from the provider's search results and it looked up reviews for all of them.
> "Ranked in the top 2% of scientists globally (Stanford/Elsevier 2025) and among GitHub's top 1000 developers" - side note but this guy puts this everywhere, gives me probably the inverse of what he is marketing for.
Lol yeah seriously, that stinks of "I ask AI to generate a huge amount of bullshit and upload it to pad irrelevant stats".
Absolute loser.
I found his website, https://www.lemire.me/en/ , and the "2%" brag is the very first sentence, geez.
Being the top x% is what OnlyFans girls brag about, professor...
And it's not exactly brain surgery, is it? https://www.youtube.com/watch?v=THNPmhBl-8I
> Daniel Lemire’s blog is one of the top 50 most popular blogs on Hacker News, the standard tech news aggregation site.
Citation needed
https://refactoringenglish.com/tools/hn-popularity/
I agree that it sends the wrong signal, but actually Daniel is great. He cares tremendously about doing work that is actually real-world useful. I've co-written a few papers with him, and he's really hard working and open to outside suggestions. The danger is that if you send him comments, he'll eventually manage to rope you into writing a new and improved version. Seriously, if you are a non-academic computer scientist with a good idea that you want to publish, he'd be incredibly open to working with you.
As to why he now has this on his blog? I also cringe when I read it. I presume someone told him he should self-promote more, and this is his lame attempt to do so. He's almost certainly the most cited person in his department, but it's entirely possible that none of his colleagues actually know this. Cut him some slack. Self-promotion is not his strength. He's a nerd's nerd, and not a marketer. I'll mention to him that his attempt here might be backfiring when I'm next in contact with him.
I think the local-model use case is going to become less niche pretty quickly if the models keep getting smaller and more capable. Even if most people do not care about privacy or offline use, the cost argument is pretty strong
This feels like fluff to me on the part of the author (whose work I don’t want to trivialize); I don’t think they’ve actually looked deeper than a paper spec sheet on this.
1. Yes it has the same number of cores as a 5070 mobile. It’s also running at a shared peak of 2/3 the bandwidth and a shared peak of 2/3 the TDP. The GPU by itself will likely perform at half the dedicated unit’s performance.
2. Apple may not have SVE2 but they do have the AMX (private) and SME. I don’t see why he thinks the SVE2 will give him more performance than the SME.
3. He mentions a single core type but doesn’t mention the total makeup. We have already known for a year how the DGX Spark compares to Apple chips. For CPU it’s roughly equivalent to an M3 Pro and for GPU compute (not rasterization) it’s between an M4 Pro and M4 Max without considering bandwidth.
The real advantage to these is that they run CUDA. That’s it. Otherwise when they launch they’ll be 2-3 generations behind where Apple is and 1 gen behind AMD.
The other super power of the DGX Spark was the NIC for pairing them together. But that’s been removed here too.
The interesting part to me isn't really the Cortex-X925 vs AVX-512 comparison, but Nvidia trying to make the GPU the center of a Windows PC rather than an add-in card
A large part of Intel's success over decades was to capture as much of the value from the PC for themselves. This previously caused a confrontation between Intel and Nvidia in 2009 when Intel integrated the memory controller into the CPU and argued that Nvidia's licensing agreement did not allow them to produce chipsets for such CPUs. Nvidia was developing an x86 CPU based on licensed technology from Transmeta, but after the legal battle with Intel they pivoted to producing an ARM CPU (released as Denver) based on this technology instead.
Now that Intel is historically weak, Nvidia is attempting to reverse the situation.
nb: poster is Daniel Lemire (https://lemire.me), who is very skilled at getting performance out of compute hardware (e.g. via SIMD, cache usage, etc.)
Still, Microslop has repeatedly proven their ability to slow everything down to a crawl no matter how powerful the hardware. If you want it to be fast, don’t use Windows.
As he likes to share often, "He ranks among the top 2% of scientists globally (Stanford/Elsevier 2025) and is one of GitHub's top 1000 most followed developers. "
Based on citations and GitHub stars? Or what's the context there?
I was adding further citation based on his own claims. Not sure what context is missing.
It’s an opportunity for them to start doing away with the whole ATX thing where owners had freedom to mix and match at their own pleasure.
Is it really unified memory? AMD Strix Halo is "unified" but you still have to allocate memory separately for cpu vs gpu. Apple Silicon is true unified memory.
My understanding is that this is a limitation of Windows, not of the AMD SoC. There are several resources online on how to "enable unified memory support" on Linux, e.g. [1].
As a side note, Qualcomm chipsets on Android have been doing this for years (like Apple), so it's not a super unique thing. It's more that there was no need before.
[1] https://www.jeffgeerling.com/blog/2025/increasing-vram-alloc...
Even then, the "reserved" section is a carved-out, guaranteed chunk for things that might need contiguous physical memory (display scan-out buffers and page tables, for example).
The GPU can still happily use all the rest of the memory for other use cases - which tend to be the bulk of allocations anyway. Though there might be performance implications - for example, "moving" buffer ownership to the GPU would need to evict CPU caches, and 4K pages and TLB lookups can often be pretty inefficient for GPU-style accesses.
That's been pretty standard for any SoC for decades. And the "differences" to Apple's SoC are more implementation details.
That's a software question, not a hardware question.
Some software assumes pre-defined set-aside pools of memory reserved for video purposes, but the chip does actually have access to the whole pool.
Yes, but more due to OS limitations than hardware. You can use their GTT, which is then _true_ UMA where the GPU can grab whatever it wants from the memory pool.
This isn't the first time we have had UMA on the PC, btw. When SGI did their PC workstations, the 320 and 540 had what they called the Cobalt graphics chipset and a crossbar, as part of their IVC architecture. They bypassed AGP completely at the time. It was quite unique to see strict UMA on a PC. I haven't seen it since, until these new systems we're seeing now on PCs and Macs.
For local models, the useful part is not just having 128GB attached to the package. It is whether the GPU can practically use that memory without the usual VRAM-style constraints
Memory bandwidth is what matters, unified or otherwise. Discrete GPUs don't have unified memory either.
Strix Halo is unified memory. The memory allocation set in the BIOS is overridden by the operating system if it has the capability.
> you still have to allocate memory separately for cpu vs gpu
That's an API issue not a hardware issue. Regardless, I believe the major APIs permit seamlessly sharing pointers at this point? (I have no experience doing that though.)
>AMD Strix Halo is "unified" but you still have to allocate memory separately for cpu vs gpu.
IIRC that's to maintain BIOS and Windows (+ games & apps) backwards compatibility, but memory access speeds are the same.
It is unified in the sense that the OS can dynamically assign memory to CPU and GPU. Apple Silicon is not alien tech that other silicon vendors cannot implement.
Here is the press release for the actual machine:
https://nvidianews.nvidia.com/news/nvidia-microsoft-windows-...
I have been somewhat surprised at the lack of commentators observing that this is Microsoft and above all NVIDIA launching a device that is fundamentally at odds with the metered cloud model of AI.
When you look at the other announcements and murmurings (better offline BYOK for Copilot, talk of an unmetered AI future) I think it’s clear that these two firms understand that cloud-only AI is not sustainable or inherently in their interests. But their willingness to undermine OpenAI with a product like this is notable.
Maybe. Or they are simply hedging their bets.
While unified memory may offer better performance than unsoldered DDR system memory, it still won't match the 1.8TB/s bandwidth of high-end consumer GPUs right now.
Nvidia's master plan may be to make "only" 400GB/s bandwidth the new normal, thus gatekeeping local model usage further behind "more memory but not as fast as the cloud can do it".
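To put the 1.8TB/s versus 400GB/s gap in token terms: during single-stream decoding, each generated token needs roughly one pass over the weights that are active, so bandwidth divided by bytes read per token gives an upper bound on tokens per second. A back-of-the-envelope sketch; the 13 GB per-token figure is an assumption (about what a 26B-parameter dense model at four bits would touch), and real throughput lands below this ceiling.

```python
def decode_ceiling_tokens_per_s(bandwidth_bytes_per_s, bytes_read_per_token):
    """Upper bound on single-stream decode speed when memory-bandwidth bound."""
    return bandwidth_bytes_per_s / bytes_read_per_token


BYTES_PER_TOKEN = 13e9  # assumption: ~13 GB of weights touched per token

for label, bandwidth in (("400 GB/s", 400e9), ("1.8 TB/s", 1.8e12)):
    ceiling = decode_ceiling_tokens_per_s(bandwidth, BYTES_PER_TOKEN)
    print(f"{label}: at most ~{ceiling:.0f} tokens/s")
```

Whatever model size you plug in, the ratio between the two ceilings stays at 4.5x, which is the real content of the comparison.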
I think it’s an interesting theory but a bit too conspiracy theory-ish.
Nvidia just wants to sell stuff to everyone.
And I think for professionals doing local AI work, products like Strix Halo and Apple Silicon are a competitive threat.
A big part of maintaining the leading software ecosystem is ensuring you have competitive hardware for all your users.
I also think the RTX Spark product is relatively low effort for Nvidia. Grab a Mediatek CPU and slap an Nvidia GPU on the die. Sure, that’s oversimplifying it, but still.
Is this essentially an Apple M-Series chip in concept?
Don't want to be too harsh, and maybe I'm missing something, but the CPU is at least 2 years old, internally it has been a complete shitshow, and that's a minor hiccup compared to the firmware and software situation.
It's an interesting "newcomer" and the more the better but calling this a "beast" and a "game changer" is ridiculous to say the least.
Then there is the price..
128GB of unified memory is a dream come true for local LLMs. VRAM has been the ultimate bottleneck for developers.
I have a 128 GB LPDDR5X machine. It's a great workstation laptop (which is why I got it) but the memory bandwidth is just awful if you want to use it for AI. An old Epyc CPU will fare better, both in being able to run full-sized larger models and in having higher memory bandwidth, and that's not a recommendation to go that route either, as it's still not worth it.
The competitor for this NVIDIA CPU will not be the now old AMD Strix Halo, but its successor (launched recently), which supports up to 192 GB of unified memory. Thus 128 GB is no longer SOTA.
While this NVIDIA system is inferior from the point of view of the memory capacity, its main advantage is that the top models will have a bigger GPU, i.e. with 6144 or 5120 FP32 execution units, compared to 2560 for the AMD GPU (compared to the NVIDIA CPU, the AMD CPU has a better multi-threaded performance for legacy programs, and a much better multi-threaded performance for the applications that use AVX-512).
However, these top models with big GPUs will also be much more expensive than the competing AMD system, while also being much more expensive than a laptop or mini-PC with an equivalent discrete NVIDIA GPU (which has the disadvantage of having direct access only to a much smaller, even if faster, memory).
I don’t think there is much improvement in compute for the new Strix Halo revision. The next one supposedly adds RDNA4 cores or similar and more memory channels.
It could help with exploding external LLM costs. It will be interesting to see what adoption looks like, which will mainly depend on the price.
This is what makes it interesting to me as well
Does this person know that this is the same GB chip in the DGX Spark? It isn't some proposed thing, it's a chip loads of people have on their desk right now, and there are endless benchmarks of it.
Decent single core (a long ways from Apple level, but decent), but it makes up for it in cores to provide M5 level performance, CPU wise. Memory bandwidth is kind of starved, at about 1/6th that of many GPUs.
They got Microsoft to customize Windows for the RTX Spark, and will likely have to brutally throttle it when running as a laptop (it's literally a 140W TDP chip), and that's neat. It's going to be a very expensive laptop.
This is probably the better way to frame it: not "Nvidia is proposing a new CPU system" but "Nvidia is trying to move an existing GB/Spark-class platform into a Windows PC form factor"
I heard the memory bandwidth is not just slower than on a GPU, as expected, but is significantly slower than Apple’s unified memory.
CPU/GPU is decent (800 GB or so), memory is slowish (300GB or so). Some Apple M are slower, some are faster.
Where did you get those numbers from?
DGX Spark has a maximum of 273 GB/s bandwidth in ideal scenarios (hard to reach)
That puts it between an M5 (153 GB/s) and an M5 Pro (307 GB/s).
Plus John Carmack has reviewed it, he was not amazed.
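For a rough sense of what these bandwidth figures mean for token generation, here is a back-of-the-envelope sketch. It assumes decoding is purely memory-bandwidth bound, i.e. every active weight is read once per generated token, and it ignores KV cache traffic, compute limits and software overhead, so the results are optimistic upper bounds rather than benchmarks. The bandwidth numbers are the ones quoted in this thread; the 26B dense model at 4 bits per weight is a hypothetical example, and the function name is made up for illustration.

```python
def decode_tokens_per_second(bandwidth_gb_s: float,
                             active_params_billion: float,
                             bits_per_weight: float) -> float:
    """Upper bound on decode speed if every active weight is read once per token."""
    # Billions of parameters * bits / 8 = gigabytes of weights read per token.
    gb_read_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


# Bandwidth figures (GB/s) as quoted by commenters in this thread.
systems = {
    "M5": 153,
    "DGX Spark": 273,
    "M5 Pro": 307,
    "high-end consumer GPU": 1800,
}

# Hypothetical workload: a dense 26B-parameter model quantised to 4 bits.
for name, bandwidth in systems.items():
    tps = decode_tokens_per_second(bandwidth, active_params_billion=26, bits_per_weight=4)
    print(f"{name}: ~{tps:.0f} tokens/s upper bound")
```

For a sparse (MoE) model only the active parameters per token should be passed in, which is why such models can decode much faster than their total size suggests.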
Related:
A powerful new chapter for Windows PCs, accelerated by Nvidia RTX Spark
https://news.ycombinator.com/item?id=48352693
Nvidia RTX Spark
https://news.ycombinator.com/item?id=48352939
Mediatek and Nvidia the horsemen of abandoning hardware after a year. The Jetson family still left a bad taste in my mouth.
good to know, hope the price will be affordable, having a PC is becoming a luxury :)
I’m not sure if you’re aware but there is a supply chain shortage for pretty much everything needed for a PC that isn’t expected to be solved this year or next year. There is no way that can be affordable
Certainly not in the year of our lord, 2026. Maybe in a few years though.
Will it support Linux?
Are their enterprise orders slowing down? Why use precious maxed out fab capacity on consumer stuff when it could be an enterprise chip?
It already is an enterprise chip. This is about Microsoft not having the equivalent of an M3 Max or whatever laptop.
And maybe for NVIDIA and MS it is also about them quietly betting that local models are, in fact, going to be good enough for most tasks pretty soon.
It uses LPDDR5X instead of VRAM and will still sell for a premium while pushing their presence even further in every side of the AI market. This was one area AMD was ahead in and now Nvidia is probably better off making this to compete on that front while still being better off than making a 5090.
That doesn't answer the question. If the high margin enterprise GPUs are saturating the fab capacity you wouldn't expect them to be pushing this. But IIRC those all have oodles of integrated HBM at this point so I wonder if fab capacity for that has become a bottleneck.
Yeah when laptops are shipping 8GB and Microsoft is suddenly interested in native apps, nope.
Tech companies have strangled their own market.
I am not sure how many people will run AI models locally. It still seems like a niche application to me.
I'd say this relates directly to the cost of running AI models remotely.
And we won't know what the actual cost will be until AI vendors recover the huge pile of cash they've dumped into development (plus interest).
I think it's niche now because getting the hardware to run it is expensive and the quantized models don't work as well. If those improve then it would be a no-brainer to pay once for the hardware instead of a fortune for API calls.
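The hardware-versus-API trade-off can be put into a quick break-even calculation. This is only a sketch: the function name and every number in the example are hypothetical placeholders, not real prices, and it ignores resale value, quality differences between local and hosted models, and the time spent on setup.

```python
def breakeven_months(hardware_cost: float,
                     monthly_api_spend: float,
                     monthly_power_cost: float = 0.0) -> float:
    """Months until a one-off hardware purchase beats paying for API calls."""
    monthly_saving = monthly_api_spend - monthly_power_cost
    if monthly_saving <= 0:
        # Running locally costs at least as much per month, so it never pays off.
        return float("inf")
    return hardware_cost / monthly_saving


# Hypothetical figures purely for illustration.
months = breakeven_months(hardware_cost=3000, monthly_api_spend=200, monthly_power_cost=10)
print(f"Break-even after about {months:.1f} months")
```

The answer is dominated by the monthly API spend, which is exactly the number nobody can predict yet.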
I am not really convinced that four bit quantisation is that bad; almost certainly six will be enough. But Google claim that their QAT tech in Gemma, which they are surely using or testing in Gemini, preserves nearly the source model's quality while reducing footprint.
The hardware for 50 tokens per second with a four bit quantisation of Gemma 4 26B or the sparse Qwen 3.6 is not really that expensive: it’s a secondhand M1 Max.
Beyond that, I agree. I think moving planning tasks to local is a now thing, not that it really has much impact on token spend. I also think many small coding tasks are fully within the grasp of the above two models.
The main issue right now is that the software landscape is rather confusing, but I reckon uncomplicated Gemma 4 26B QAT support with MTP is a few weeks away.
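To make the quantisation footprint concrete, here is a minimal sketch of the weight memory needed at different bit widths. It assumes the "26B" in the model name is the total parameter count, uses decimal gigabytes, and counts only the weights: KV cache, activations and per-block quantisation scales are left out, so real files will be somewhat larger. The function name is hypothetical.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    # Billions of parameters * bits per weight / 8 bits per byte = gigabytes.
    return params_billion * bits_per_weight / 8


# Assumed 26B total parameters at common precisions.
for bits in (16, 8, 6, 4):
    print(f"{bits}-bit: {weight_footprint_gb(26, bits):.1f} GB")
```

Under these assumptions a four bit quantisation needs about a quarter of the memory of the 16-bit weights, and six bit a bit over a third.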
AI vendors are attempting to offer the whole apple. And they are spending huge sums of money in the process.
But most businesses don't really care about most of the apple --- they only need their special bite out of it.
For example, doctors mainly care about medicine. Nvidia is attempting to provide the hardware needed for local, specialized models.
I think it is likely to appeal to video and photo editors who want to use AI tools (the press release has a quote from Blackmagic Design, as well as from Adobe, who I think have no stomach for their own cloud AI).
But I don’t know about specialised: this could run quite large models with MoE.
Performance of local models is pretty bad compared to what AI vendors offer; token generation is just too slow to be that useful. And you need to allocate GBs of memory, something that will stay very expensive to buy for a long time.
Running local models will stay niche for a while, unless we see breakthroughs
Dumb idea --- how about if we limit local models to specific domains --- medicine for example.
Most doctors don't care much about engineering or accounting or software development or 10000 other things that big vendor models address.
This area is yet to be really explored. Nvidia aims to provide the hardware to do so.
> I am not sure how many people will run AI models locally. It still seems like a niche application to me.
Bill Gates had a quote some years ago...
People have still not learned how fast we improve our tech and how much cheaper things get I guess :)
Memory isn’t getting cheap soon, and you need a lot of it for local models
All depends. The current technology will be cheaper in a year or two. The best cutting edge stuff will probably be even more expensive. But in 10 years' time... we can run current SOTA models (or models that are equally good) on our local hardware
Ah yes, if you count in decades, for sure I expect to run them locally
We had a thing called globalism that drastically reduced costs. Globalism right now is on life support. Given geopolitics, I don’t see how it’s going to survive.