Despite the shortage, RAM is still cheaper than mathematicians.
The same could be said about other IT domains... When you see single webpages that weigh tens of MB, you wonder how we came to this.
This is one of the basic avenues for advancement.
Compute, bytes of RAM used, bytes in the model, bytes accessed per iteration, bytes of data used for training.
You can trade the balance if you can find another way to do things; extreme quantisation is but one direction to try. KANs were aiming for more compute and fewer parameters. Recent optimisation projects have been pushing at these various properties. Sometimes gains in one come at the cost of another, but that needn't always be the case.
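To make that trade concrete, here is a minimal sketch of what quantisation buys and costs, using a made-up weight matrix and a simple per-tensor int8 scheme (the sizes and the scheme are illustrative assumptions, not anything claimed above):

```python
import numpy as np

# Hypothetical weight matrix in float32: 4096 x 4096, about 64 MiB.
rng = np.random.default_rng(0)
w_fp32 = rng.standard_normal((4096, 4096)).astype(np.float32)

# Symmetric per-tensor int8 quantisation: 4x fewer bytes in the model,
# at the cost of an extra dequantisation step (more compute) per use.
scale = np.abs(w_fp32).max() / 127.0
w_int8 = np.clip(np.round(w_fp32 / scale), -127, 127).astype(np.int8)
w_deq = w_int8.astype(np.float32) * scale  # dequantise before the matmul

print(f"fp32 bytes: {w_fp32.nbytes:,}")   # 67,108,864
print(f"int8 bytes: {w_int8.nbytes:,}")   # 16,777,216
print(f"mean abs error: {np.abs(w_fp32 - w_deq).mean():.4f}")
```

Fewer bytes in the model and fewer bytes accessed per iteration, bought with extra compute and a little accuracy; that is the balance being traded.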
Sigh. Don't make me tap the sign [1]
[1] http://www.incompleteideas.net/IncIdeas/BitterLesson.html
Can we say something about the compression factor for pure knowledge of these models?
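One crude way to frame that question is to compare bytes of training text against bytes of weights, which gives a very lossy "compression factor" at best. All numbers below are illustrative assumptions, not figures for any real model:

```python
# Back-of-envelope "compression factor": training-corpus bytes vs model bytes.
# Every number here is an assumption chosen only to show the arithmetic.
train_tokens = 15e12      # assumed ~15T training tokens
bytes_per_token = 4       # assumed ~4 bytes of text per token
params = 70e9             # assumed 70B-parameter model
bytes_per_param = 2       # fp16/bf16 weights

corpus_bytes = train_tokens * bytes_per_token
model_bytes = params * bytes_per_param
print(f"corpus : {corpus_bytes / 1e12:.0f} TB")        # 60 TB
print(f"model  : {model_bytes / 1e9:.0f} GB")          # 140 GB
print(f"ratio  : {corpus_bytes / model_bytes:.0f}x")   # ~430x with these assumptions
```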
We will not see memory demand decrease because this will simply allow AI companies to run more instances. They still want an infinite amount of memory at the moment, no matter how AI improves.
If models become more efficient we will move more of the work to local devices instead of using SaaS models. We're still in the mainframe era of LLMs.
The hyperscalers do not want us running models at the edge and they will spend infinite amounts of circular fake money to ensure hardware remains prohibitively expensive forever.
> of circular fake money
Oh, it gets worse than that. The money driving all of this on OpenAI's side was borrowed from Japanese banks at cheap interest rates (by SoftBank, for the Stargate project). The Japanese banks are able to do that because of the money of Japanese people and Japanese companies, and because the collateral is stock whose value is inflated by people investing their hard-earned money into the markets.
So in a way they are using real, hard-earned money to fund all of this; they are using your money to basically attack you behind your back.
I once wrote a really long comment about the shaky finances of Stargate; I feel like suggesting it here: https://news.ycombinator.com/item?id=47297428
> If models become more efficient
Then we can make them even bigger.
> Then we can make them even bigger.
But what if it becomes "good enough"? What if, for most intents and purposes, small models can be "good enough"?
There are some people here and on r/localllama who I have seen run small models, sometimes even multiple of them, to solve problems and iterate quickly, and then have a larger model plug in to fix anything that remains.
This would still mean some demand for larger/SOTA models, but I don't think that demand would be nearly as big as people think. We all still kind of feel that different models are good for different tasks, and a common recommendation is to benchmark different models against your own use cases: sometimes a small model is good within your particular domain and worth having in your toolset.
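For illustration, a rough sketch of that small-model/large-model loop. `chat()` is a placeholder for whatever inference backend you actually use, and the model names are hypothetical:

```python
# Sketch of the "small model iterates, big model cleans up" pattern described above.
# `chat()` is a stub; plug in your own local/remote inference call.

def chat(model: str, prompt: str) -> str:
    raise NotImplementedError("plug in your own inference backend here")

def solve(task: str, rounds: int = 3) -> str:
    # Cheap, fast iteration with a small local model.
    draft = chat("small-local-model", f"Attempt this task:\n{task}")
    for _ in range(rounds):
        critique = chat("small-local-model",
                        f"Task:\n{task}\n\nDraft:\n{draft}\n\nList any problems, or reply OK.")
        if critique.strip() == "OK":
            break
        draft = chat("small-local-model",
                     f"Task:\n{task}\n\nDraft:\n{draft}\n\nFix these problems:\n{critique}")
    # One expensive pass with the larger model to fix whatever remains.
    return chat("large-remote-model",
                f"Task:\n{task}\n\nPolish this draft and fix any remaining issues:\n{draft}")
```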
The mainframe analogy is close but I think the key difference is that mainframe->PC was driven by hardware getting cheap, while LLM efficiency needs algorithmic breakthroughs which are way less predictable. My bet is we get a split: anything latency-sensitive (code completion, local assistants) goes to edge as soon as models fit on consumer hardware, because you can't cheat physics on network round trips. But training and heavy reasoning stays centralized -- the data gravity just gets worse as models improve. Also I keep going back and forth on whether stuff like MoE and speculative decoding is "better math" or just "better engineering." Feels like an important distinction since they have very different cost curves.
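As a reference point for that last question, here is a toy, greedy version of the speculative-decoding idea (a cheap draft model proposes a few tokens, the large model verifies them). Both models are stand-in callables, not any real API:

```python
from typing import Callable, List

# Toy greedy speculative decoding. `draft` and `target` each map a token
# prefix to their next-token choice; in real systems both are LLMs and the
# verification of the k guesses happens in a single batched forward pass.

def speculative_decode(target: Callable[[List[int]], int],
                       draft: Callable[[List[int]], int],
                       prompt: List[int], max_new: int = 32, k: int = 4) -> List[int]:
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1) The cheap draft model guesses k tokens ahead.
        guesses, ctx = [], list(out)
        for _ in range(k):
            t = draft(ctx)
            guesses.append(t)
            ctx.append(t)
        # 2) The big model checks each guess; keep the agreeing prefix and
        #    emit its own token at the first disagreement.
        for g in guesses:
            t = target(out)   # one batched pass in practice, sequential in this toy
            out.append(t)
            if t != g:
                break         # draft diverged: discard the remaining guesses
            if len(out) - len(prompt) >= max_new:
                break
    return out
```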
I don't think we are there yet. Models running in data centers will still be noticeably better, because the same efficiency gains will let providers build and run better models.
Not many people today would settle for models comparable to what was SOTA two years ago.
To run models locally and get results as good as the models running in data centers, we need both efficiency gains and for AI improvement to hit a wall.
Neither of those conditions seems likely to come true in the near future.
I disagree. I think a sharp drop in memory requirements of at least an order of magnitude will cause demand to adjust accordingly.
The Department of Transportation always thinks adding more lanes will reduce traffic.
It doesn't; it induces demand. Why? Because there are always more than enough people with cars to fill those lanes.
Citation needed. I've heard this quite often, but so far, I haven't seen proof of the stated causality.
PS: This doesn't mean that better public transportation couldn't deliver more bang for the buck than the n-th additional car lane. But never ever have I heard anybody say that they chose to buy a car, or to use an existing car more often, because an additional lane had been built.
Jevons paradox https://en.wikipedia.org/wiki/Jevons_paradox