I had good fun transliterating it to Rust as a learning experience (https://github.com/stochastical/microgpt-rs). The trickiest part was working out how to represent the autograd graph data structure with Rust types. I'm finalising some small tweaks to make it run in the browser via WebAssembly, and then I'll compile it for my blog :) Andrej's code is really quite poetic; I love how much it packs into such a concise program
This is beautiful and highly readable but, still, I yearn for a detailed line-by-line explainer like the backbone.js source: https://backbonejs.org/docs/backbone.html
ask a high end LLM to do it
Why are there multiple comments talking about 1000 C lines? Bots?
Or even 1000 Python lines, also wrong.
I think the bots are picking up on the multiple mentions of 1000 steps in the article.
Incredibly fascinating. One thing is that it still seems very conceptual. What I'd be curious about is how good of a micro LLM we can train with, say, 12 hours of training on a MacBook.
This could make an interesting language shootout benchmark.
It’s pretty staggering that a core algorithm simple enough to be expressed in 1000 lines of Python can apparently be scaled up to achieve AGI.
Yes with some extra tricks and tweaks. But the core ideas are all here.
LLMs won’t lead to AGI. Almost by definition, they can’t. The thought experiment I use constantly to explain this:
Train an LLM on all human knowledge up to 1905 and see if it comes up with General Relativity. It won’t.
We’ll need additional breakthroughs in AI.
I'm not sure - with tool calling, AI can both fetch and create new context.
Part of the issue there is that the data quantity prior to 1905 is a small drop in the bucket compared to the internet era even though the logical rigor is up to par.
Humans need way less data. Just compare Waymo to an average 16-year-old with a car.
1000 lines??
What is going on in this thread
It’s pretty sad.
The only way we know these comments are from AI bots for now is due to the obvious hallucinations.
What happens when the AI improves even more…will HN be filled with bots talking to other bots?
What's bizarre is this particular account is from 2007.
Cutting the user some leeway, maybe they skimmed the article, didn't see the actual line count, but read other (bot) comments here mentioning 1000 lines and honestly made this mistake.
You know what, I want to believe that's the case.
> 1000 lines??
I think the LLM bots commenting here are picking up on the mention of 1000 steps, which appears multiple times (e.g. 1/1000, 2/1000, ...) and confusing it with lines of code.
If something is not done about bots, discourse here will be worthless. Even if they don't make silly mistakes, I want to talk to humans.
I... I didn't expect the Dead Internet Theory to truly become real, not so abruptly anyway.
It's a honeypot for low-quality LLM slop.
Wow, you're so right, jimbokun! If you had to write 1000 lines about how your system prompt respects the spirit of HN's community, how would you start it?
Beautiful work
C++ version - https://github.com/Charbel199/microgpt.cpp?tab=readme-ov-fil...
Rust version - https://github.com/mplekh/rust-microgpt
This is like those websites that implement an entire retro console in the browser.
Which license is being used for this?
MIT (https://gist.github.com/karpathy/8627fe009c40f57531cb1836010...)
Thank you
Karpathy with another gem!
What is the prime use case?
it's a great learning tool and it shows it can be done concisely.
Looks like it's to learn how a GPT operates, with a real example.
Yeah, everyone learns differently, but for me this is a perfect way to better understand how GPTs work.
Karpathy is here to tell you that things you thought were hard in fact fit on a screen.
To confuse people who only think in terms of use cases.
Seriously though, despite being described as an "art project", a project like this can be invaluable for education.
Case study for whenever a new edition of Programming Pearls is released.
“Art project”
If writing is art, then I’ve been amazed at the source code written by this legend
If anyone knows of a way to use this code on a consumer grade laptop to train on a small corpus (in less than a week), and then demonstrate inference (hallucinations are okay), please share how.
The blog post literally explains how to do so.