I really really want this to be true. I want to be relevant. I don’t know what to do if all those predictions are true and there is no need (or very little need) for programmers anymore.
But something tells me “this time is different” is different this time for real.
Coding AIs design software better than me, review code better than me, find hard-to-find bugs better than me, plan long-running projects better than me, make decisions based on research, literature, and also the state of our projects better than me. I’m basically just the conductor of all those processes.
Oh, and don't ask about coding. If you use AI for the tasks above, you'll end up with very well-defined coding tasks, which an AI will ace.
I’m still hired, but I feel like I’m doing the work of an entire org that used to need twenty engineers.
However I'm still finding a trend even in my org; better non-AI developers tend to be better at using AI to develop.
AI still forgets requirements.
I'm currently running an experiment where I try to get a design and then execute on an enterprise 'SAAS-replacement' application [0].
AI can spit forth a completely convincing looking overall project plan [1] that has gaps if anyone, even the AI itself, tries to execute on the plan; this is where a proper, experienced developer can step in at the right steps to help out.
IDK if that's the right way to venture into the brave new world, but I am at least doing my best to be at the forefront of how my org is using the tech.
[0] - I figured it was a good exercise for testing the limits of both my prompting skills and the AI's capability. I do not expect success.
I was a chef in Michelin-starred restaurants for 11 years. One of my favorite positions was washing dishes. The goal was always to keep the machine running on its 5-minute cycle. It was about getting the dishes into racks, rinsing them, and having them ready and waiting for the previous cycle to end—so you could push them into the machine immediately—then getting them dried and put away after the cycle, making sure the quality was there and no spot was missed. If the machine stopped, the goal was to get another batch into it, putting everything else on hold. Keeping the machine running was the only way to prevent dishes from piling up, which would end with the towers falling over and breaking plates. This work requires moving lightning fast with dexterity.
AI coding agents are analogous to the machine. My job is to get the prompts written, and to do quality control and housekeeping after it runs a cycle. Nonetheless, like all automation, humans are still needed... for now.
This reads like shilling/advertisement. Coding AIs struggle with anything remotely complex, make up crap and present it as research, write tests that are just "return true", and won't ever question a decision you make.
Those twenty engineers must not have produced much.
No, it doesn’t read like shilling or advertisement. It’s tiring hearing people continually dismiss coding agents as if they have not massively improved and are not driving real value despite their limitations, when they are only just getting started. I’ve done things with Claude I never thought possible for myself to do, and I’ve done things where Claude made the whole effort take twice as long and 3x more of my time. It’s not that people are ignoring the limitations; it’s that people can see how powerful they already are and how much more headroom there is even with existing paradigms, not to mention the compute scaling happening in 26-27 and the idea pipeline from the massive hoarding of talent.
When prices go down or product velocity goes up we'll start believing in the new 20x developer. Until then, it doesn't align with most experiences and just reads like fiction.
You'll notice no one ever seems to talk about the products they're making 20x faster or cheaper.
AI boosters? Like people are planted by Sam Altman like the way they hire crowds for political events or something? Hey! Maybe I’m AI! You’re absolutely right!
In seriousness: I’m sure there are projects that are heavily powered by Claude. A lot of other people I know and I use Claude almost exclusively to write code and then leverage it as a tool when reviewing. Almost everyone I hear with this super negative, hostile attitude references some “promise” that has gone unfulfilled, but it’s so silly: judge the product they are producing and maybe, just maybe, consider the rate of progress to _guess_ where things are heading.
> I’ve done things with Claude I never thought possible for myself to do,
That's the point, champ. They seem great to people when they apply them to some domain they are not competent in, because they cannot evaluate the issues. So you've never programmed but can now scaffold a React application and basic backend in a couple of hours? Good for you, but for the love of god have someone more experienced check it before you push into production. Once you apply them to any area where you have at least moderate competence, you will see all sorts of issues that you just cannot unsee. Security and performance are often issues, not to mention the quality of the code.
This is remarkably dismissive and arrogant. In reality they assist many people in getting things done in areas they are competent in, without getting bogged down in tedium.
They need a heavy hand to police to make sure they do the right thing. Garbage in, garbage out.
The smarter the hand of the person driving them, the better the output. You see a problem, you correct it. Or make them correct it.
It's basically the opposite of what you're asserting here.
Seems fine, works, is fine, is better than if you had me go off and write it on my own. You realize you can check the results? You can use Claude to help you understand the changes as you read through them? I just don’t get this weird “it makes mistakes and it’s horrible if you understand the domain that it is generating over” argument. Yes, definitely sometimes, and definitely not other times. What happens if I DON’T have someone more experienced to consult, or they ignore me because they are busy, or they are wrong because they are also imperfect and not focused? It’s really hard to be convinced that this point of view is not just some knee-jerk reaction justified post hoc.
While LLMs do improve productivity sometimes, I flatly cannot believe a claim (at least without direct demonstration or evidence) that one person is doing the work of 20 with them, in December 2025 at least.
I mean from the off, people were claiming 10x probably mostly because it's a nice round number, but those claims quickly fell out of the mainstream as people realised it's just not that big a multiplier in practice in the real world.
I don't think we're seeing this in the market, anywhere. Something like 1 engineer doing the job of 20, what you're talking about is basically whole departments at mid sized companies compressing to one person. Think about that, that has implications for all the additional management staff on top of the 20 engineers too.
It'd either be a complete restructure and rethink of the way software orgs work, or we'd be seeing just incredible, crazy deltas in output of software companies this year of the type that couldn't be ignored, they'd be impossible to not notice.
This is just plainly not happening. Look, if it happens, it happens, 26, 27, 28 or 38. It'll be a cool and interesting new world if it does. But it's just... not happened or happening in 25.
I think I've been using AI wrong. I can't understand testimonies like this. Most times I try to use AI for a task, it is a shitshow, and I have to rewrite everything anyway.
I don’t know about right/wrong. You need to use the tools that make you productive. I personally find that in my work there are dozens of little scripts or helper functions that accelerate my work. However I usually don’t write them because I don’t have the time. AI can generate these little scripts very consistently. That accelerates my work. Perhaps just start simple.
They do all those things you've mentioned more efficiently than most of us, but they fall woefully short as soon as novelty is required. Creativity is not in their repertoire. So if you're banging out the same type of thing over and over again, yes, they will make that work light and then scarce. But if you need to create something niche, something one-off, something new, they'll slip off the bleeding edge into the comfortable valley of the familiar at every step.
I choose to look at it as an opportunity to spend more time on the interesting problems, and work at a higher level. We used to worry about pointers and memory allocation. Now we will worry less and less about how the code is written and more about the result it built.
Take food for example. We don't eat food made by computers even though they're capable of making it from start to finish.
Sure we eat carrots probably assisted by machines, but we are not eating dishes like protein bars all day every day.
Our food is still better enjoyed when made by a chef.
Software engineering will be the same. No one will want to use software made by a machine all day every day. There are differences in the execution and implementation.
No one will want to read books entirely dreamed up by AI. Subtle parts of the books make us feel something only a human could have put right there right then.
No one will want to see movies entirely made by AI.
The list goes on.
But you might say "software is different". Yes, but no. With an abundance of choice, when the productivity increase produces a ton of options for every type of software, choice will become more prominent and the human-driven software will win.
Even today we pick the best terminal emulation software because we notice the difference between exquisitely crafted and bloated cruft.
> So if you're banging out the same type of thing over and over again, yes, they will make that work light and then scarce.
The same thing over and over again should be a SaaS, some internal tool, or a plugin. Computers are good at doing the same thing over and over again and that's what we've been using them for
> But if you need to create something niche, something one-off, something new, they'll slip off the bleeding edge into the comfortable valley of the familiar at every step.
Even if the high level description of a task may be similar to another, there's always something different in the implementation. A sports car and a sedan have roughly the same components, but they're not engineered the same.
> We used to worry about pointers and memory allocation.
Some still do. Not every case gives you a system that handles allocations and a garbage collector. And even with those, you will see memory leaks.
> Now we will worry less and less about how the code is written and more about the result it built.
I think your image of LLMs is a bit outdated. Claude Code with well-configured agents will get entirely novel stuff done pretty well, and that’s only going to get better over time.
Perfect economic substitution in coding doesn't happen for a long time. Meanwhile, AI appears as an amplifier to the human and vice versa. That the work will change is scary, but the change also opens up possibilities, many of them now hard to imagine.
I feel you, it's scary. But the possibilities we're presented with are incredible. I'm revisiting all these projects that I put aside because they were "too big" or "too much for a machine". It's quite exciting
Stop freaking out. Seriously. You're afraid of something completely ridiculous.
It is certainly more eloquent than you regarding software architecture (which was a scam all along, but that's a conversation for another time).
It will find SOME bugs better than you, that's a given.
Review code better than you? Seriously? What are you using, and what do you consider code review?
Assume I identified that one change broke production and you reviewed the latest commit. I am pinging you and you better answer. Ok, Claude broke production, now what?
Can you begin to understand the difference between you and the generative technology?
When you hop on the call, you will explain to me in great detail what you know about the system you built, and explain the decision making and changes over time. You'll tell me what worked and what didn't. You will tell me about the risks, behavior and expectations. About where the code runs, its dependencies, users, usage patterns, load, CPU usage and memory footprint. You could probably tell what's happening without looking at logs, just at metrics.
With Claude I get: you're absolutely right! You asked about what it WAS, but I told you about what it WASN'T! MY BAD.
Knowledge requires a soul to experience and this is why you're paid.
Not because the models are random, but because you are mistaking a massive combinatorial search over seen patterns for genuine reasoning. Taleb's point was about confusing luck for skill. Don't confuse interpolation for understanding.
You can read a Rust book after years of Java, then go build software for an industry that did not exist when you started. Ask any LLM to write a driver for hardware that shipped last month, or model a regulatory framework that just passed... It will confidently hallucinate. You will figure it out. That is the difference between pattern matching and understanding.
I've worked with a lot of interns, fresh outs from college, overseas lowest bidders, and mediocre engineers who gave up years ago. All over the course of a ~20 year career.
Not once in all that time has anyone PRed and merged my completely unrelated and unfinished branch into main. Except a few weeks ago. By someone who was using the LLM to make PRs.
He didn't understand when I asked him about it and was baffled as to how it happened.
Really annoying, but I got significantly less concerned about the future of human software engineering after that.
Have you used an LLM specifically trained for tool calling, in Claude Code, Cursor or Aider?
They’re capable of looking up documentation and correcting their errors by compiling and running tests, and when coupled with a linter, hallucinations are a non-issue.
I don’t really think it’s possible to dismiss a model that’s been trained with reinforcement learning for both reasoning and tool usage as only doing pattern matching. They’re not at all the same beasts as the old style of LLMs based purely on next token prediction of massive scrapes of web data (with some fine tuning on Q&A pairs and RLHF to pick the best answers).
I'm using Claude Code to help me learn Godot game programming.
One interesting thing is that Claude will not tell me if I'm following the wrong path. It will just make the requested change to the best of its ability.
For example, in a tower defence game I'm making, I wanted to keep turret position state in an AStarGrid2D. It produced code to do this, but the code became harder and harder to follow as I went on. It's only after watching more tutorials that I figured out I was asking for the wrong thing. (TileMapLayer is a much better choice.)
edit: Major engine changes have occurred after the models were trained, so you will often be given code that refers to nonexistent constants and functions and which is not aware of useful new features.
before coding I just ask the model "what are the best practices in this industry to solve this problem? what tools/libraries/approaches people use?"
after coding I ask it "review the code, do you see any for which there are common libraries implementing it? are there ways to make it more idiomatic?"
you can also ask it "this is an idea on how to solve it that somebody told me, what do you think about it, are there better ways?"
> before coding I just ask the model "what are the best practices in this industry to solve this problem? what tools/libraries/approaches people use?"
Just for the fun of it, and so you lose your "virginity" so to speak, next time when the magic machine gives you the answer about "what it thinks", tell it it's wrong in strict language and scold it for misleading you. Tell it to give you the "real" best practices instead of what it spat out.
Then sit back and marvel at the machine saying you were right and that it had misled you. Producing a completely, somewhat, or slightly different answer (you never know what you get on the slot machine).
Both the before and after are better done manually. What you are describing is fine for the heck of it (I've vibe coded a Whisper-related Rust port today without having any actual Rust skills), but I’d never use fully vibed software in production. That’s irresponsible in multiple ways.
I've just tried the dxastgraphx one in pi with Opus 4.5. This was its response:
I couldn't find a library called dxastgraphx in either pip (Python) or npm (JavaScript) package registries. This library doesn't appear to exist.
Did you perhaps mean one of these popular DAG/graph libraries?
Python:
- networkx - comprehensive graph library with DAG support
- graphlib - Python standard library (3.9+) with TopologicalSorter
- dask - parallel computing with DAG task scheduling
JavaScript/TypeScript:
- graphlib - graph data structures
- dagre - DAG layout
Would you like me to build a DAG scheduler using one of these existing libraries, or would you like me to implement one from scratch? Let me know which language and approach you prefer.
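For reference, the standard-library option named in that reply (graphlib's TopologicalSorter) is already enough for a basic DAG scheduler. A minimal sketch, assuming Python 3.9+; the task names and the `run_order` helper are made up for illustration and do not come from the thread:

```python
from graphlib import CycleError, TopologicalSorter


def run_order(dependencies):
    """Return one valid execution order for a task graph.

    `dependencies` maps each task to the set of tasks it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    try:
        # static_order() yields tasks so that dependencies come first.
        return list(sorter.static_order())
    except CycleError as exc:
        # The second element of args holds the nodes forming the cycle.
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc


# Hypothetical build pipeline: each key runs after everything in its set.
tasks = {
    "deploy": {"test", "package"},
    "test": {"build"},
    "package": {"build"},
    "build": set(),
}
order = run_order(tasks)
print(order)
```

"test" and "package" may come out in either order, since neither depends on the other; only "build" first and "deploy" last are guaranteed here.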
Yeah, it makes me wonder whether I should start learning to be a carpenter or something. Those who either support AI or think "it's all bullshit" cite a lack of evidence for humans truly being replaced in the engineering process, but that's just the thing; the unprecedented levels of uncertainty make it very difficult to invest one's self in the present, intellectually and emotionally. With the current state of things, I don't think it's silly to wonder "what's the point" if another 5 years of this trajectory is going to mean not getting hired as a software dev again unless you have a PhD and want to work for an AI company.
What doesn't help is that the current state of AI adoption is heavily top-down. What I mean is the buy-in is coming from the leadership class and the shareholder class, both of whom have the incentive to remove the necessary evil of human beings from their processes. Ironically, these classes are perhaps the least qualified to decide whether generative AI can replace swathes of their workforce without serious unforeseen consequences. To make matters worse, those consequences might be as distal as too many NEETs in the system such that no one can afford to buy their crap anymore; good luck getting anyone focused on making it to the next financial quarter to give a shit about that. And that's really all that matters at the end of the day; what leadership believes, whether or not they are in touch with reality.
Where the hell was all this fear when the push for open source everything got fully underway? When entire websites were being spawned and scaffolded with just a couple lines of code? Do we not remember all those impressive tech demos of developers doing massive complex things with "just one line of code"? How did we not just write software for every kind of software problem that could exist by now?
How has free code, developed by humans, become more available than ever and yet somehow we have had to employ more and more developers? Why didn't we trend toward fewer developers?
It just doesn't make sense. AI is nothing but a snippet generator, a static analyzer, a linter, a compiler, an LSP, a google search, a copy paste from stackoverflow, all technologies we've had for a long time, all things developers used to have to go without at some point in history.
In aviation safety, there is the concept of the "Swiss cheese" model, where each successive layer of safety may not be 100% perfect, but has a different set of holes, so overlapping layers create a net gain in safety metrics.
One can treat current LLMs as a layer of "cheese" for any software development or deployment pipeline, so the goal of adding them should be an improvement for a measurable metric (code quality, uptime, development cost, successful transactions, etc).
Of course, one has to understand the chosen LLM behaviour for each specific scenario - are they like Swiss cheese (small numbers of large holes) or more like Havarti cheese (large number of small holes), and treat them accordingly.
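The arithmetic behind the layering argument can be sketched like this, under the big assumption that each layer misses defects independently of the others. The layer names and miss rates below are invented for illustration, not measured:

```python
from math import prod


def combined_miss_rate(miss_rates):
    """Probability a defect slips through every layer, assuming independence."""
    return prod(miss_rates)


# Hypothetical per-layer miss rates: fraction of defects each layer fails to catch.
layers = {
    "human review": 0.30,
    "test suite": 0.20,
    "LLM review": 0.50,
}

without_llm = combined_miss_rate([layers["human review"], layers["test suite"]])
with_llm = combined_miss_rate(layers.values())

print(f"without LLM layer: {without_llm:.3f}")
print(f"with LLM layer:    {with_llm:.3f}")
```

Even a layer that misses half of everything halves the combined miss rate in this toy model, but only if its holes do not line up with the other layers' holes. If the LLM misses the same defects the reviewers miss, the gain shrinks toward zero.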
> One can treat current LLMs as a layer of "cheese" for any software development or deployment pipeline
It's another interesting attempt at normalising the bullshit output by LLMs, but NO. Even with the enshittified Boeing, the aviation industry's safety and reliability records are far, far, far above those of deterministic software (known for a lot of unreliability itself), and deterministic B2C software is to LLMs in turn what Boeing and Airbus software and hardware reliability is to B2C software. So you cannot even begin to apply aviation industry paradigms to the shit machines, please.
Interesting concept, but as of now we don't apply these technologies as a new compounding layer.
We are not using them after the fact, once we have constructed the initial solution. We are not ingesting the code to compare against specs. We are not using them to curate and analyze current hand-written tests (prompt: is this test any good? assistant: it is hot garbage, you are inferring that expected result equals your mocked result).
We are not really at this phase yet. Not in general, not intelligently.
But when the "safe and effective" crowd leaves the technology, we will find good use cases for it, I am certain (unlike UML, VB and Delphi).
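The test smell from that imagined prompt exchange, asserting that the expected result equals the mocked result, looks roughly like the first test below. This is a sketch only: `total_price` and the repository object are hypothetical stand-ins, and the flat discount rule is invented to give the second test something real to check.

```python
from unittest.mock import Mock


def total_price(repo, order_id):
    """Sum item prices for an order; subtotals above 100 get a flat 10 off."""
    subtotal = sum(repo.get_prices(order_id))
    return subtotal - 10 if subtotal > 100 else subtotal


def test_tautological():
    # Replaces the function under test with a mock, then asserts the mock's
    # own return value. Passes no matter what total_price actually does.
    fake_total_price = Mock(return_value=42)
    assert fake_total_price(Mock(), 1) == 42


def test_meaningful():
    # Mocks only the dependency, so the assertion exercises the real logic.
    repo = Mock()
    repo.get_prices.return_value = [60, 60]
    assert total_price(repo, 1) == 110
    repo.get_prices.assert_called_once_with(1)
```

Both tests pass, which is the problem: only the second one would fail if the discount rule were broken.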
LLMs are Kraft Singles. Stuff that only kind of looks like cheese. Once you know it's in there, someone has to inspect, and sign-off on, the entire wheel for any credible semblance of safety.
> The hard part of computer programming isn't expressing what we want the machine to do in code. The hard part is turning human thinking -- with all its wooliness and ambiguity and contradictions -- into computational thinking that is logically precise and unambiguous, and that can then be expressed formally in the syntax of a programming language.
> That was the hard part when programmers were punching holes in cards. It was the hard part when they were typing COBOL code. It was the hard part when they were bringing Visual Basic GUIs to life (presumably to track the killer's IP address). And it's the hard part when they're prompting language models to predict plausible-looking Python.
> The hard part has always been – and likely will continue to be for many years to come – knowing exactly what to ask for.
I don't agree with this:
> To folks who say this technology isn’t going anywhere, I would remind them of just how expensive these models are to build and what massive losses they’re incurring. Yes, you could carry on using your local instance of some small model distilled from a hyper-scale model trained today. But as the years roll by, you may find not being able to move on from the programming language and library versions it was trained on a tad constraining.
Some of the best Chinese models (which are genuinely competitive with the frontier models from OpenAI / Anthropic / Gemini) claim to have been trained for single-digit millions of dollars. I'm not at all worried that the bubble will burst and new models will stop being trained and the existing ones will lose their utility - I think what we have now is a permanent baseline for what will be available in the future.
maybe not the MOST valuable part of prompting an LLM during a task, but one of them, is defining the exact problem in precise language. i dont just blindly turn to an LLM without understanding the problem first, but i do find Claude is better than a cardboard cutout of a dog
The first part is surely true if you change it to "the hardEST part," (I'm a huge believer in "Programming as Theory Building"), but there are plenty of other hard or just downright tedious/expensive aspects of software development. I'm still not fully bought in on some of the AI stuff—I haven't had a chance to really apply an agentic flow to anything professional, I pretty much always get errors even when one-shotting, and who knows if even the productive stuff is big-picture economical—but I've already done some professional "mini projects" that just would not have gotten done without an AI. Simple example is I converted a C# UI to Java Swing in less than a day, few thousand lines of code, simple utility but important to my current project for <reasons>. Assuming tasks like these can be done economically over time, I don't see any reason why small and medium difficulty programming tasks can't be achieved efficiently with these tools.
Indeed, while DeepSeek 3.2 or GLM 4.7 are not Opus 4.5 quality, they are close enough that I could _get by_ because they're not that far off, and are about where I was with Sonnet 3.5 or Sonnet 4 a few months ago.
I'm not convinced DeepSeek is making money hosting these, but it's not that far off from it I suspect. They could triple their prices and still be cheaper than Anthropic is now.
There is a guaranteed cap on how far LLM based AI models can go. Models improve by being trained on better data. LLMs being used to generate millions of lines of sloppy code will substantially dilute the pool of good training data. Developers moving over to AI based development will cease to grow and learn - producing less novel code.
The massive increase in slop code and loss of innovation in code will establish an unavoidable limit on LLMs.
That is a naive assumption. Or rather multiple naive assumptions: Developers mostly don’t move over to AI development, but integrate it into their workflow. Many of them will stay intellectually curious and thus focus their attention elsewhere; I’m not convinced they will just suddenly all stagnate.
Also, training data isn’t just crawled text from the internet anymore, but also sourced from interactions of millions of developers with coding agents, manually provided sample sessions, deliberately generated code, and more—there is a massive amount of money and research involved here, so that’s another bet I wouldn’t be willing to make.
But they're not just training off code and its use, but off a corpus of general human knowledge in written form.
I mean, in general not only do they have all of the crappy PHP code in existence in their corpus but they also have Principia Mathematica, or probably The Art of Computer Programming. And it has become increasingly clear to me that the models have bridged the gap between "autocomplete based on code I've seen" to some sort of distillation of first order logic based on them just reading a lot of language... and some fuzzy attempt at reasoning that came out of it.
Plus the agentic tools driving them are increasingly ruthless at wringing out good results.
That said -- I think there is a natural cap on what they can get at as pure coding machines. They're pretty much there IMHO. The results are usually -- I get what I asked for, almost 100%, and it tends to "just do the right thing."
I think the next step is to actually make it scale and make it profitable, but also to fix the tools. They're not what I want as an engineer. They try to take over, they don't put me in control, and they create a very difficult review and maintenance problem. Not because they make bad code but because they make code that nobody feels responsible for.
From where I’m standing, it’s scary.
It's definitely scary in a way.
+1 - I wish at least one of these AI boosters had shown us a real commercialised product they've built.
AI boosters? Like people are planted by Sam Altman like the way they hire crowds for political events or something? Hey! Maybe I’m AI! You’re absolutely right!
In seriousness: I’m sure there are projects that are heavily powered by Claude, myself and a lot of other people I know use Claude almost exclusively to write and then leverage it as a tool when reviewing. Almost everyone I hear that has this super negative hostile attitude references some “promise” that has gone unfulfilled but it’s so silly: judge the product they are producing and maybe just maybe consider the rate of progress to _guess_ where things are heading
> I’ve done things with Claude I never thought possible for myself to do,
That's the point, champ. They seem great to people when they apply them to some domain they are not competent in, because they cannot evaluate the issues. So you've never programmed but can now scaffold a React application and a basic backend in a couple of hours? Good for you, but for the love of god have someone more experienced check it before you push into production. Once you apply them to any area where you have at least moderate competence, you will see all sorts of issues that you just cannot unsee. Security and performance are often issues, not to mention the quality of the code...
This is remarkably dismissive and arrogant. In reality they assist many people in getting things done in areas they are competent in, without getting bogged down in tedium.
They need a heavy hand to police to make sure they do the right thing. Garbage in, garbage out.
The smarter the hand of the person driving them, the better the output. You see a problem, you correct it. Or make them correct it.
It's basically the opposite of what you're asserting here.
Seems fine, works, is fine, is better than if you had me go off and write it on my own. You realize you can check the results? You can use Claude to help you understand the changes as you read through them? I mean, I just don’t get this weird “it makes mistakes and it’s horrible if you understand the domain that it is generating over”. I mean, yes, definitely sometimes, and definitely not other times. What happens if I DON’T have someone more experienced to consult with, or they ignore me because they are busy, or they are wrong because they are also imperfect and not focused? It’s really hard to be convinced that this point of view is not just some knee-jerk reaction justified post hoc.
While LLMs do improve productivity sometimes, I flatly cannot believe a claim (at least without direct demonstration or evidence) that one person is doing the work of 20 with them, at least in December 2025.
I mean from the off, people were claiming 10x probably mostly because it's a nice round number, but those claims quickly fell out of the mainstream as people realised it's just not that big a multiplier in practice in the real world.
I don't think we're seeing this in the market, anywhere. Something like 1 engineer doing the job of 20, what you're talking about is basically whole departments at mid sized companies compressing to one person. Think about that, that has implications for all the additional management staff on top of the 20 engineers too.
It'd either be a complete restructure and rethink of the way software orgs work, or we'd be seeing just incredible, crazy deltas in output of software companies this year of the type that couldn't be ignored, they'd be impossible to not notice.
This is just plainly not happening. Look, if it happens, it happens, 26, 27, 28 or 38. It'll be a cool and interesting new world if it does. But it's just... not happened or happening in 25.
I think I've been using AI wrong. I can't understand testimonies like this. Most times I try to use AI for a task, it is a shitshow, and I have to rewrite everything anyway.
I don’t know about right/wrong. You need to use the tools that make you productive. I personally find that in my work there are dozens of little scripts or helper functions that accelerate my work. However I usually don’t write them because I don’t have the time. AI can generate these little scripts very consistently. That accelerates my work. Perhaps just start simple.
Same. Seems to be the never ending theme of AI.
My experience with these tools is nowhere close to this.
If you're really able to do the work of a 20 man org on your own, start a business.
Just have your engineers pick up some product work. Clients do NOT want to talk to bots.
They do all those things you've mentioned more efficiently than most of us, but they fall woefully short as soon as novelty is required. Creativity is not in their repertoire. So if you're banging out the same type of thing over and over again, yes, they will make that work light and then scarce. But if you need to create something niche, something one-off, something new, they'll slip off the bleeding edge into the comfortable valley of the familiar at every step.
I choose to look at it as an opportunity to spend more time on the interesting problems, and work at a higher level. We used to worry about pointers and memory allocation. Now we will worry less and less about how the code is written and more about the result it built.
Take food for example. We don't eat food made by computers even though they're capable of making it from start to finish.
Sure we eat carrots probably assisted by machines, but we are not eating dishes like protein bars all day every day.
Our food is still better enjoyed when made by a chef.
Software engineering will be the same. No one will want to use software made by a machine all day every day. There are differences in the execution and implementation.
No one will want to read books entirely dreamed up by AI. Subtle parts of the books make us feel something only a human could have put right there right then.
No one will want to see movies entirely made by AI.
The list goes on.
But you might say "software is different". Yes, but no: when the productivity increase produces a ton of choice for each type of software, choice will become more prominent and the human-driven software will win.
Even today we pick the best terminal emulation software because we notice the difference between exquisitely crafted and bloated cruft.
> So if you're banging out the same type of thing over and over again, yes, they will make that work light and then scarce.
The same thing over and over again should be a SaaS, some internal tool, or a plugin. Computers are good at doing the same thing over and over again and that's what we've been using them for
> But if you need to create something niche, something one-off, something new, they'll slip off the bleeding edge into the comfortable valley of the familiar at every step.
Even if the high level description of a task may be similar to another, there's always something different in the implementation. A sports car and a sedan have roughly the same components, but they're not engineered the same.
> We used to worry about pointers and memory allocation.
Some still do. You will not always have a system that handles allocations and has a garbage collector. And even in those, you will see memory leaks.
> Now we will worry less and less about how the code is written and more about the result it built.
Wasn't that Dreamweaver?
I think your image of LLMs is a bit outdated. Claude Code with well-configured agents will get entirely novel stuff done pretty well, and that’s only going to get better over time.
I wouldn’t want to bet my career on that anyway.
Perfect economic substitution in coding doesn't happen for a long time. Meanwhile, AI appears as an amplifier to the human and vice versa. That the work will change is scary, but the change also opens up possibilities, many of them now hard to imagine.
I feel you, it's scary. But the possibilities we're presented with are incredible. I'm revisiting all these projects that I put aside because they were "too big" or "too much for a machine". It's quite exciting
Stop freaking out. Seriously. You're afraid of something completely ridiculous.
It is certainly more eloquent than you regarding software architecture (which was a scam all along, but that's a conversation for another time). It will find SOME bugs better than you, that's a given.
Review code better than you? Seriously? What are you using, and what do you consider code review? Assume I could identify that one change broke production and you reviewed the latest commit. I am pinging you and you better answer. OK, Claude broke production, now what? Can you begin to understand the difference between you and the generative technology? When you hop on the call, you will explain to me in great detail what you know about the system you built, and explain the decision making and changes over time. You'll tell me what worked and what didn't. You will tell me about the risks, behavior and expectations. About where the code runs, its dependencies, users, usage patterns, load, CPU usage and memory footprint; you could probably tell what's happening without looking at logs, just at metrics. With Claude I get: you're absolutely right! You asked about what it WAS, but I told you about what it WASN'T! MY BAD.
Knowledge requires a soul to experience and this is why you're paid.
>> From where I’m standing, it’s scary.
You are being fooled by randomness [1]
Not because the models are random, but because you are mistaking a massive combinatorial search over seen patterns for genuine reasoning. Taleb's point was about confusing luck for skill. Don't confuse interpolation for understanding.
You can read a Rust book after years of Java, then go build software for an industry that did not exist when you started. Ask any LLM to write a driver for hardware that shipped last month, or model a regulatory framework that just passed... It will confidently hallucinate. You will figure it out. That is the difference between pattern matching and understanding.
[1] https://en.wikipedia.org/wiki/Fooled_by_Randomness
I've worked with a lot of interns, fresh outs from college, overseas lowest bidders, and mediocre engineers who gave up years ago. All over the course of a ~20 year career.
Not once in all that time has anyone PRed and merged my completely unrelated and unfinished branch into main. Except a few weeks ago. By someone who was using the LLM to make PRs.
He didn't understand when I asked him about it and was baffled as to how it happened.
Really annoying, but I got significantly less concerned about the future of human software engineering after that.
Have you used an LLM specifically trained for tool calling, in Claude Code, Cursor or Aider?
They’re capable of looking up documentation and correcting their errors by compiling and running tests, and when coupled with a linter, hallucinations are a non-issue.
I don’t really think it’s possible to dismiss a model that’s been trained with reinforcement learning for both reasoning and tool usage as only doing pattern matching. They’re not at all the same beasts as the old style of LLMs based purely on next token prediction of massive scrapes of web data (with some fine tuning on Q&A pairs and RLHF to pick the best answers).
I'm using Claude Code to help me learn Godot game programming.
One interesting thing is that Claude will not tell me if I'm following the wrong path. It will just make the requested change to the best of its ability.
For example, in a tower defence game I'm making, I wanted to keep turret position state in an AStarGrid2D. It produced code to do this, but the code became harder and harder to follow as I went on. It was only after watching more tutorials that I figured out I was asking for the wrong thing. (TileMapLayer is a much better choice.)
LLMs still suffer from Garbage in Garbage out.
don't use LLMs for Godot game programming.
edit: Major engine changes have occurred after the models were trained, so you will often be given code that refers to nonexistent constants and functions and which is not aware of useful new features.
before coding I just ask the model "what are the best practices in this industry to solve this problem? what tools/libraries/approaches do people use?"
after coding I ask it "review the code, do you see any parts for which there are common libraries implementing it? are there ways to make it more idiomatic?"
you can also ask it "this is an idea on how to solve it that somebody told me, what do you think about it, are there better ways?"
> before coding I just ask the model "what are the best practices in this industry to solve this problem? what tools/libraries/approaches people use?
Just for the fun of it, and so you lose your "virginity" so to speak, next time the magic machine gives you the answer about "what it thinks", tell it in strict language that it's wrong and scold it for misleading you. Tell it to give you the "real" best practices instead of what it spat out. Then sit back and marvel at the machine saying you were right and that it had misled you, and producing a completely, somewhat, or slightly different answer (you never know what you get on the slot machine).
Both the before and after are better done manually. What you are describing is fine for the heck of it (I’ve vibe coded a whisper-related Rust port today without having any actual Rust skills), but I’d never use fully vibed software in production. That’s irresponsible in multiple ways.
Do you also light candles and chant?
Ask a model to
"Write a chess engine where pawns move backward and kings can jump like knights"
It will keep slipping back into real chess rules. It learned chess, it did not understand the concept of "rules"
Or
Ask it to reverse a made up word like
"Reverse the string 'glorbix'"
It will get it wrong on the first try. You would not fail.
Or even better ask it to...
"Use the dxastgraphx library to build a DAG scheduler."
dxastgraphx is a nonexistent library...
Marvel at the results...tried in both Claude and ChatGPT....
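For reference, the string-reversal prompt above has a deterministic answer that is cheap to check outside the model. A minimal Python sketch (the helper name `reverse_string` is only illustrative, not something from the thread):

```python
def reverse_string(text: str) -> str:
    """Return text with its characters in reverse order."""
    # A slice with step -1 walks the string from the last character to the first.
    return text[::-1]


made_up_word = "glorbix"
reversed_word = reverse_string(made_up_word)
print(reversed_word)  # xibrolg
```

Whatever a given model answers, this is the value to compare it against.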
I’ve just tried the dxastgraphx one in pi with Opus 4.5. This was its response:
just tried to reverse the string you provided using Gemini. it worked fine on the first try
Yeah, it makes me wonder whether I should start learning to be a carpenter or something. Those who either support AI or think "it's all bullshit" cite a lack of evidence for humans truly being replaced in the engineering process, but that's just the thing; the unprecedented levels of uncertainty make it very difficult to invest one's self in the present, intellectually and emotionally. With the current state of things, I don't think it's silly to wonder "what's the point" if another 5 years of this trajectory is going to mean not getting hired as a software dev again unless you have a PhD and want to work for an AI company.
What doesn't help is that the current state of AI adoption is heavily top-down. What I mean is the buy-in is coming from the leadership class and the shareholder class, both of whom have the incentive to remove the necessary evil of human beings from their processes. Ironically, these classes are perhaps the least qualified to decide whether generative AI can replace swathes of their workforce without serious unforeseen consequences. To make matters worse, those consequences might be as distal as too many NEETs in the system such that no one can afford to buy their crap anymore; good luck getting anyone focused on making it to the next financial quarter to give a shit about that. And that's really all that matters at the end of the day; what leadership believes, whether or not they are in touch with reality.
Where the hell was all this fear when the push for open source everything got fully underway? When entire websites were being spawned and scaffolded with just a couple of lines of code? Do we not remember all those impressive tech demos of developers doing massive, complex things with "just one line of code"? How did we not just write software for every kind of software problem that could exist by now?
How has free code, developed by humans, become more available than ever, and yet somehow we have had to employ more and more developers? Why didn't we trend toward fewer developers?
It just doesn't make sense. AI is nothing but a snippet generator, a static analyzer, a linter, a compiler, an LSP, a Google search, a copy-paste from Stack Overflow: all technologies we've had for a long time, all things developers used to have to go without at some point in history.
I don't have the answers.
In aviation safety, there is the concept of the "Swiss cheese" model, where each successive layer of safety may not be 100% perfect, but has a different set of holes, so overlapping layers create a net gain in safety metrics.
One can treat current LLMs as a layer of "cheese" for any software development or deployment pipeline, so the goal of adding them should be an improvement for a measurable metric (code quality, uptime, development cost, successful transactions, etc).
Of course, one has to understand the chosen LLM behaviour for each specific scenario - are they like Swiss cheese (small numbers of large holes) or more like Havarti cheese (large number of small holes), and treat them accordingly.
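As a rough worked example of the layered model above: if the layers fail independently (a strong assumption, since an LLM reviewer and a human reviewer may share blind spots), the chance that a defect slips through every layer is the product of the per-layer miss rates. The rates below are made-up illustrative numbers, not measurements.

```python
from math import prod


def escape_probability(miss_rates: list[float]) -> float:
    """Probability that a defect passes every layer, assuming independent layers."""
    return prod(miss_rates)


# Hypothetical miss rates: human review, automated tests, and an added LLM review.
without_llm = escape_probability([0.30, 0.20])
with_llm = escape_probability([0.30, 0.20, 0.50])

print(f"without LLM layer: {without_llm:.3f}")
print(f"with LLM layer:    {with_llm:.3f}")
```

Under these assumptions, even a layer that misses half of what it sees halves the escape rate; the more the holes overlap between layers, the smaller that gain becomes.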
> One can treat current LLMs as a layer of "cheese" for any software development or deployment pipeline
It's another interesting attempt at normalising the bullshit output by LLMs, but NO. Even counting the enshittified Boeing, the aviation industry's safety and reliability records are far, far, far above those of deterministic software (known for plenty of unreliability itself), and deterministic B2C software is in turn to LLMs what Boeing and Airbus software and hardware reliability are to B2C software... So you cannot even begin to apply aviation industry paradigms to the shit machines, please.
Interesting concept, but as of now we don't apply these technologies as a new compounding layer. We are not using them after we have constructed the initial solution. We are not ingesting the code to compare against specs. We are not using them to curate and analyze current handwritten tests (prompt: is this test any good? assistant: it is hot garbage, you are inferring that expected result equals your mocked result). We are not really at this phase yet. Not in general, not intelligently. But when the "safe and effective" crowd leaves the technology we will find good use cases for it, I am certain (unlike UML, VB and Delphi).
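A minimal sketch of the kind of test that parenthetical describes, assuming a hypothetical `get_price` function and a hypothetical `client.fetch_price` dependency (neither comes from the thread). The first test only checks that the mock returns what it was told to return; the second checks logic the code under test actually performs.

```python
from unittest.mock import Mock


def get_price(client, sku: str) -> float:
    """Hypothetical code under test: fetch a net price and add 20% tax."""
    return round(client.fetch_price(sku) * 1.2, 2)


def test_tautological() -> None:
    # Weak: the assertion compares the mock's output with the value we gave the mock,
    # so it passes no matter what get_price does.
    client = Mock()
    client.fetch_price.return_value = 10.0
    assert client.fetch_price("abc") == 10.0


def test_meaningful() -> None:
    # Better: the expected value comes from the behaviour we want (tax applied),
    # and we also check that the dependency was called with the right argument.
    client = Mock()
    client.fetch_price.return_value = 10.0
    assert get_price(client, "abc") == 12.0
    client.fetch_price.assert_called_once_with("abc")


test_tautological()
test_meaningful()
```

Asking a model to flag the first pattern in an existing suite would be one way to use it as the additional layer the comment has in mind.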
LLMs are Kraft Singles. Stuff that only kind of looks like cheese. Once you know it's in there, someone has to inspect, and sign-off on, the entire wheel for any credible semblance of safety.
how sure are you that an LLM won't be better at reviewing code for safety than most humans, and eventually, most experts?
I nodded furiously at this bit:
> The hard part of computer programming isn't expressing what we want the machine to do in code. The hard part is turning human thinking -- with all its wooliness and ambiguity and contradictions -- into computational thinking that is logically precise and unambiguous, and that can then be expressed formally in the syntax of a programming language.
> That was the hard part when programmers were punching holes in cards. It was the hard part when they were typing COBOL code. It was the hard part when they were bringing Visual Basic GUIs to life (presumably to track the killer's IP address). And it's the hard part when they're prompting language models to predict plausible-looking Python.
> The hard part has always been – and likely will continue to be for many years to come – knowing exactly what to ask for.
I don't agree with this:
> To folks who say this technology isn’t going anywhere, I would remind them of just how expensive these models are to build and what massive losses they’re incurring. Yes, you could carry on using your local instance of some small model distilled from a hyper-scale model trained today. But as the years roll by, you may find not being able to move on from the programming language and library versions it was trained on a tad constraining.
Some of the best Chinese models (which are genuinely competitive with the frontier models from OpenAI / Anthropic / Gemini) claim to have been trained for single-digit millions of dollars. I'm not at all worried that the bubble will burst and new models will stop being trained and the existing ones will lose their utility - I think what we have now is a permanent baseline for what will be available in the future.
maybe not the MOST valuable part of prompting an LLM during a task, but one of them, is defining the exact problem in precise language. i don't just blindly turn to an LLM without understanding the problem first, but i do find Claude is better than a cardboard cutout of a dog
Hardest part of programming is knowing wtf all the existing code does and why.
The first part is surely true if you change it to "the hardEST part," (I'm a huge believer in "Programming as Theory Building"), but there are plenty of other hard or just downright tedious/expensive aspects of software development. I'm still not fully bought in on some of the AI stuff—I haven't had a chance to really apply an agentic flow to anything professional, I pretty much always get errors even when one-shotting, and who knows if even the productive stuff is big-picture economical—but I've already done some professional "mini projects" that just would not have gotten done without an AI. Simple example is I converted a C# UI to Java Swing in less than a day, few thousand lines of code, simple utility but important to my current project for <reasons>. Assuming tasks like these can be done economically over time, I don't see any reason why small and medium difficulty programming tasks can't be achieved efficiently with these tools.
Aren't they also losing money on the marginal inference job?
Indeed, while DeepSeek 3.2 or GLM 4.7 are not Opus 4.5 quality, they are close enough that I could _get by_ because they're not that far off, and are about where I was with Sonnet 3.5 or Sonnet 4 a few months ago.
I'm not convinced DeepSeek is making money hosting these, but it's not that far off from it I suspect. They could triple their prices and still be cheaper than Anthropic is now.
There is a guaranteed cap on how far LLM based AI models can go. Models improve by being trained on better data. LLMs being used to generate millions of lines of sloppy code will substantially dilute the pool of good training data. Developers moving over to AI based development will cease to grow and learn - producing less novel code.
The massive increase in slop code and loss of innovation in code will establish an unavoidable limit on LLMs.
I think most of the progress is training by reinforcement learning on automated assessments of the code produced. So data is not really an issue.
That is a naive assumption. Or rather multiple naive assumptions: Developers mostly don’t move over to AI development, but integrate it into their workflow. Many of them will stay intellectually curious and thus focus their attention elsewhere; I’m not convinced they will just suddenly all stagnate.
Also, training data isn’t just crawled text from the internet anymore, but also sourced from interactions of millions of developers with coding agents, manually provided sample sessions, deliberately generated code, and more—there is a massive amount of money and research involved here, so that’s another bet I wouldn’t be willing to make.
But they're not just training off code and its use, but off a corpus of general human knowledge in written form.
I mean, in general not only do they have all of the crappy PHP code in existence in their corpus, but they also have Principia Mathematica, and probably The Art of Computer Programming. And it has become increasingly clear to me that the models have bridged the gap from "autocomplete based on code I've seen" to some sort of distillation of first-order logic based on them just reading a lot of language... and some fuzzy attempt at reasoning that came out of it.
Plus the agentic tools driving them are increasingly ruthless at wringing out good results.
That said -- I think there is a natural cap on what they can get at as pure coding machines. They're pretty much there IMHO. The results are usually -- I get what I asked for, almost 100%, and it tends to "just do the right thing."
I think the next step is actually to make it scale and make it profitable but also...
fix the tools -- they're not what I want as an engineer. They try to take over, and they don't put me in control, and they create a very difficult review and maintenance problem. Not because they make bad code but because they make code that nobody feels responsible for.