The craziest thing about AI is you can just try it yourself and check if the claims are true.
I use Claude code and codex daily. They have become an integral part of my workflow.
There is no task that takes me a day that they can complete in five minutes.
Even with the lightning fast progress being made, it looks like LLMs are a decade or more away from being that good.
If AI can do your job for you, you should be the first to know. Just try it and see!
There are definitely tasks you can prompt an AI in 5 minutes that would take a whole day to do. One example is adding something to a CI pipeline and getting it to green (i.e. maybe you're adding your first ever e2e test), especially when your CI pipeline is painfully slow. e.g. if your pipeline takes 30 minutes to finish, and it takes around 10 tries to figure out all the random problems, that was easily a full day task before AI. Now I prompt AI to figure it out, which takes 5 minutes of active attention, and it figures it out for the rest of the day while I do other stuff.
There are definitely some tasks that AI has made 10x or 100x faster, but not the tasks that make up my day to day.
For me, there may be one thing I do every few months that AI is really good at.
The overwhelming majority of the work I do, LLM tooling is just ok at. Definitely faster overall, but with lots of human planning, hand holding and course correction.
I would estimate LLMs make me, on average, 50% more productive, which is huge! But from my experience I cannot believe anyone is experiencing an 8-hours-to-5-minutes productivity multiplier overall.
I mean I wasn’t sitting around unproductively waiting for 30 minute CI runs to finish before LLMs came along, either.
I also like to use LLMs for background work on iterative tasks, but the way some people talk about work in the days before LLMs makes me realize how we’re arriving at these claims that LLMs make us 10X more productive. If it took someone all day to do a few minutes of active work, then I could see how LLMs would feel like a 10X or 50X productivity unlocker simply by not shutting down and doing nothing at the first sign of a pause.
Count yourself as one of the lucky few who can pay a zero-minute context-switching price to switch between whatever other productive work you were doing and debugging CI. Most people I speak to remark that continually switching between unrelated tasks significantly diminishes their productivity.
The example above was talking about 30 minute wait times between being able to do work.
Nobody is staring at the screen for 30 minutes in deep concentration while they wait for that turn to complete. They are context switching to something, but maybe it’s Hacker News or Reddit.
There is always a context switch in scenarios like this.
Fundamentally it cannot be much better than how well we can write the spec and then validate the results.
It’s always gonna be a multi-shot process. And it can already write code well enough. That’s no longer the bottleneck.
Further, Qwen 27b is such an incredible masterpiece for coding and it can run on consumer hardware today. Anthropic/OpenAI are gonna give up on coding models very soon. There’s not gonna be any money in it when you can run your own local model for significantly cheaper.
Qwen 27b is not SOTA, but the value is insane. You can basically use it for small tasks and then route harder problems to Opus or Sonnet, and boom, you’ve saved a lot of money.
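As a rough sketch of what that routing could look like (the endpoint URLs, model names, and the "hardness" heuristic below are placeholder assumptions, not a real setup):

```python
import requests

# Assumptions (not a real setup): a local OpenAI-compatible server for cheap
# tasks (e.g. Ollama's default port) and some hosted OpenAI-compatible gateway
# for hard ones. URLs, model names, and the heuristic are all placeholders.
LOCAL_URL = "http://localhost:11434/v1/chat/completions"
HOSTED_URL = "https://api.example.com/v1/chat/completions"

def looks_hard(prompt: str) -> bool:
    """Crude routing heuristic: long or multi-file asks go to the big model."""
    return len(prompt) > 2000 or "refactor" in prompt.lower()

def complete(prompt: str) -> str:
    hard = looks_hard(prompt)
    url = HOSTED_URL if hard else LOCAL_URL
    model = "big-hosted-model" if hard else "local-coding-model"  # placeholder names
    resp = requests.post(
        url,
        json={"model": model, "messages": [{"role": "user", "content": prompt}]},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(complete("Write a unit test for an ISO 8601 date parser."))
```

The point is just that the routing decision can be a dumb heuristic and still capture most of the savings, since the expensive model only sees the prompts the cheap one would likely botch.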
Not my experience. AI takes a lot less time doing tasks than I do. My current issue is that 2 out of 3 times it doesn't produce the code that I want, so I either have to reprompt or do it myself. And the solution is simple: just accept its way; I'm just not there yet.
In any case, on that one time that AI works perfectly, it saves me hours of coding. So the potential is there...
Super trivial to hand verify 350kloc changes for sure.
Quis custodiet ipsos custodes? (Who will watch the watchmen?)
Yep. It depends so much on task, expectations, ability to express what you want and whether the problem has been solved elsewhere or not.
The results are always so ridiculously different.
> The results are always so ridiculously different.
Well... yes! It's not the same as running a program through a compiler 100k times and getting the same binary, it's... different: https://www.lelanthran.com/chap15/content.html
The delta isn't a day to 5 minutes, but a day to a half hour (which is where most of my larger tickets land)? Yes, especially as you don't need to watch it do its thing anymore.
To me, the reason for the lack of amazing productivity gains is that we have done nothing to speed up figuring out what to build, and nothing to speed up getting code from pull request into production, and in a lot of companies code review is already saturated.
Also, the agents are good at figuring out problems for themselves, so I can ask it to set up a CI/CD pipeline, give it GitHub access, and it will just try things until it succeeds.
> There is no task that takes me a day that they can complete in five minutes.
Five minutes is pushing it, but 15 minutes? Absolutely.
More "bad news" and from the man who helped create and then promote Agile to dilute the value of software developers by forcing software development out of the control freak's nightmare where it started: seemingly esoteric, non-understandable by management, and make sure the next generation of developers knows their place. That's Agile's insidious purpose as far I am concerned.
As for AI-written code, I wouldn't fly on a plane controlled by AI-designed and AI-tested code, but much of development is busy work, not problem solving or design. AI excels at turning a protocol spec into a parser, for example. I'll take that any day. AI also excels at finding stuff, particularly non-code, thesis-level ideas for algorithms, and, at about the same level, at surfacing what's been shown not to work when solving a non-deterministic problem.
If we're lucky, AI will fill in after exposing who is only doing busy work and who is creating.
I had to laugh at his remark that "OTOH AI will give you the power to get all that coverage and cyclomatic complexity stats done in minutes, which you know doesn't really mean that the code is going to work".
Also, his prediction assumes that AI will be able to learn from its own code going forward. Will it also create its own new programming languages and tools?
But it's a funny rant.
That's a conspiracy theory if I ever heard one.
There are probably some respectable workflows that involve an LLM writing most of the code, but AI is still terrible at understanding some critical parts of the problem. You still have to tell it what to write and how it should work, or there are high odds that you'll get a hot mess. And there still needs to be a human who understands everything there and how to debug it. For me, the most enjoyable path there is to write it myself, because I would rather be involved in writing the code than only involved in reading it. It might not be the fastest path, but it gets the job done for the foreseeable future. I could end up like the Amish, who choose not to use technology developed after a certain point; from what I can tell, they do alright.
> but AI is still terrible at understanding some critical parts of the problem
I agree to some extent with regard to writing new code. One area where I have been consistently impressed is asking it to put together a plausible explanation of how something weird happened. I have been blown away, multiple times, by Codex's and Claude’s ability to take a prompt like “When I did X, I expected Y to happen but instead observed Z. Put together an explanation for how that could happen, including the individual lines of code that can lead to ending up in that state.”
In one notable case, it traced through a pretty complex sensor fusion -> computational geometry problem and identified a particular calculation far upstream that could go negative in certain circumstances, which would lead to a function far downstream generating a polygon with incorrect winding order (clockwise instead of CCW).
In another, it identified a variable that was being initialized to 0 instead of initialized to (a specific runtime value that it should’ve been initialized to during a state transition). The downstream effect, minutes later, would be pathological behaviour that would happen exactly once per boot.
In both cases I was provided with a specific causal chain of events with individual source files and line numbers so that I could verify the plausibility of the explanation myself.
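To make the winding-order case concrete, here is a toy sketch of that failure mode (the polygon and the upstream value are invented for illustration; this is not the actual code it analyzed):

```python
# Toy illustration of the failure mode: one upstream value going negative
# flips the winding order of a downstream polygon. Not the actual code.

def signed_area(poly):
    """Shoelace formula: positive => counter-clockwise, negative => clockwise."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        area += x1 * y2 - x2 * y1
    return area / 2.0

def build_triangle(x_extent):
    """Downstream step that assumes x_extent is positive; a negative value
    mirrors the shape across the y-axis and reverses its orientation."""
    return [(0.0, 0.0), (x_extent, 0.0), (x_extent, 1.0)]

print(signed_area(build_triangle(1.0)))   # 0.5  -> CCW, what downstream expects
print(signed_area(build_triangle(-1.0)))  # -0.5 -> CW, the reported symptom
```

The point is just that a single upstream sign flip is enough to silently reverse the orientation every downstream consumer assumes, which is exactly the kind of long causal chain it traced for me.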
That is how I use it too, for explanations and suggestions when I run into something unexpected. It is incredible in the back seat.
I don't mean to completely dismiss their utility. I realized recently that I was having more fun coding than I ever remember. It is a strange feeling to go along with the vibe out there that software developers are becoming obsolete.
I don't have a lot of patience for Bob. That being said, I have to agree with him on test coverage (that's as far as I made it through his monologue). IMHO, that is something that I 100% am okay letting the LLM tooling write and manage. I used to argue about whether or not we needed a test that verified that the value of a constant didn't change, and whether 100% coverage was really that important. Now I don't care; I just let Claude write the test and keep it up-to-date.
Kind of a great video! I enjoyed it. His point about testing coverage and generating mutations to ensure the tests fail resonated. I get concerned sometimes that the AI is writing tests not to ensure the logic is correct, but to ensure the tests pass against the code it already wrote. Any other ideas on this? Is there a code review step or CI checkpoint that would decrease the likelihood of that?
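One checkpoint I could imagine, along the lines of the mutation idea in the video, is a CI step that flips a bit of logic and verifies the generated tests actually start failing. A hand-rolled sketch (the function and tests below are made up for illustration):

```python
# Made-up example: verify that the test suite actually fails when the logic
# under test is mutated, instead of just re-asserting whatever the code does.

def apply_discount(price, pct):
    return price * (1 - pct / 100)

def mutated_apply_discount(price, pct):
    # Hand-rolled "mutant": the same function with one operator flipped.
    return price * (1 + pct / 100)

def tests_pass(fn):
    """Returns True if all tests pass for the given implementation."""
    return abs(fn(100, 10) - 90) < 1e-9 and abs(fn(80, 0) - 80) < 1e-9

assert tests_pass(apply_discount), "tests should pass on the real code"
assert not tests_pass(mutated_apply_discount), (
    "tests still pass on mutated logic; they probably just mirror the code"
)
print("mutation check passed: the tests actually constrain behaviour")
```

Tools like mutmut (Python) or Stryker (JavaScript) automate this kind of check, so it could plausibly run as a CI gate on AI-generated test suites.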
To be fair the overwhelming majority of tests I've seen in the wild written by humans have been the same. Not a lot of good material for AI to learn from.
It’s hard to give up, but likely necessary. That doesn’t mean quality has to suffer, we can still gate with deterministic quality tooling where it matters. But yeah, at some scale it stops mattering how human readable the code is, as long as AI can effectively and efficiently (token-wise) make edits or add features.
The point is not human readability, but good structure. Spaghetti code is as bad for an LLM as for a human, because structural complexity and the amount of coupling are fundamental limits, not human-specific.
Amazing tweet.
https://x.com/stevesi/status/2050325415793951124
Here's how history rhymes with this logic. The development of compilers vs. writing assembly language was not without a very similar "controversy": are the new tools more efficient or less efficient?
The first compilers were measured relative to hand-tuned assembly language efficiency. The existing world of compute was very much "compute bound" and inefficient code was being chased out of every system.
The introduction of the first compilers generally delivered code "within 10-30%" as efficient as standard professional assembly. This "benchmark" was enough for almost a generation of Fortran programmers to dismiss the capabilities of compilers.
Also worth noting, early compilers (all through the 1980s) routinely had bugs that generated incorrect code. Debugging a compiler is a nightmare (personal experience). This only provided more "ammo."
With the arrival of COBOL the debate started to shift. COBOL generated decidedly "bloated" code, so there was no way to win the efficiency argument. But what people started to realize was that a "modern" programming language made it possible to deliver vastly more software and for many more people to work on the same code (assembly being notorious for being challenging for multiple engineers to work on the same portion of code). So the metric slowly started to move from "as good as hand-tuned assembler" to "able to write bigger, more sophisticated code in less time with more people". Computers gained timesharing, more memory, and faster CPUs, which made the efficiency argument far less compelling (only to repeat with the first 8K or 64K PCs).
This entire transition is capped off with a description in Fred Brooks' "The Mythical Man-Month", one of the seminal books in the field of programming and a standard-issue book that sat in my office waiting for me on my first day at Microsoft. (See the full book free here: https://web.eecs.umich.edu/~weimerw/2018-481/readings/mythic...)
It is very early. I was not a programmer when the above happened, though I did join the professional ranks while many still held these beliefs. For example, I interned writing COBOL on mainframes while PCs were using C and Pascal, which were buggy and viewed as inefficient on processor- and space-constrained PCs.
The debate would continue with C++, garbage collection, interpreted vs. compiled (Visual Basic), and more. As a fairly consistent observation over decades, every new tool is viewed (at first) by experienced programmers through a lens of what has gotten worse, while new programmers use the tool and operate in a new context (e.g. "more software" or "bigger projects"). The excerpt below shows this debate as captured in 1972.
> Also worth noting, early compilers (all through the 1980s) routinely had bugs that generated incorrect code.
Incorrect. They had bugs that generated incorrect code. They didn't routinely have bugs that generated incorrect code :-/
And the bugs they had were reproducible.
That’s where the tooling comes in!
Gives me a whole new perspective to the phrase clean code.
"Forty years later, in September of 2018, I started working on this version of Space War. It's an animated GUI driven system with a frame rate of 30fps. It is written entirely in Clojure and uses the Quil shim for the Processing GUI framework." - Robert Martin
https://blog.cleancoder.com/uncle-bob/2021/11/28/Spacewar.ht...
That's gotta be a joke, right? It's like running agents to write agent orchestrators to write orchestrators for orchestrators, just for clean code.
That was a bizarre performance.
I spent a 30-year illustrious SWE career avoiding reading or listening to anything this dude says; probably among the smartest things I’ve ever done.
He helped enshittify the industry - empowering midlings to cry about "clean code" instead of actually learning to produce a great product. No thanks, Bob.
English is the new programming language.
I'm not sure I agree, but I can see it used as an open-ended interview question: "Is English the new programming language?" It would be a good test of whether someone gravitates towards pedantry (AIs can speak other common languages just as well!) or actually gets into the difference between prompting and programming, or whether it's at its core an LLM or just an AI based on a transformer architecture. Extra points for having it be part of an async interview, with interviewees using an LLM to write the answer and interviewers using LLMs to grade them.
That's just, like, his opinion, man
He's an idol, didn't you know? Much like his software architecture takes, they'll be taken as gospel.
His opinions were never really good to begin with, he was just excellent at marketing them as good opinions.
It comes as no surprise to me that the guy who has bad opinions about software architecture, has worse opinions about vibe coding.
Personally I have never been a fan of clean code architecture…to each their own I guess
I'm an AI skeptic, but I do think that _he_ will be out-coded by AI, no problem.
For all LLM flaws, if it kills the whole Agile/SCRUM/whatever grift, it will have been worth it. The damage these guys have done to software industry at large is unfathomable.
Hot take, but the bureaucracy of Scrum, the Figmafication of design and the disdain of PMs for iterative deliveries generates more work and waste than AIs are able to save.
I tend to agree with his point.
But I found myself laughing at the style; just ranting about software like a cartoon villain in his bathrobe. No fucks given.
"It is unavoidable. It is your destiny. You, like your father, are now mine."
Uncle Bob full of shit? Colour me purple!
I fully believe AI can write better code faster than Robert C. Martin.
Clean Architecture and Uncle Bob can take a hike.
This. Uncle Bob was already over, and now he seems to be hitting the skids REAL bad. Just listening to him is tough: this guy's bad news, I didn't realize he was this bad off.
I thought this was about Uncle Bob being “canceled.”
Which is long overdue.
What did he do?
He became Lord Voldemort. No one knows exactly what he did, but you don't dare even whisper his name.
Wealthy white dude edging towards senility taking a liking to bathrobe social media shorts. Take a guess. It's going to involve a political party and a lot of weird public takes unrelated to software.
White men are not allowed to grow old? How come?
Doesn't really seem fair; I'm gonna be an old white man some day, and there ain't really that much I can do about it... (Well, I suppose sex changes are a thing now, but really?)
Of course they are. I'm only stating a trend so people can infer.
Are you going to be wealthy, with your head buried in the firehose of an algorithmic feed? Those are things you can do something about.
Alternatively, you could take a crack at deconstructing whiteness. Depending how young you currently are, you might be able to make a dent by the time you're an old man. That's trickier though, because it involves serious social reform. Or if sociology isn't your deal, maybe you could become a biologist, and cure old age?
shalom
Even just purely on a professional level, his clean code architecture was very bad advice, which was marketed and hyped up into something it never deserved. The software industry should have cancelled Uncle Bob, like archaeologists cancelled Graham Hancock, purely for his professional opinions (though I am not against cancelling him for his political opinions either; we can do both).