Depends hugely on the task - anywhere from -50% to +1000%. This is Copilot in VS Code for Vue.js + FastAPI in Docker. The trick is getting the right degree of hands-offness for the task.
I'd say on average, probably around the +100% mark, mostly because a lot of the work I'm doing currently is simpler tasks - "please add another button to do X". The backend logic is simple, and easy to security check.
Where I'm really not sure on productivity is when I get it to help generate tests that involve functionality across multiple domains of the application - possibly +0%.
It changed my ability to create something in the evenings, when I would normally be too tired to work on a project. It also changed my work into being a frontend designer/tester instead of programming; I was able to iterate on ideas much faster and create a much more usable product. But getting the code to work flawlessly is a pain, and it makes me clean up and re-test tons of times.
I've been building specific tools in languages and frameworks I don't understand, nor have ever found time to learn. So the productivity boost is hard to measure; it's a "what's 0 times how fast the LLM can do it" type of answer.
In the cases where I'm already an expert and I'm working in a legacy system and/or on a hard problem, it's a small change. Where I have no knowledge, it's an incalculable change because it's enabling me to (quickly) do things I could not before.
Massive ~100x speed-up for writing simple, mind-numbing stuff. Algorithm optimization: +20% (depends on domain). Obscure APIs: 10x (it deals with APIs much better than the docs do). Debugging is massively, incomparably easier.
I don't know where the HN hate for AI is coming from, but it's ideal for prototyping/debugging/refining code. Minor polishing and editing of specifics is much easier for me than writing a correct "draft v1.0" from scratch: I typically evolve structures from something primitive (v0.1), while AI gives me a chance to start from 1.0 and polish it later (optimize/fix/improve/etc.). For a "from scratch to working prototype" workflow, that means months of coding condense into days (mostly debugging and flaw-finding).
Drawbacks: I feel like more code is simplified, with no ad-hoc complex structure left after an AI rewrite. Pure "clean code" with the most standard optimizations and control flow. AI will turn "good complex code" into simplified, "enterprise ready" control-flow charts in code form; it lacks a sense of code quality and instead treats verbose, clean code as the quality marker.
I am surprised by all the answers saying 0 percent. I'd understand that if you use it without any grounding or additional context. I am far faster with it, using one MCP server with docs and one for language-specific features. If the code base is large enough, serena is a must-have. I'd say about the same: grokking an unknown code base is a lot faster; producing for something I am familiar with, maybe 4-5x.
I think about 4x for greenfield development. For example, I implemented this motion controlled game from scratch in 5 days: https://motionparty.net
For this kind of no-expectations, just-for-fun development I find AI makes it much easier to develop and test hypotheses. For other styles it's different, especially if the stakes are higher.
Any positive change to my output is likely only because I now need to use it to supplement Google searching, because Google search is so damn awful nowadays.
But to describe my latest (latest, not only) experience with an LLM: I was with my toddler last night and I wanted to create a quick and dirty timer displayed as a pizza (slices disappear as timer depletes) to see if that can help motivate him during dinner. HTML, JS, SVG.. thought this would be cake for OpenAI's best free model. I'm a skeptic for sure, but I follow along enough to know there's voodoo to the prompt, so I tried a few different approaches, made sure to keep it minimal beyond the basic reqs. It couldn't do it: first attempt was just the countdown number inside an orange circle; after instruction, the second attempt added segments to the orange circle (no texture/additional SVG elements like pepperoni); after more instruction, it added pepperoni, but now there was a thick border around the entire circle even where slices had vanished. It couldn't figure this one out, with its last attempt just being a pizza that gradually loses toppings. I restarted the session and prompted with some clarifications based on the previous session but it was just a different kind of shit.
Despite being a skeptic I'm somewhat intrigued by the idea of agents chipping away at problems and improving code, but I just can't imagine anyone using this for anything serious given how hard it fails at trivial stuff like this. Given that the MS guy is talking a big game about planning to rewrite significant parts of Windows in Rust using AI, and is not talking about having rewritten significant parts of Windows in Rust using AI, I remain skeptical of anyone saying AI is doing heavy lifting for them.
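For illustration, here is a minimal Python sketch of the slice geometry the pizza timer above needs, emitting the SVG directly (the commenter's actual attempt was HTML/JS; the colours and slice counts here are made up). The detail the model kept fumbling is that each slice should be its own closed wedge path, so nothing is left behind when a slice disappears:

    import math

    def pizza_svg(total_slices: int, remaining: int, r: float = 100.0) -> str:
        # Assumes total_slices >= 2 so each wedge spans at most 180 degrees.
        cx = cy = r  # center of a (2r x 2r) viewBox
        wedges = []
        for i in range(remaining):
            a0 = 2 * math.pi * i / total_slices - math.pi / 2
            a1 = 2 * math.pi * (i + 1) / total_slices - math.pi / 2
            x0, y0 = cx + r * math.cos(a0), cy + r * math.sin(a0)
            x1, y1 = cx + r * math.cos(a1), cy + r * math.sin(a1)
            # Each wedge is a closed path (center -> arc start -> arc -> close),
            # so a removed slice leaves no stray rim behind.
            wedges.append(
                f'<path d="M{cx:.1f},{cy:.1f} L{x0:.1f},{y0:.1f} '
                f'A{r:.1f},{r:.1f} 0 0 1 {x1:.1f},{y1:.1f} Z" '
                'fill="#e8a33d" stroke="#b5651d" stroke-width="2"/>'
            )
        return (f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {2*r:.0f} {2*r:.0f}">'
                + "".join(wedges) + "</svg>")

    print(pizza_svg(8, 5))  # an 8-slice pizza with 5 slices left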
Alternates between +5% and -99.9%: +5% when I can shortcut an entire morning's worth of work into under an hour (happens about once a week), and -99.9% when my job is threatened by managers with AI "vibing" claims (happens about once a week too).
It depends massively on the task. Not less than 10% if it's just helping review my PRs before submission. It can be quite a large improvement where, for example, I wanted to rebuild a Postgres layer for a new company but didn't have the one I'd written previously. I knew exactly what I wanted and probably did two days of tedious work in an hour. It does slow me down some too when it makes mistakes, but that's rare.
I was very, very keen on using this tech when it first emerged and was almost addicted to it.
The feeling I was gaining was akin to scrolling one of them feed-generating apps with cat videos, or eating fast food - quick and empty joy and excitement without any lasting, fulfilling effect.
After months, or maybe a year, of using LLMs, I realized that neither am I faster in delivering the final desired quality of product, nor am I satisfied with where I am professionally. I lose my human skills and I forget how to get joy out of my work (and I enjoy making them computers work). I noticed how I grew negligent of the results of my work, felt disconnected from and disconcerted by it, and that was an alarming sign.
Anybody who has worked anywhere for long enough knows that unhappy people, robbed of joy, produce worse results.
That said, I realized loud and clear that writing code and developing systems is something I want to experience the joy of personally. I want my brain to struggle, sweat and click through most parts of it, and the parts of the work that I don't enjoy doing and can shamelessly offload to an LLM are, in fact, quite minimal.
On top of that, while using the LLMs (and boy was I using all of it! Sub-agents, skills, tasks! Hour-long planning-prompting exercises!), I was still noticing that when it came to "write code" tasks, LLMs were only better and faster than me at delivering quality work when the task at hand was not my primary skill, something where I'd be below average (in my case, web development or app development, any front-end work). And, given that I was employed to exercise my main skill rather than my secondary skills, it was almost never the case that LLMs were boosting my productivity; they required a long time of baby-sitting before I'd almost certainly give up and do all of the work myself.
Admittedly, LLMs are outstanding study partners that can give pointers on where to start and what structure to stick to when learning about a new project, technology, or problem domain, or generate flash cards based on materials one wants to study, and that's a use of LLMs that I'd probably not give up. My speed of learning new things with some LLM assist is boosted greatly, so from this perspective, one can say that LLMs make me a better developer after all.
As others have already said, this is highly context-dependent, but I want to give some examples.
- It sometimes generates decent examples of Linux kernel API usage for modules, which saves a lot of time digging through the limited documentation. But most of the time it will mix deprecated and new versions of the API, and it will very likely make very suboptimal and/or buggy choices.
- For embedded C, it won't be able to work within the constraints of a very specific project (e.g. one using an obscure or custom HAL), but for generic C or RTOS applications it can generate decent to half-decent suggestions.
I’m not sure you’re going to get any useful results from this question.
Some people find it useful, some people don’t, and unless what you’re using it for matches what they’re using it for (which you’re not asking) none of the results you get give you any insight into what you should expect for your use case.
Oh well, whatever. Here's my $0.02: on a large code base that takes up to 30 minutes to do a local type check in TypeScript, the net benefit of AI is neutral or negative, because the agent can't loop effectively and check its own results.
AI scaffolded results are largely irrelevant and don’t use internal design system components or tokens for UI and are generally useless.
Objectively measured ticket completion rates are not meaningfully impacted by the use of AI.
Out-of-date documentation leads agents to build incorrect solutions using outdated and deprecated techniques and services.
This is true across multiple tools and multiple models used, including sota.
1x
It is not more productive.
This reflects my personal experience over the last 8 months of intense (and company-mandated) AI usage at work.
At home, for small personal projects, I would say it’s closer to the 2x you describe, maybe as much as 3x for building and iterating on rich web UI using react.
So, in my professional work environment, which is a complex C++ app with long build times and lots of proprietary domain-specific knowledge, LLMs were worse than useless. Not totally surprising, but I originally had hopes for something like Cursor being useful in terms of simplifying the process of large mechanical refactors, which it decidedly wasn't. It could suggest interesting and complex uses of C++ templates, but its applications of them were very slop-adjacent, and even for less complex refactors it would attempt and then fall apart in the "check" part of the "guess and check" loop, probably because of the whole "long build times" thing. So we're still paying for Visual Assist and just wishing it had more (and more specific!) kinds of mechanical refactors available.
For personal projects (more polyglot, but Rust, JS, Python, and random shell scripts are bigger and more important here) it's been more mixed to positive, and this is (I think?) in part because I have the luxury of writing off things I'm _not actually_ interested in doing. Maintaining CMake files sucks, and the free tier of Cursor does a good enough job of it. I have a few small plugins/extensions for things like Blender, and again, I don't know enough to do a good job there, and the benefit of making something extremely specific to what I need without actually knowing what's going on under the hood works fine: I can just verify the results, and that's good enough. But then, conversely, it's made it _wayyyy_ harder to pick and verify third-party libraries for the things I do care about. I'll look something up and it'll either be 100% AI vibe-coded and not good enough to sneeze at, or it'll be fine, but the documentation is 100% AI-generated, and likewise, I would rather just have the version of that library from before AI ever existed.
More and more I'm convinced LLM agents are only fit for purpose for things that don't need to be good or consistent, but there actually is a viable niche of things that don't need to be good that they can nicely slot into. That's still not worth $20/month to me, though. And it's absolutely ruining the online commons in a way that makes it hard to feel good about.
(my understanding of claude code is that it's a non-interactive agent, which is worse for what i have in mind. iteration and _changing my mind_ are a big part of my process, so even if I let the computer do its own thing for an hour and work on something else, that's less productive than spending even 10 minutes of focused time on the same thing.)
>(my understanding of claude code is that it's a non-interactive agent, which is worse for what i have in mind. iteration and _changing my mind_ are a big part of my process, so even if I let the computer do its own thing for an hour and work on something else, that's less productive than spending even 10 minutes of focused time on the same thing.)
Just use 'plan mode'; it will ask clarifying questions.
At home, on my side projects it's made me 2x to 4x.
At work, also 2x to 4x.
The numbers would be even higher (4x to 8x) if I didn't spend half the time correcting slop and carefully steering the AI toward the desired solution when it gets sidetracked. But then again, I was also guilty of those things so maybe it's an even score?
Perhaps it's partly psychological in that using it forces me to think through the problems I'm trying to solve differently than before. Perhaps I'm just a mediocre dev and the AI is bringing me up to "slightly above average," but a win is a win and I'll take it.
I'm learning Python on the job, with decades of experience in other languages. Asking an LLM noob Python questions is probably 5-10x faster than googling around and reading docs or tutorials, and almost 100% accurate. I also ask it about git commands I seldom use (I tend to confuse reset and restore, though not revert for some reason).
Introductory questions about a widely used language with great documentation and tons of tutorials are made for LLMs.
Rephrasing the question: By what percentage has AI changed your input quality?
The answer would be around -50%. This is attributed mostly to the vast amount of search results that are AI-generated and provide very low-density information while failing to convey the actual key learning points. This means you have to scan through 100% more text to finally get the information you need to solve the issue. I think this is a low estimate, actually.
0% for me too, as on the few occasions I tried it, it gave completely useless responses that were so far off the mark that, if I didn't know better, they would've led me down a completely wrong path.
Both faster and slower. The problem is you cannot trust LLMs with even the easiest of tasks. Generating boilerplate, better and more flexibly than a code generator would? Yes. Writing UIs that you then fix up? Yes.
But business logic that needs to be trusted? Nooooo.
So very, very rarely 1000% faster. Most of the time 10% slower as I have to find and fix issues.
I'll still say it's a net improvement, but not by much. So let's say 10%.
1.3x when working on a large, janky codebase which I am very familiar with, very unevenly distributed.
- Writing new code it's probably 3x or so[1].
- Writing automated tests for reproducible bugs, it's probably 2x or so.
- Fixing those bugs I try every so often but it still seems to be a net negative even for Opus 4.5, so call it 0.95x because I mostly just do it myself.
- Figuring out how to reproduce an undesired behavior that was observed in the wild in a controlled environment is still net negative - call it 0.8x because I keep being tempted by this siren song[2]
- Code review it's hard to say, I definitely am able to give _better_ reviews now than I was able to before, but I don't think I spend significantly less time on them. Call it 1.2x.
- Taking some high-level feature request and figuring which parts of the feature request already exist and are likely to work, which parts should be built, which parts we tried to build 5+ years ago and abandoned due to either issues with the implementation or issues with the idea that only became apparent after we observed actual users using it, and which parts are in tension with other parts of the system: net negative. 0.95x, just from trying again every so often.
- Writing new one-off utility tools for myself and my team: 10x-100x. LLMs are amazing. I can say "I want to see a Gantt chart style breakdown of when jobs in a gitlab pipeline start and finish each step of execution, here's the network log, here's a link to the gitlab api docs, write me a bookmarklet I can click on when I'm viewing a pipeline" and go get coffee and come back and have a bookmarklet[3]. (A rough sketch of that idea follows after the footnotes.)
Unfortunately for me, a significant fraction of my tasks are of the form "hey so this weird bug showed up in feature X, and the last employee to work on feature X left 6 years ago, can you figure out what's going on and fix it" or "we want to change Y functionality, what's the level of risk and effort".
-----
[1] This number would be higher, but pre-LLMs I invested quite a bit of effort into tooling to make repetitive boilerplate tasks faster, so that e.g. creating the skeleton of a unit or functional test for a module was 5 keystrokes. There's a large speedup in the tasks that are almost boilerplate, but not quite worth it for me to write my own tooling, counterbalanced by a significant slowdown if some but not all tasks had existing tooling that I have muscle memory for but the LLM agent doesn't.
[2] This feels like the sort of thing that the models should be good at. After all, if I fed in the observed behavior, the relevant logs, and the relevant files, even Sonnet 3.7 was capable of identifying the problem most of the time. The issue is that by the time I've figured out what happened at that level of detail, I usually already know what the issue was.
[3] Ok, it actually took a coffee break plus 3 rounds of debugging over about 30 minutes. Still, it's a very useful little tool and one I probably wouldn't have spent the time building in the before times.
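As an illustration of the footnoted bookmarklet idea, here is a rough Python sketch of the same "Gantt chart of pipeline jobs" tool, built on GitLab's pipeline-jobs endpoint. The host, project ID, pipeline ID and token below are placeholders (assumptions), and the commenter's original was a JS bookmarklet rather than a script:

    # Text-mode Gantt view of one GitLab pipeline's jobs, via the REST API.
    from datetime import datetime
    import requests

    GITLAB = "https://gitlab.example.com"     # placeholder host
    PROJECT_ID, PIPELINE_ID = 1234, 567890    # placeholder IDs
    TOKEN = "glpat-..."                       # placeholder read_api token

    def parse(ts):
        # GitLab returns ISO 8601 timestamps ending in "Z"; they can be null for queued jobs.
        return datetime.fromisoformat(ts.replace("Z", "+00:00")) if ts else None

    resp = requests.get(
        f"{GITLAB}/api/v4/projects/{PROJECT_ID}/pipelines/{PIPELINE_ID}/jobs",
        headers={"PRIVATE-TOKEN": TOKEN},
        params={"per_page": 100},
        timeout=30,
    )
    resp.raise_for_status()
    jobs = [j for j in resp.json() if j.get("started_at")]

    t0 = min(parse(j["started_at"]) for j in jobs)
    scale = 10.0  # seconds per character in the text bars; tweak to taste

    for job in sorted(jobs, key=lambda j: j["started_at"]):
        start = parse(job["started_at"])
        end = parse(job.get("finished_at")) or datetime.now(tz=t0.tzinfo)
        offset = int((start - t0).total_seconds() / scale)
        width = max(1, int((end - start).total_seconds() / scale))
        print(f"{job['name'][:30]:30} |" + " " * offset + "#" * width)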
With AI, I am a 10X developer now. At minimum. I use it for my C# and JavaScript day job. Copilot in VS.
I work on a code base that is easily over 1 million lines. It has had dozens of developers work on it over the last 15 to 20 years. Trying to follow the convention for the portion of the code base that I work on alone is a pain. I’ve been working on it for about seven years and I still had to ask questions.
So I would say that I work on a code base with a high level of drudgery. Having an all-knowing AI companion has taken an awful lot of the stress out of every aspect.
Even the very best developer I’ve ever worked with can’t match me when I’m using AI to augment my work. For most development tasks, being the best of the best no longer matters. But in a strange way, you still need the exact same analytical mindset because now it’s all about prompts. And it definitely does not negate the need for a developer.
Writing your own code is essentially just an exercise in nostalgia at this point. Or like someone who prefers to pick seeds out of cotton themselves instead of using a cotton gin.
Or perhaps instead of using voice dictation to write this post, I would write a letter and mail it to Hacker News so that they can publish my comment to the site. That’s how backwards writing code is quickly becoming.
5-10% higher in the short term. Who knows in the long term?
I've implemented a few trivial things that I wouldn't have done before if I'd had to type them up by myself.
But the big things that actually take time? Collecting requirements, troubleshooting, planning, design, architecture, research, reading docs? If having basic literacy is 100, then I'd say having LLMs adds about 0.1, if that.
I have yet to see a study design that sufficiently controls for all of the variables. In general it seems that if you could do the work on your own, it may not save time, and in the long run almost all output needs to be reviewed, perhaps more so than if you were composing it yourself. The additional variables include things like finding out-of-the-box solutions either with or without the model outputs, which is hard to control for as well. Even harder to control are model quality, training material related to the topic, and many more of those classes of issues that may not be publicly available or even possible to trace fully. Truly, anecdotes are not informative of the general experience. It is literally the cliché that gets mentioned on this site periodically: "these are the lotto numbers that worked for me."
Net negative. People keep using AI slop and are driven by shitty metrics from the top to use AI as much as possible. It leads to the entire codebase being garbage. I blame the higher ups. Everyone is just doing their job and has no room to negotiate here. Everything is tracked via metrics and you’re required to do a bunch of stupid AI slop shit otherwise you’re at risk of getting fired.
I am starting to use them less. I only use GLM now; the supposed "gains" from "better" models (like Gemini or Claude) are an illusion at best. They can waste a lot of your time as you swap through the different implementations, or as you debug bad structural decisions that the model took and you didn't take the time to read or contemplate.
They can be useful for one-shotting code that you already know exactly what it should be. Because if you already have the mental model of what the code should be, you can read it 10x faster. The other useful thing is complex multi-dimensional search: you can explain a process and it'll walk you through the code base. Useful if you already have knowledge of the code base and need to refresh your memory.
In general, now, I'd consider LLMs extremely harmful. They can introduce very high-interest debt to your code base and quickly bring it to collections. Their usage must be carefully considered and deliberate.
The dotfile example is pertinent: I can't be arsed to remember the syntax for every little thing that I don't interact with on a daily basis. Bash scripts, build tool chains, infrastructure code, all that tertiary stuff. While it isn't a huge number, the amount of time saved by not having to google and relearn something every time I want to make a change adds up over time.
Of course this is undone by all the AI slop my boss passes by us now.
I think what most people answering in this thread with something like "10x" are missing is that this number probably does not apply to their overall productivity. Sure, some boilerplate task that took you 10 min of writing code now maybe takes 1 min. But I think people just take this number and then extrapolate it to their whole work day.
I mean, if everybody were truly 10x more productive (and at the same output quality as before, obviously), this would mean that your company is now developing software 10 times faster than 2 years ago. And there's just no way that this is true across our industry. So something is off. (A quick back-of-envelope after the caveats below makes this concrete.)
Caveats:
- Yes, I get that you can now vibe code a pet project on the weekend when you were just too tired before, but that's not really the question here, is it?
- Yes, if your whole job is just to write boilerplate code, and there's no collaboration with others involved, no business logic involved, etc., ok, maybe you are 10x more productive.
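To make the extrapolation point concrete, here is a quick back-of-envelope with illustrative numbers (the coding fractions are assumptions, not anyone's measurements): if coding is only part of the job, a 10x speedup on that part gives a much smaller overall gain.

    # Overall speedup when only the coding fraction of the job gets faster
    # (Amdahl's-law style). The fractions below are illustrative guesses.
    def overall_speedup(coding_fraction: float, coding_speedup: float) -> float:
        new_time = (1 - coding_fraction) + coding_fraction / coding_speedup
        return 1 / new_time  # old total time is normalized to 1

    for frac in (0.1, 0.3, 0.5, 0.9):
        print(f"coding is {frac:.0%} of the job, 10x on it -> "
              f"{overall_speedup(frac, 10):.2f}x overall")
    # coding is 10% of the job, 10x on it -> 1.10x overall
    # coding is 30% of the job, 10x on it -> 1.37x overall
    # coding is 50% of the job, 10x on it -> 1.82x overall
    # coding is 90% of the job, 10x on it -> 5.26x overall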
I haven't gained much productivity by using any of the LLMs, particularly not the recent ones, as their thinking ability amounts to nothing more than overthinking and overcomplicating everything.
What I really use LLMs for is to uncover what I should not do, and this is quite a strong win.
Overall, they are generally very bad at giving me anything useful for the things I build.
Extremely context-dependent for me.
For boilerplate stuff like generating tests against a well-defined API, in a compiled language, maybe 2-3x. Far less in languages and frameworks like Ruby/Rails where the distance between generating the code and figuring out if it’s even valid or not is large.
Mechanical refactors that are hard to express via e.g. regex but easy in natural language: maybe 5x.
HTML and CSS, where I know exactly what I want and can clearly articulate it: 2-5x.
For anything architecture-y, off the beaten path, or where generating a substantial amount of code is required: near 0%. Often in the negatives.
At home, pissing about: 5x faster. At work, I think it actively slows me down. At first I thought the difference was that at work I needed something very precise, and I wasn't good enough at prompting to deliver that precision. Now I think it's that if I try to do a hobby project and the AI steamrolls over any prompting to do a slightly different, worse hobby project, I'm way less likely to notice or resist, because there's no iron requirement of profit maintaining discipline.
There has been very little change for me. The one area where AI has been of use is for providing debugging suggestions in those circumstances when I am so short of ideas that I am contemplating posting a question on Stack Overflow. Only once has ChatGPT actually found a bug directly, but on a handful of occasions it has made suggestions which have yielded results when investigated further in the traditional way (e.g. reading documentation). Still, at least the AI actually does something rather than ignore me or scold me for the way I have posed my question.
Working as an ML engineer/researcher:
- LLMs are absolutely abysmal at PyTorch. They can handle basic MLP workflows, but that's it, more or less. 0% efficiency gained.
- LLMs are great at short autocompletes, especially when the code is predictable. The typing itself is very efficient. Using vim-like shortcuts is now the slower way to write code.
- LLMs are great at writing snippets for tech I am not using that often. Formatting dates, authorizing GDrive, writing advanced regex, etc. I could do it manually, but I would have to check docs; now I can have it done in seconds.
- LLMs are great at writing boilerplate code, e.g. setting up argparse, printing the results in tables, etc. (see the sketch below). I think I am saving hours per month on these.
- Nowadays I often let LLMs build custom HTML visualization/annotation tools. This is something I would never do before due to time constraints, and the utility is crazy good. It allows my team to better understand the data we are working with.
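For concreteness, the argparse bullet above refers to the kind of boilerplate sketched here; the script's purpose and flag names are invented for the example:

    # Illustrative argparse boilerplate; not from the comment above.
    import argparse

    def main() -> None:
        parser = argparse.ArgumentParser(description="Summarize evaluation runs.")
        parser.add_argument("results", nargs="+", help="paths to result JSON files")
        parser.add_argument("--metric", default="accuracy", help="metric column to show")
        parser.add_argument("--top-k", type=int, default=10, help="rows to print")
        parser.add_argument("-v", "--verbose", action="store_true")
        args = parser.parse_args()

        if args.verbose:
            print(f"loading {len(args.results)} file(s), metric={args.metric}, top {args.top_k}")
        # ... load the results and print a small table here

    if __name__ == "__main__":
        main()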
10x when working on a code base I'm very familiar with.
Basically, it amounts to being able to give detailed instructions to a junior dev (who can type incredibly fast) and having them carry out your instructions.
If you don't know the code base, and thus can't provide detailed instructions, this junior dev can (using their incredible typing speed) quickly run off the rails. In this case, as you don't know the code base, you wouldn't know it's off the rails. So you're S.O.L.
LLMs are faster, smarter, and way dumber than a junior, all at the same time.
They work faster, but more often make wrong assumptions without asking. LLMs don't ask the stupid questions a junior might, but those questions are essential to getting it right.
It really helps where the code I'm writing fits the broad description of boilerplate.
Need to integrate Stripe with the Clerk API in my Astro project? Claude's all over that. 300% faster. I think of it like, if there was a package that did exactly what I wanted, I'd use that package. There just happens not to be; but Claude excels at package-like code.
But as soon as I need to write any unique code – the code that makes my app my app – I find it's perhaps a touch faster in the moment, but the long-term result isn't faster.
Because now I don't understand my code, right? How could I? I didn't write it. So as soon as something goes wrong, or I want to add a feature, either I exacerbate this problem by getting Claude to do it, or I have to finally put in the work that I should have put in the first time.
Or I have to spend about the same amount of time creating a CLAUDE.md that I would have if I'd just figured out the code myself. Except now the thing I learned is how to tell a machine how to do something that I actually enjoy doing myself. So I never learn; on the contrary, I feel dumber. Which seems a bit weird.
And if I choose the lazy option here and keep deferring my knowledge to Claude, now I'm charging customers for a thing that I 'vibe coded'. And frankly if you're doing that I don't know how you sleep at night.
This. LLMs are good at stuff that is very general (i.e. often in the dataset). What i gain most from LLM is when i use it to teach me - like extended documentation.
But for unique solutions you will get pretty random results and, worse, you are not building understanding and domain knowledge of your program.
Claude Code sounds cool until it makes 3 changes at once, 2 of which you are unsure whether they are required or whether they won't break something else. I like it for scripts, data transformations and self-contained small programs where I can easily verify correctness.
> What i gain most from LLM is when i use it to teach me - like extended documentation.
This, yes. What I do now is use Claude but expressly tell it do not edit my code, just show me, I want to learn. I'm not a very experienced dev so often it'll show me a pattern that I'm unfamiliar with.
I'll use that new knowledge, but then go and type out the code myself. This is slower, in the moment. But I am convinced that the long-term results are better (for me).
It is especially helpful when you're new to some framework and aim to follow best practices, so that others can follow your code and you don't end up reinventing things.
I don’t know, it still feels like a productivity lottery to me. The simpler the task, the higher the odds of a big productivity gain, and it can be probably an order of magnitude quicker, especially if the output is very verbose, like frontend tends to be.
But I still haven't dialed in exactly what is too complicated for the LLM to handle (and that goalpost seems to still be moving, but more slowly now). Because it is almost always very close, I often end up trying to fix the prompt a few times before giving up and just doing it from scratch myself. I think in total the productivity gain for me is probably a lot less than 100%, but more than 0%.
Very hard to estimate; depending on the domain, I'd say 1.5-2x as much.
When it comes to programming in languages and frameworks I'm familiar with, there is virtually no increase in terms of speed (I may use it for double checks); however, it may still help me discover concepts I didn't know.
When it comes to areas I'm not familiar with:
- most of the time, the increase is substantial, for example when I need targeted knowledge (e.g. finding a few APIs in giant libraries), or when I need to understand an existing solution
- in some cases, I waste a lot of time, when the LLM hallucinates a solution that doesn't make sense
- in some other cases, I do jobs that otherwise I wouldn't have done at all
I stress two aspects:
1. it's crucial IMO to treat LLMs as a learning tool before a productivity one, that is, to still learn from its output, rather than just call it a day once "it works"
2. days of later fixing can save hours of upfront checking. or the reverse, whatever one prefers :)
It’s immeasurable. I use AI for powering through personal projects, which would not have gotten done without AI because I also have a job and a life. It allows me to focus on the product and requirements rather than the code. It’s hard to measure because the projects would simply not have gotten done without it.
The most fascinating thing about AI is how in a thread like this one, answers range between 0% and infinity.
To be accurate, it’s between negative gains and infinity.
Personally, I do not trust self-reports for a second anyway. They are bound to be wrong.
To be fair, for my coding at work, AI is “only” like a 2x booster because stuff at work is a lot less greenfield.
I simply have more fun doing my job.
At work, many of my colleagues are too busy to collaborate and brainstorm with me, while I enjoy it a lot and have a lot of energy for it.
They are classic 9-to-5, and that's fine, but I like software development and talking about it.
So instead of only doing it by myself, or tiring out my colleagues, I collaborate with AI.
-100%. With this looming Shiny Toy plus other sub-optimizations at my last place of work, I decided to retire. Didn't want to deal with it.
10x to 1x. Usually 1.5x, maybe.
Each situation has a different bottleneck, and it's almost never how fast you can write lines of code.
You would need:
- All engineers aligned on AI use.
- Investment in automating your unit tests, integration tests, end-to-end tests, code quality controls, documentation quality controls, generated API docs, security scans, deployments, feature environments, well-designed internal libraries, feature flags, reviews, everything infrastructure - with standards for all of these, and so on and so forth.
- To lose your culture of meetings to fix miscommunication.
- To centralize product planning and communication, and stop having 100 different tools at the same time (Jira, Email, Confluence, Slack, Teams, GitHub, BitBucket, GitLab, Sharepoint, ...) where you keep snapshots of what you wanted to do at some point in time.
- A high-trust culture.
- To understand that mistakes will happen more frequently. You probably don't have production incidents often, because you deploy once a month. You will go fast, and the faster you go, even with a low failure rate, mistakes happen more often, and you'll need to be prepared for that too.
Unfortunately most organizations are missing 99% of the above. Organizations like to have layers of communication scattered in all kinds of tools, because hey, X tool fixes my problem. They need 2-hour meetings so everyone is aligned on where the button goes and whether the button has to be green or blue, and 10 engineers need to be present in the room too, so 20 engineering hours. Then they go to production once a month.
So if you have solved all that then the bottleneck becomes lines of code per minute, and you could rebuild most products in a few days.
For me personally, it ranges from 10x to 1x. On my own projects, and on projects where the development experience is really really great, easily 10x. We would never have brought that much live in these short timespans without AI assisted software development. In large businesses where 20 people need to stare at a Jira board to decide on the most basic things and give feedback through Confluence comments and emails.... Yeah the bottleneck is not how fast you can write lines of code.
Two months ago, I'd have said 5%. With Opus 4.5, Gemini 3 and GPT 5.2, it's now 20%, maybe 50% if we only talk about personal code output.
If we talk about my whole team's output, I'd say the impact on code production is like 80-100%, but the impact on velocity is between 10% and -25%. So many bugs in production, security holes, so many poor model definitions making it to the production DB, only for me and the other true senior to have to fix them.
We are seniors, we read your PRs and try our best to do it thoroughly, but with AI multiplying your code output and writing quite convincing solutions, it's way harder. So please: if an AI has written the code in your PR, verify the tests are not superficial, verify everything works, think for yourself about what the model is used for and whether it can be improved before release. Re-verify the tests (especially if the AI had issues writing/passing them). And do it once more. Please (hopefully one of my coworkers will read this).
How would we describe the productivity gains brought on by engaging with tasks we did not previously have the ability or confidence to take on? The denominator looks like a zero in many places for me.
Yes especially for after hours fun projects where I might have a few hours a week to work on them. Previously I would get so bogged down by boilerplate I had kind of given up on trying. Now with AI tools I can spend more time thinking about the most important problems and send off agents to do the gruntwork.
It would be more useful to know what exactly you're writing, beyond the description of "business logic and real-world problems".
For example, writing UI components may be easier in some languages than writing abstract algorithms, writing standardized solutions (i.e. textbook algorithms, features, etc.) is easier than writing customized algorithms, etc.
Also, writing code can be very fast if you don't unit test it, especially for CRUD apps. In fact most of my coding time was spent on writing unit tests.
I really hoped AI could write those tests for me, to lock the specs of the design completely down. But currently it's the opposite: I have to write tests for AI-generated code. So my overall experience can be described as the tyranny of reading other people's code, times 10.
I'd say on average about 50% faster, but it really depends on the task at hand. On problems that can be isolated pretty well, like a new feature that is relatively self-contained (for example building a file export in a specific format), it's easily a 10x speed-up.
One thing that generally gets talked about less is exploration of the solution space during manual implementation. I work in a very small company and we build a custom ERP solution. Our development process is very stripped down (a good thing IMO). Oftentimes when we get new requirements we brainstorm and make a rough design. Then I try to implement it, and during that phase new questions and edge cases arise, and any time this happens we adjust the design. In my opinion this is very productive, as the details of the design are worked out when I already know the related code very well, because I already got down to implementing. This leads to a better-fitting design and implementation. Unfortunately this exploration workflow is incompatible with LLMs if you use them to do the implementation for you, which means that you have to put more effort into the design up front. From my experience that means the gain in speed on such tasks is nullified, and it also results in code that fits worse into the rest of the codebase.
Zero. I don't use it as it's garbage.
It sometimes feels like 10x or more when generating code; however, all this code has to be reviewed, which is currently still a 1x task and takes about 90% of the time nowadays. Factoring in the code review, I would estimate an overall 2x.
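The 2x estimate above is roughly self-consistent; a small back-of-envelope check (the writing/review split is inferred from the stated figures, not measured):

    # 10x generation, review unchanged (1x) and now ~90% of total time,
    # works out to roughly a 2x overall speedup.
    old_write = 1.0                # normalize pre-AI writing time to 1
    new_write = old_write / 10     # generation is ~10x faster
    new_review = 9 * new_write     # review is 90% of the new total, writing 10%
    old_review = new_review        # review itself hasn't sped up
    speedup = (old_write + old_review) / (new_write + new_review)
    print(f"overall speedup = {speedup:.2f}x")  # 1.90x, i.e. about 2x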
For me it feels like roughly a 10–20x change, but mostly because I restructured how I work rather than just adding an “AI helper” on top.
In the last year I’ve shipped a couple of small OSS tools that I almost certainly would not have finished without AI‑assisted “vibe coding”. Everything I build now flows through AI, but in a slightly different way than just chatting with an LLM. I rarely use standalone ChatGPT/Gemini/Claude; almost all of it happens inside GitHub with Copilot and agents wired into my repos.
The big shift was treating GitHub as the interface for almost all of my work, not just code. I have repos for things like hiring, application review, financial reviews, and other operational workflows. There are “master” repos with folders and sub‑folders, and each folder has specific context plus instructions that the AI agent should follow when operating in that scope, essentially global rules and sub‑rules for each area of work.
Because of that structure, AI speeds up more than just the typing of code. Idea → spec → implementation → iteration all compress into much tighter loops. Tasks that would have taken me weeks end‑to‑end are now usually a couple of days, sometimes hours. Subjectively that’s where the 10–20x feeling comes from, even though it’s hard to measure precisely.
On the team side we’ve largely replaced traditional stand‑ups with AI‑mediated updates. KPIs and goals live in these repos, and progress/achievements are logged and summarized via AI, which makes updates more quantitative and easier to search back through. It’s starting to feel less like “AI helps me with code” and more like “AI is the main operating system for how our team works.”
Happy to share more about the repo/folder structure or what has/hasn’t worked if anyone’s curious.
> On the team side we’ve largely replaced traditional stand‑ups with AI‑mediated updates
Can you expand on this? What was a traditional stand-up before and what does an AI-mediated one look like?
I'd be interested in how you set up those repos for non-coding tasks, thanks for sharing!
If you were 20x faster you'd have done an entire career's worth of progress this year.
Have you? Are you making tons of money? Have you achieved 20x the amount than you have all previous years?
Take a step back and realize what you're claiming here.
While the multiplier is less for me ( perhaps 3x or 4x ), I think the assumption that a productivity gain leads directly to more money is a bit optimistic. Unless you are self-employed, or run your own company, and actually get paid by results, being more efficient is seldom paid much. ( with luck you get a promotion every two years or so, or a pay raise every year )
I have worked in the field for too long, but this year, simply thanks to the LLMs, I have actually managed to get 4 usable hobby projects done ( as far as I need them to go anyway - personal tools that I use and publish but do not actively maintain unless I need some new feature ), and I have been quite productive with a stack I do not normally use at our startup. Most previous years I have finished 0-1 hobby projects.
The ramp-up period for the new stack was much shorter, and while I still write some code myself, most of it at least starts as LLM output which I review and adjust to what I really want. It is a bit less intellectually satisfying but a much more efficient way to work for me. And ultimately, for work at least, I care more about good-enough results.
Completely unrelated, but what’s with the spaces inside the parentheses here? It’s super weird (and leads to incorrect text layout, with a standalone parenthesis at the end or beginning of a line…)
When it comes to small, focused OSS tools, you can basically code them at the speed of thought with modern LLMs. The slow part is figuring out what you want to do in the first place; the implementation is basically instantaneous. It's literally faster than googling for a tool that already does the job.
And yes, that doesn't scale to all problem domains or problem sizes, but in some areas even a 20x speedup would be a huge understatement.
It could be, e.g., 2x faster with 10x less staff/cost as well, right?
Engineer at a unicorn SaaS startup: comparing the number of GitHub PRs for two spikes of my maximum output (when I had to write a lot of production code for weeks) before and after AI shows a 27% increase.
Depends hugely on the task - anywhere from -50% to +1000%. This is Copilot in VS Code for Vue.js + FastAPI in Docker. The trick is getting the right degree of hands-offness for the task.
I'd say on average, probably around the +100% mark, mostly as a lot of the work I'm doing currently are simpler tasks - "please add another button to do X". The backend logic is simple, and easy to security check.
Where I'm really not sure on productivity is when I get it to help generate tests that involve functionality across multiple domains of the application - possibly +0%.
It changed my ability to create something in the evenings, when I would normally be too tired to work on a project. It also changed my work into being a frontend designer/tester instead of programming; I was able to iterate on ideas much faster and create a much more usable product. But getting the code to work flawlessly is a pain, and it makes me clean up and re-test tons of times.
I am working on this: https://github.com/ludos1978/ludos-vscode-markdown-kanban
I've been building specific tools in languages and frameworks I don't understand, nor have ever found time to learn. So the productivity boost is hard to measure; it's a "what's 0 times how fast the LLM can do it" type of answer.
In the cases where I'm already an expert and I'm working in a legacy system and/or on a hard problem, it's small change. Where I have no knowledge, it's an incalculable change because it's enabling me to (quickly) do things I could not before.
For example, I do not know rust but I've been using AI to make https://git.sr.ht/~kerrick/ratatui_ruby at a really rapid pace.
Massive x100 speed-up for writing simple, mind-numbing stuff. Algorithm optimization: +20% (depends on the domain). Obscure APIs: x10 (the model handles them much better than the docs do). Debugging is massively, incomparably easier. I don't know where the HN hate for AI is coming from, but it's ideal for prototyping/debugging/refining code. Minor polishing and editing of specifics is much easier for me than writing a correct "draft v1.0" from scratch: I typically evolve structures from something primitive (v0.1), while AI gives me a chance to start from 1.0 and polish it later (optimize/fix/improve/etc). For a "from scratch to working prototype" workflow, that means months of coding condensed into days (mostly debugging/flaw finding).
Drawbacks: I feel like more code gets simplified, with no ad-hoc complex structure left after an AI rewrite. Pure "clean code" with the most standard optimizations and control flow. AI will turn "good complex code" into simplified, "enterprise-ready" control-flow charts in code form; it lacks a sense of code quality and instead treats verbose, clean code as the quality marker.
I am surprised by all the answers saying 0 percent. I'd understand that if you use it without any grounding or additional context. I am far faster with it, using one MCP server with docs and one for language-specific features. If the code base is large enough, Serena is a must-have. I'd say about the same: grokking an unknown code base is a lot faster, and producing code for something I am familiar with is maybe 4-5x.
I think about 4x for greenfield development. For example, I implemented this motion-controlled game from scratch in 5 days: https://motionparty.net
For this kind of no-expectations, for-fun development I find AI makes it much easier to develop and test hypotheses. For other styles it's different, especially if the stakes are higher.
Any positive change to my output is likely only because I now need to use it to supplement Google searching, because Google Search is so damn awful nowadays.
But to describe my latest (latest, not only) experience with an LLM: I was with my toddler last night and I wanted to create a quick and dirty timer displayed as a pizza (slices disappear as the timer depletes) to see if that could help motivate him during dinner. HTML, JS, SVG... I thought this would be cake for OpenAI's best free model. I'm a skeptic for sure, but I follow along enough to know there's voodoo to the prompt, so I tried a few different approaches and made sure to keep it minimal beyond the basic requirements. It couldn't do it: the first attempt was just the countdown number inside an orange circle; after instruction, the second attempt added segments to the orange circle (no texture or additional SVG elements like pepperoni); after more instruction, it added pepperoni, but now there was a thick border around the entire circle even where slices had vanished. It couldn't figure this one out, its last attempt being just a pizza that gradually loses toppings. I restarted the session and prompted with some clarifications based on the previous session, but it was just a different kind of shit.
Despite being a skeptic, I'm somewhat intrigued by the idea of agents chipping away at problems and improving code, but I just can't imagine anyone using this for anything serious given how hard it fails at trivial stuff like this. Given that the MS guy is talking a big game about planning to rewrite significant parts of Windows in Rust using AI, and is not talking about having already rewritten significant parts of Windows in Rust using AI, I remain skeptical of anyone saying AI is doing heavy lifting for them.
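For scale, the task really is small. Here is a minimal sketch of that kind of pizza timer (not the model's output, just an illustration), written in TypeScript against the plain DOM; the slice count, duration, and the #pizza container id are made-up assumptions, and toppings are omitted.

    // Minimal pizza countdown: draw N slices, remove one per tick until the pizza is gone.
    // Assumes an empty <div id="pizza"></div> on the page; all numbers are illustrative.
    const SLICES = 8;
    const TOTAL_MS = 8 * 60 * 1000; // 8-minute dinner timer
    const SVG_NS = "http://www.w3.org/2000/svg";

    // One pie slice from angle a0 to a1 (radians), as an SVG path string.
    function slicePath(cx: number, cy: number, r: number, a0: number, a1: number): string {
      const x0 = cx + r * Math.cos(a0), y0 = cy + r * Math.sin(a0);
      const x1 = cx + r * Math.cos(a1), y1 = cy + r * Math.sin(a1);
      return `M ${cx} ${cy} L ${x0} ${y0} A ${r} ${r} 0 0 1 ${x1} ${y1} Z`;
    }

    const svg = document.createElementNS(SVG_NS, "svg");
    svg.setAttribute("viewBox", "0 0 200 200");

    const slices: SVGPathElement[] = [];
    for (let i = 0; i < SLICES; i++) {
      const a0 = (i / SLICES) * 2 * Math.PI;
      const a1 = ((i + 1) / SLICES) * 2 * Math.PI;
      const path = document.createElementNS(SVG_NS, "path");
      path.setAttribute("d", slicePath(100, 100, 90, a0, a1));
      path.setAttribute("fill", "#e8a33d");   // cheese
      path.setAttribute("stroke", "#b5651d"); // slice edges
      svg.appendChild(path);
      slices.push(path);
    }
    document.getElementById("pizza")!.appendChild(svg);

    // Remove one slice per interval; stop once the last slice is gone.
    const timer = setInterval(() => {
      slices.pop()?.remove();
      if (slices.length === 0) clearInterval(timer);
    }, TOTAL_MS / SLICES);

Pepperoni would just be a couple of <circle> elements appended per slice and removed along with it.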
It only helps with Medium-article-tier problems; anything that requires critical thinking and actual knowledge is a waste of time.
Hard to self-estimate. The big change for me has been that I would now do more things than before. But maybe 2-5x.
Alternates between +5% and -99.9%: +5% when I can shortcut an entire morning's worth of work into under an hour (happens about once a week), and -99.9% when my job is threatened by managers with AI "vibing" claims (happens about once a week too).
It depends massively on the task. Not less than 10% if it’s just helping review my PRs before submission. It can be quite a large improvement: for example, I wanted to rebuild a Postgres layer for a new company but didn’t have the one I’d written previously. I knew exactly what I wanted and probably did two days of tedious work in an hour. It does slow me down some too when it makes mistakes, but that’s rare.
Hard to tell... It's not double or more net gain, that I can tell.
I'm not even entirely sure it's a net positive at this point but it feels like it.
I would say probably between 20% to 50% more productive.
It made me 2x slower, because with Claude's hallucinations I waste more time chasing dead ends than I gain from it in productivity.
Negative percentage.
I was very, very keen on using this tech when it first emerged and was almost addicted to it.
The feeling I was getting was akin to scrolling one of them feed-generating apps with cat videos, or eating fast food - a quick and empty joy and excitement without any lasting, fulfilling effect.
After months, or maybe a year, of using LLMs, I realized that I am neither faster in delivering the final desired quality of product, nor satisfied with where I am professionally. I lose my human skills and I forget how to get joy out of my work (and I enjoy making them computers work). I noticed how I grew negligent toward the results of my work, felt disconnected from and disconcerted by it, and that was an alarming sign.
Anybody who has worked anywhere for long enough knows that unhappy people, robbed of joy, produce worse results.
That said, I realized loud and clear that writing code and developing systems is something I want to experience the joy of personally - I want my brain to struggle, sweat, and click through most parts of it - and that the parts of the work that I don't enjoy doing and can shamelessly offload to an LLM are, in fact, quite minimal.
On top of that, while using the LLMs (and boy was I using all of it - sub-agents, skills, tasks, hour-long planning-prompting exercises!), I was still noticing that when it came to "write code" tasks, LLMs were only better and faster than me at delivering quality work when the task at hand was not my primary skill - something I'd be below average at (in my case, web development or app development, any front-end work). And given that I was employed to exercise my main skill rather than my secondary skills, it was almost never the case that LLMs were boosting my productivity; they required long stretches of babysitting before I'd almost certainly give up and do all of the work myself.
Admittedly, LLMs are outstanding study partners that can give pointers on where to start and what structure to stick to when learning about a new project, technology, or problem domain, or when generating flash cards based on materials one wants to study, and that's a use of LLMs I'd probably not give up. My speed of learning new things with some LLM assistance is boosted greatly, so from this perspective, one could say that LLMs make me a better developer after all.
As others have already said, this is highly context-dependent, but I want to give some examples.
- It sometimes generates decent examples of Linux kernel API usage for modules, which saves a lot of time digging through the limited documentation. But most of the time it will mix deprecated and new versions of the API, and it will very likely make suboptimal and/or buggy choices.
- For embedded C, it won't be able to work within the constraints of a very specific project (e.g. one using an obscure or custom HAL), but for generic C or RTOS applications it can generate decent to half-decent suggestions.
- I've almost never seen it generate decent Verilog.
Infinitely, because I wouldn't be bothered to make the things I make without AI.
I’m not sure you’re going to get any useful results from this question.
Some people find it useful, some people don’t, and unless what you’re using it for matches what they’re using it for (which you’re not asking) none of the results you get give you any insight into what you should expect for your use case.
Oh well, whatever. Here’s my $0.02: on a large code base that takes up to 30 minutes to do a local type check in TypeScript, the net benefit of AI is neutral or negative, because the agent can’t loop effectively and check its own results.
AI scaffolded results are largely irrelevant and don’t use internal design system components or tokens for UI and are generally useless.
Objectively measured ticket completion rates are not meaningfully impacted by the use of AI.
Out-of-date documentation leads agents to build incorrect solutions using outdated and deprecated techniques and services.
This is true across multiple tools and multiple models used, including SOTA ones.
1x
It is not more productive.
This reflects my personal experience over the last 8 months of intense (and company-mandated) AI usage at work.
At home, for small personal projects, I would say it’s closer to the 2x you describe, maybe as much as 3x for building and iterating on rich web UI using React.
So, in my professional work environment, which is a complex C++ app with long build times and lots of proprietary domain-specific knowledge, LLMs were worse than useless. Not totally surprising, but I originally had hopes for something like Cursor being useful in terms of simplifying large mechanical refactors, which it decidedly wasn't. It could suggest interesting and complex uses of C++ templates, but its applications were very slop-adjacent, and even for less complex refactors it would make an attempt and then fall apart in the "check" part of the "guess and check" loop, probably because of the whole "long build times" thing. So we're still paying for Visual Assist and just wishing it had more (and more specific!) kinds of mechanical refactors available.
For personal projects (more polyglot, but Rust, JS, Python, and random shell scripts are bigger and more important here) it's been more mixed to positive, and this is (I think?) in part because I have the luxury of writing off things I'm _not actually_ interested in doing. Maintaining CMake files sucks, and the free tier of Cursor does a good enough job of it. I have a few small plugins/extensions for things like Blender, and again, I don't know enough to do a good job there, and the benefit of making something extremely specific to what I need, without actually knowing what's going on under the hood, works fine: I can just verify the results, and that's good enough. But then, conversely, it's made it _wayyyy_ harder to pick and verify third-party libraries for the things I do care about. I'll look something up and it'll either be 100% AI vibe-coded and not good enough to sneeze at, or it'll be fine, but the documentation is 100% AI generated, and likewise, I would rather just have the version of the library from before AI ever existed.
More and more I'm convinced LLM agents are only fit for purpose for things that don't need to be good or consistent, but there is actually a viable niche of things that don't need to be good that they can nicely slot into. That's still not worth $20/month to me, though. And it's absolutely ruining the online commons in a way that makes it hard to feel good about.
(My understanding of Claude Code is that it's a non-interactive agent, which is worse for what I have in mind. Iteration and _changing my mind_ are a big part of my process, so even if I let the computer do its own thing for an hour and work on something else, that's less productive than spending even 10 minutes of focused time on the same thing.)
>(My understanding of Claude Code is that it's a non-interactive agent, which is worse for what I have in mind. Iteration and _changing my mind_ are a big part of my process, so even if I let the computer do its own thing for an hour and work on something else, that's less productive than spending even 10 minutes of focused time on the same thing.)
Just use 'plan mode'; it will ask clarifying questions.
At home, on my side projects it's made me 2x to 4x.
At work, also 2x to 4x.
The numbers would be even higher (4x to 8x) if I didn't spend half the time correcting slop and carefully steering the AI toward the desired solution when it gets sidetracked. But then again, I was also guilty of those things so maybe it's an even score?
Perhaps it's partly psychological in that using it forces me to think through the problems I'm trying to solve differently than before. Perhaps I'm just a mediocre dev and the AI is bringing me up to "slightly above average," but a win is a win and I'll take it.
I'm learning Python on the job, with decades of experience in other languages. Asking an LLM noob Python questions is probably 5-10x faster than googling around and reading docs or tutorials, and almost 100% accurate. I also ask it about git commands I seldom use (I tend to confuse reset and restore, though not revert for some reason).
Introductory questions about a widely used language with great documentation and tons of tutorials are made for LLMs.
That's what LLMs can help with.
With writing code for legacy code bases that don't have 3000 copies of the same tutorial in the first 3 pages of search results... they help less.
Note that most people claiming > 2x improvement are doing new code from scratch.
25%
0% productivity improvement.
Rephrasing the question: By what percentage has AI changed your input quality?
The answer would be around -50%. This is attributed mostly to the vast number of search results that are AI-generated, provide very low-density information, and fail to convey the actual key learning points. This means you have to scan through 100% more text to finally get the information you need to solve the issue. I think this is a low estimate, actually.
0% for me too; on the few occasions I tried it, it gave completely useless responses that were so far off the mark that, if I didn't know better, they would've led me down a completely wrong path.
Both faster and slower. The problem is you cannot trust LLMs with even the easiest of tasks. Generating boilerplate, better and more flexibly than a code generator would? Yes. Writing UIs that you then fix up? Yes.
But business logic that needs to be trusted? Nooooo.
So very, very rarely 1000% faster. Most of the time 10% slower as I have to find and fix issues.
I'll still say it's a net improvement, but not by much. So let's say 10%.
1.3x when working on a large, janky codebase that I am very familiar with - very unevenly distributed.
- Writing new code it's probably 3x or so[1].
- Writing automated tests for reproducible bugs, it's probably 2x or so.
- Fixing those bugs I try every so often but it still seems to be a net negative even for Opus 4.5, so call it 0.95x because I mostly just do it myself.
- Figuring out how to reproduce an undesired behavior that was observed in the wild in a controlled environment is still net negative - call it 0.8x because I keep being tempted by this siren song[2]
- Code review it's hard to say, I definitely am able to give _better_ reviews now than I was able to before, but I don't think I spend significantly less time on them. Call it 1.2x.
- Taking some high-level feature request and figuring which parts of the feature request already exist and are likely to work, which parts should be built, which parts we tried to build 5+ years ago and abandoned due to either issues with the implementation or issues with the idea that only became apparent after we observed actual users using it, and which parts are in tension with other parts of the system: net negative. 0.95x, just from trying again every so often.
- Writing new one-off utility tools for myself and my team: 10x-100x. LLMs are amazing. I can say "I want to see a Gantt chart style breakdown of when jobs in a gitlab pipeline start and finish each step of execution, here's the network log, here's a link to the gitlab api docs, write me a bookmarklet I can click on when I'm viewing a pipeline" and go get coffee and come back and have a bookmarklet[3].
Unfortunately for me, a significant fraction of my tasks are of the form "hey so this weird bug showed up in feature X, and the last employee to work on feature X left 6 years ago, can you figure out what's going on and fix it" or "we want to change Y functionality, what's the level of risk and effort".
-----
[1] This number would be higher, but pre-LLMs I invested quite a bit of effort into tooling to make repetitive boilerplate tasks faster, so that e.g. creating the skeleton of a unit or functional test for a module was 5 keystrokes. There's a large speedup in the tasks that are almost boilerplate, but not quite worth it for me to write my own tooling, counterbalanced by a significant slowdown if some but not all tasks had existing tooling that I have muscle memory for but the LLM agent doesn't.
[2] This feels like the sort of thing that the models should be good at. After all, if I fed in the observed behavior, the relevant logs, and the relevant files, even Sonnet 3.7 was capable of identifying the problem most of the time. The issue is that by the time I've figured out what happened at that level of detail, I usually already know what the issue was.
[3] Ok, it actually took a coffee break plus 3 rounds of debugging over about 30 minutes. Still, it's a very useful little tool and one I probably wouldn't have spent the time building in the before times.
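To give a flavor of that kind of throwaway tool, here is a rough sketch (in TypeScript, and not the actual bookmarklet from [3]) of the core logic: fetch the job timings for one pipeline and print a crude text Gantt chart, per job rather than per step, which is a simplification. The /pipelines/:id/jobs endpoint, the PRIVATE-TOKEN header, and the name/started_at/finished_at fields reflect my reading of the GitLab REST API and should be checked against the docs; the ids and token in the usage comment are placeholders.

    // Rough sketch: print a text Gantt chart of when each job in a GitLab pipeline ran.
    interface PipelineJob {
      name: string;
      started_at: string | null;
      finished_at: string | null;
    }

    async function ganttForPipeline(base: string, projectId: number, pipelineId: number, token: string) {
      // List the pipeline's jobs via the REST API (verify endpoint/fields against the GitLab docs).
      const url = `${base}/api/v4/projects/${projectId}/pipelines/${pipelineId}/jobs?per_page=100`;
      const res = await fetch(url, { headers: { "PRIVATE-TOKEN": token } });
      const jobs: PipelineJob[] = await res.json();

      const finished = jobs.filter(j => j.started_at && j.finished_at);
      const t0 = Math.min(...finished.map(j => Date.parse(j.started_at!)));
      const msPerChar = 2000; // horizontal scale: one character per two seconds

      for (const j of finished) {
        const offset = Math.round((Date.parse(j.started_at!) - t0) / msPerChar);
        const width = Math.max(1, Math.round((Date.parse(j.finished_at!) - Date.parse(j.started_at!)) / msPerChar));
        console.log(" ".repeat(offset) + "#".repeat(width) + " " + j.name);
      }
    }

    // Hypothetical usage (all values are placeholders):
    // ganttForPipeline("https://gitlab.example.com", 42, 123456, "glpat-...");

A real bookmarklet would wrap something like this in a javascript: URI, pull the project/pipeline ids from the current page, and render HTML instead of console output, but the data-fetching loop is the whole trick.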
With AI, I am a 10X developer now. At minimum. I use it for my C# and JavaScript day job. Copilot in VS.
I work on a code base that is easily over 1 million lines. It has had dozens of developers work on it over the last 15 to 20 years. Trying to follow the conventions for just the portion of the code base that I work on is a pain. I’ve been working on it for about seven years and I still have to ask questions.
So I would say that I work on a code base with a high level of drudgery. Having an all-knowing AI companion has taken an awful lot of the stress out of every aspect.
Even the very best developer I’ve ever worked with can’t match me when I’m using AI to augment my work. For most development tasks, being the best of the best no longer matters. But in a strange way, you still need the exact same analytical mindset because now it’s all about prompts. And it definitely does not negate the need for a developer.
Writing your own code is essentially just an exercise in nostalgia at this point. Or like someone who prefers to pick seeds out of cotton themselves instead of using a cotton gin.
Or perhaps instead of using voice dictation to write this post, I would write a letter and mail it to Hacker News so that they can publish my comment to the site. That’s how backwards writing code is quickly becoming.
5-10% higher in the short term. Who knows in the long term?
I've implemented a few trivial things that I wouldn't have done before if I'd had to type them up by myself.
But the big things that actually take time? Collecting requirements, troubleshooting, planning, design, architecture, research, reading docs? If having basic literacy is 100, then I'd say having LLMs adds about 0.1, if that.
6-7%
I have yet to see a study design that sufficiently controls for all of the variables. In general, it seems that if you could do the work on your own, it may not save time, and in the long run almost all output needs to be reviewed, perhaps more so than if you were composing it yourself. Additional variables include things like finding out-of-the-box solutions either with or without the model outputs, which is hard to control for as well. Even harder to control are model quality, the training material related to the topic, and other variables of that class that may not be publicly available or even possible to trace fully. Truly, anecdotes are not informative of the general experience; it is literally the cliche that gets mentioned on this site periodically: "these are the lotto numbers that worked for me."
Net negative. People keep using AI slop and are driven by shitty metrics from the top to use AI as much as possible. It leads to the entire codebase being garbage. I blame the higher ups. Everyone is just doing their job and has no room to negotiate here. Everything is tracked via metrics and you’re required to do a bunch of stupid AI slop shit otherwise you’re at risk of getting fired.
This is at a FAANG.
I am starting to use them less. I only use GLM now; the supposed "gains" from "better" models (like Gemini or Claude) are an illusion at best. They can waste a lot of your time as you swap through the different implementations, or as you debug bad structural decisions that the model made and that you didn't take the time to read and think through.
They can be useful for one-shotting code that you already know exactly what it should be, because if you already have the mental model of the code, you can read it 10x faster. The other useful thing is complex, multi-dimensional search: you can explain a process and it'll walk you through the code base. Useful if you already have knowledge of the code base and need to refresh your memory.
In general, now, I'd consider LLMs extremely harmful. They can introduce very high-interest debt to your code base and quickly bring it to collections. The usage must be carefully considered and deliberated.
The dotfile example is pertinent: I can't be arsed to remember the syntax for every little thing that I don't interact with on a daily basis - Bash scripts, build toolchains, infrastructure code, all that tertiary stuff. While it isn't a huge number, the time saved by not having to google and relearn something every time I want to make a change adds up.
Of course this is undone by all the AI slop my boss passes by us now.
I think what most of the people in this thread who are saying something like "10x" are missing is that this number probably does not apply to their overall productivity. Sure, some boilerplate task that took you 10 minutes of writing code now maybe takes 1 minute. But I think people just take this number and then extrapolate it to their whole work day.
I mean, if everybody were truly 10x more productive (and at the same output quality as before, obviously), this would mean that your company is now developing software 10 times faster than 2 years ago. And there’s just no way that this is true across our industry. So something is off.
Caveats:
- Yes, I get that you can now vibe code a pet project on the weekend when you were just too tired before, but that's not really the question here, is it?
- Yes, if your whole job is just to write boilerplate code, and there's no collaboration with others involved, no business logic involved, etc., ok, maybe you are 10x more productive.
300%
~10x with Claude code.