I work at Ramp and have always been on the “luddite” side of AI code tools. I use them, but usually I’m not that impressed, and I’m a curmudgeon when I see folks ask Claude to debug something instead of just reading the code. I’m just an old(er) neckbeard at heart.
But. This tool is scarily good. I’m seeing it “1-shot” features in a fairly sizable codebase and produce fixes with better code and accuracy than mine.
An important point here is that it isn’t doing a one-shot implementation; it is solving the problem iteratively, with a closed feedback loop.
Create the right agentic feedback loop and a reasoning model can perform far better through iteration than its first 1-shot attempt.
This is very human. How much code can you reliably write without any feedback? Very little. We iterate, guided by feedback (compiler, linter, executing and exploring).
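The loop described above — run the checks, feed the errors back to the model, apply its fix, repeat — can be sketched minimally. This is a hypothetical illustration, not Ramp's implementation; `propose_fix` stands in for the actual model call, and the only "feedback" wired up here is the Python compiler:

```python
import subprocess
import sys

def run_checks(path: str) -> str:
    """Run a compile check and return any error output ("" means clean)."""
    result = subprocess.run(
        [sys.executable, "-m", "py_compile", path],
        capture_output=True, text=True,
    )
    return result.stderr

def agent_loop(path: str, propose_fix, max_iters: int = 5) -> bool:
    """Closed feedback loop: check, feed errors to the model, apply its patch."""
    for _ in range(max_iters):
        errors = run_checks(path)
        if not errors:
            return True  # checks pass; the loop is closed
        # propose_fix is a stand-in for the reasoning-model call: it sees the
        # file and the error output, and returns a candidate replacement.
        patch = propose_fix(path, errors)
        with open(path, "w") as f:
            f.write(patch)
    return False
```

In a real system the checks would include the linter, the test suite, and (as with Devin below) driving the running app in a browser — the point is just that each iteration is conditioned on concrete error output rather than a blind retry.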
We use https://devin.ai for this and it works very well. Devin has its own virtual environment, IDE, terminal and browser. You can configure it to run your application and connect to whatever it needs. Devin can modify the app, test changes in the browser and send you a screen recording of the working feature with a PR.
This is a really great post - and what they've built here is very impressive.
I wonder if we're at the point where the cost of building and maintaining this yourselves (assisted with an AI Copilot) is now more effective than an off-the-shelf?
It feels like there's a LOT of moving parts here, but also it's deeply tailored to their own setup.
FWIW - I tried pointing Claude at the post and asking it to design an implementation (as the post suggested), and it struggled - but perhaps I prompted it wrong.
This is a great writeup! Could you share more about the sandbox <-> client communication architecture? e.g., is the agent emitting events to a queue/topic, writing artifacts to object storage, and the client subscribes; or is it more direct (websocket/gRPC) from the sandbox? I’ve mostly leaned on sandbox.exec() patterns in Modal, and I’m curious what you found works best at scale.
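For what it's worth, the queue/topic variant the question describes can be sketched in a few lines. This is a hypothetical in-process illustration, not Ramp's actual architecture: in production the queue would be a real broker (SQS, Pub/Sub, Redis streams) and large artifacts would go to object storage with only a pointer on the event stream:

```python
import queue
import threading

# In-process stand-in for a broker topic; the agent in the sandbox is the
# producer, the client is the consumer.
events: "queue.Queue[dict]" = queue.Queue()

def sandbox_agent() -> None:
    """Runs inside the sandbox: does work and emits progress events."""
    events.put({"type": "log", "data": "installing deps"})
    # Large artifacts go to object storage; only a pointer is emitted.
    # (The key below is a made-up example.)
    events.put({"type": "artifact", "key": "s3://bucket/diff.patch"})
    events.put({"type": "done"})

def client_subscriber() -> list:
    """Runs client-side: consumes events until the agent signals completion."""
    received = []
    while True:
        event = events.get()
        received.append(event)
        if event["type"] == "done":
            return received

t = threading.Thread(target=sandbox_agent)
t.start()
log = client_subscriber()
t.join()
```

The appeal over a direct websocket from the sandbox is durability and decoupling: the client can disconnect and replay, and the sandbox doesn't need an inbound-reachable endpoint. Whether that beats `sandbox.exec()`-style polling at scale is exactly the open question.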
I guess we all know and “love” how every five minutes, some breathless hipster influencer posts “This changes everything!!!” about every new x.y.1 AI bubble increment.
But honestly? This here really is something.
I can vividly imagine that in the not-too-distant future, there will be only two types of product companies: those that work like this, and those that don’t — and vanish.
Edit: To provide a less breathless take myself:
What I can very realistically imagine is that just like today sane and level-headed startups go “let’s first set up some decent infrastructure-as-code, a continuous delivery pipeline, and a solid testing framework, and then start building the product for good”, in the future sane and level-headed startups will go “let’s first set up some decent infrastructure-as-code, a continuous delivery pipeline, a solid testing framework, and a Ramp-style background agent — and then start building the product for good”.
And here I am trying to get 1 terminal agent to control 4-5 other terminal agents
This basically sums up where we're at: undeniably useful, but worth approaching carefully.
Interestingly, Devin lists Ramp (the OP) as a customer on their front page.
Surprised they need both.
Probably the best internal ai platform I've seen to date, incredible work.
The commitment to reducing friction is really incredible. Are they implying that any developer could recreate the system with AI from the description?
the chrome extension bit is super interesting and well thought out
i wonder what percentage of PRs etc. is now from non-eng?