Here's the announcement on the Fly blog: https://fly.io/blog/code-and-let-live/ - and the Hacker News thread for that post: https://news.ycombinator.com/item?id=46557825
I wrote about this because it hits two of my current obsessions at once - developer environment sandboxes (for safely running Claude Code etc in YOLO mode) and APIs for executing untrusted code.
Stupid question, but why not use a local sandbox for YOLO mode instead of a remote machine?
Is there a similar service that runs locally?
That seems fine if you have a box that's running all the time and something like Tailscale set up. I haven't bothered because I'm lazy, but I do want any coding agents I have off my laptop and off the local network, because I'm a little wary about them getting subverted. They need Internet access anyway, so I might as well.
Since I anticipate using coding agents a lot, that means my dev environment is going to live in a VM in the cloud from now on.
The big issue with this, though, is that GPUs are not available. I believe many people have local workstation boxes that they do dev on and need proper sandboxes for, and stuff like Firecracker is also not super useful there since, afaik, GPU passthrough is not ideal/working. The same goes for any kind of larger HW requirements.
A workaround might be to copy a directory out with something like Syncthing so you can test locally. But then the coding agent can't help you. So yeah, I can see setting up a box for that. I'm doing web development so it's not an issue for me.
A local sandbox may not be perfectly isolated unless you’re running it in a VM, but then that takes up local resources. Or you’re on the go a lot: a person might not have a reliable local machine or network, or be in a position to keep it on and consistent all the time.
literally sandboxes in the cloud. sand castles…
>> In a smart piece of design, Sprites uses pre-installed skills to teach Claude how Sprites itself works. This means you can ask Claude on the machine how to do things like open up ports and it will talk you through the process.
^^ So is Claude Code baked into a default sprite? If so, how/who/what API key is paying for CC? (I'm assuming this gets configured some way? Perhaps in just the normal CC CLI way?)
Yes, it's baked in. You as the user pay for a separate Anthropic account and log in with that when you first use a sprite.
As a guy who is not in the loop on all these sandbox developments, I apologize for this extremely stupid question. Why do we need any of these sandboxes? Why can't we use Docker? I thought it was a solved problem 10 years ago.
See: A field guide to sandboxes for AI¹ on the threat models.
> I want to be direct: containers are not a sufficient security boundary for hostile code. They can be hardened, and that matters. But they still share the host kernel. The failure modes I see most often are misconfiguration and kernel/runtime bugs — plus a third one that shows up in AI systems: policy leakage.
¹ https://www.luiscardoso.dev/blog/sandboxes-for-ai
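If you want to see the shared-kernel point for yourself, here is a minimal sketch, assuming a Linux host with Docker installed. Run the same script on the host and then inside a container; both should print the same kernel release, because the container has no kernel of its own. (On Docker Desktop for macOS or Windows the container reports the kernel of Docker's Linux VM instead.) The function name `kernel_release` is just something I made up for illustration.

```python
import platform


def kernel_release() -> str:
    """Return the release string of the kernel this process is running on."""
    return platform.release()


if __name__ == "__main__":
    # Compare this output on the host with the output inside a container.
    print(kernel_release())
```

A VM-based sandbox would print its own guest kernel here, which is the isolation difference the quote is getting at.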
I think one difference is that it also works as a production environment you can serve from while you develop in it. There's more on this in the Fly.io blog post.
I just use Docker devcontainers with Anthropic's own Dockerfile as a base. That gives me a persistent sandbox that has ports opened, works in any container environment (be it remote or local), and works in any IDE that supports devcontainers...
https://anil.recoil.org/notes/ocaml-claude-dev
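For anyone who hasn't seen one, a devcontainer is driven by a `.devcontainer/devcontainer.json` file. Here is a rough sketch of a minimal one, built from Python so the pieces are explicit. `name`, `build.dockerfile` and `forwardPorts` are standard devcontainer properties, but the values (the name `agent-sandbox`, the `Dockerfile` path, port 3000) are placeholders I picked, not what the setup above or Anthropic's repo actually uses.

```python
import json


def devcontainer_config(name: str, dockerfile: str, ports: list[int]) -> dict:
    """Build a minimal devcontainer.json structure (all values are placeholders)."""
    return {
        "name": name,
        # Build the container from a Dockerfile next to devcontainer.json.
        "build": {"dockerfile": dockerfile},
        # Ports the IDE should forward from the container to the local machine.
        "forwardPorts": ports,
    }


if __name__ == "__main__":
    config = devcontainer_config("agent-sandbox", "Dockerfile", [3000])
    # Paste the output into .devcontainer/devcontainer.json.
    print(json.dumps(config, indent=2))
```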
So what if Claude Code makes a mistake and tears up the sandbox? What happens to all the persisted state (aside from the container image)?
The linked fly.io article discusses why containers aren't a good fit for sandboxes that need persistent state and how sprites.dev addresses the challenges.
I read the linked Fly article and didn't see where it explains why containers aren't a good fit for sandboxes that need persistent state. You can definitely do all the same snapshotting directly on your local Docker volumes, although granted you'd need ZFS- or LVM-backed volumes (which is probably what Sprites does under the hood).
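A rough sketch of the ZFS version of that, assuming the volume's data lives on its own ZFS dataset (the dataset name `tank/docker/agent-vol` and the label `before-agent` are made up). `zfs snapshot` and `zfs rollback` are the real subcommands; the Python wrapper only builds the argument list and defaults to a dry run, so you can look at the command before it touches anything. I have no idea whether Sprites does it this way.

```python
import subprocess


def snapshot_cmd(dataset: str, label: str) -> list[str]:
    """Command that takes a point-in-time snapshot of a ZFS dataset."""
    return ["zfs", "snapshot", f"{dataset}@{label}"]


def rollback_cmd(dataset: str, label: str) -> list[str]:
    """Command that reverts the dataset to a snapshot.

    -r also destroys any snapshots newer than the target, which ZFS
    requires when rolling back past the most recent snapshot.
    """
    return ["zfs", "rollback", "-r", f"{dataset}@{label}"]


def run(cmd: list[str], dry_run: bool = True) -> None:
    """Print the command, or execute it when dry_run is False."""
    if dry_run:
        print(" ".join(cmd))
        return
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical dataset backing the agent's volume.
    dataset = "tank/docker/agent-vol"
    run(snapshot_cmd(dataset, "before-agent"))  # before letting the agent loose
    run(rollback_cmd(dataset, "before-agent"))  # if it wrecks something
```

You'd stop the container before rolling back so nothing is writing to the volume at the time.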
I think there are tradeoffs here. Maybe your one-person vibe-coded app doesn't need any change management, IaC, any of that. No Dockerfile of your own: start with whatever Dockerfile Fly wrote for you and beat it with an agent until it works well enough. And it's pretty cool that you can then just serve it directly. Is it dev or prod? Yes.
On the other hand, I really don't think editing PHP files over FTP in prod was ahead of its time -- I was there, man, and it sucked. I just know I'll eventually be really confused about why something doesn't work and wish I had some tracking of what changed over time. I want my IDE. I want VCS!
Maybe it's concerns about docker chroot escape? I'm not sure what the current consensus is on how "secure" docker is, but in the past I've heard you shouldn't assume an app in a container is fully isolated from the "host" system.