> In March 2026, I migrated to self-hosted object storage powered by Versity S3 Gateway.
Thanks for sharing this, I wasn't even aware of Versity S3 from my searches and discussions here. I recently migrated my projects from MinIO to Garage, but this seems like another viable option to consider.
Given the individual file size and total volume, I'd argue it makes sense to move to local-only storage.
On a separate note, what tool is the final benchmark screenshot from?
Self Hosted object storage looks neat!
For this project, where you have 120GB of customer data and thirty requests a second for ~8kB objects (0.25MB/s of object reads), you'd seem to be able to 100x the throughput by vertically scaling on one machine with a file system and an SSD, and never think about object storage. Would love to see why the complexity
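A quick back-of-the-envelope check of the 0.25MB/s figure, assuming one whole-object read per request and the 8.5kB average object size mentioned further down the thread (kB taken as 1,000 bytes). The function name is just for illustration.

```python
def read_throughput_kb_per_s(requests_per_s: float, avg_object_kb: float) -> float:
    """Read throughput if every request fetches exactly one whole object."""
    return requests_per_s * avg_object_kb


current = read_throughput_kb_per_s(30, 8.5)  # 255 kB/s, i.e. roughly 0.25 MB/s
hundredfold = 100 * current                   # 25,500 kB/s, about 25.5 MB/s
print(f"current: {current:.0f} kB/s, at 100x: {hundredfold / 1000:.1f} MB/s")
```

Even the 100x case is about 3,000 reads/s of ~8.5kB each, which is comfortably within what a single SSD can serve.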
(Author here) that's more or less what I have right now – one machine with a file system and an SSD. S3 API on top is there to give multiple web servers shared access to the same storage. I could have used something else instead of S3 – say, NFS – but there was a feature request for S3 [1] and S3 has a big ecosystem around it already.
[1] https://github.com/healthchecks/healthchecks/issues/609
The complexity for that is almost always for redundancy and for ease of deploys.
I don't get it: if it's running on the same machine (the post mentions "local"), why does it even need the S3 API? It could just be plain IO on the local drive(s).
The app was already built against the S3 API when it used cloud storage. Keeping that interface means the code doesn't change - you just point it at a local S3-compatible gateway instead of AWS/DO. Makes it trivial to switch back or move providers if needed.
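A minimal sketch of what "just point it at a local gateway" can look like, assuming a boto3 client (the thread doesn't say which S3 library the app actually uses). The `S3_*` environment-variable names and the example host are hypothetical; the only thing that changes between AWS/DO and a self-hosted gateway is whether `endpoint_url` is set.

```python
from typing import Dict, Mapping


def s3_client_kwargs(env: Mapping[str, str]) -> Dict[str, str]:
    """Keyword arguments for boto3.client("s3", **kwargs).

    The S3_* variable names are hypothetical. Credentials and region are
    passed the same way for AWS, DigitalOcean, or a self-hosted gateway;
    only the optional endpoint differs.
    """
    kwargs = {
        "aws_access_key_id": env["S3_ACCESS_KEY"],
        "aws_secret_access_key": env["S3_SECRET_KEY"],
        "region_name": env.get("S3_REGION", "us-east-1"),
    }
    endpoint = env.get("S3_ENDPOINT")
    if endpoint:
        # e.g. "http://storage.internal:7070" (hypothetical host); when unset,
        # boto3 falls back to the regular AWS endpoint for the region.
        kwargs["endpoint_url"] = endpoint
    return kwargs
```

Usage would be `boto3.client("s3", **s3_client_kwargs(os.environ))`. Some self-hosted gateways also expect path-style URLs, which botocore can be told to use via `Config(s3={"addressing_style": "path"})`.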
(Author here) There are multiple web servers for redundancy (3 currently), and each needs access to all objects.
With an average object size of 8.5kB, I'd honestly consider storing them as blobs in a cloud DB, with maybe a small per-server cache in front.
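A rough sketch of that suggestion, with sqlite3 standing in for the shared cloud database (an assumption: a real deployment would use whatever networked DB the app already has) and a small LRU cache per web server. `BlobStore` and the `objects` table are hypothetical names.

```python
import sqlite3
from collections import OrderedDict
from typing import Optional


class BlobStore:
    """Objects as BLOB rows in a shared DB, with a small per-server LRU cache.

    sqlite3 stands in for the networked database here; in a multi-server
    deployment each web server would keep its own cache in front of one DB.
    """

    def __init__(self, conn: sqlite3.Connection, cache_size: int = 1024) -> None:
        self.conn = conn
        self.cache_size = cache_size
        self.cache: "OrderedDict[str, bytes]" = OrderedDict()
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS objects ("
                "object_key TEXT PRIMARY KEY, body BLOB NOT NULL)"
            )

    def put(self, key: str, body: bytes) -> None:
        with self.conn:  # commits the transaction on success
            self.conn.execute(
                "INSERT OR REPLACE INTO objects (object_key, body) VALUES (?, ?)",
                (key, body),
            )
        self.cache.pop(key, None)  # drop this server's stale copy, if any

    def get(self, key: str) -> Optional[bytes]:
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        row = self.conn.execute(
            "SELECT body FROM objects WHERE object_key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        body = bytes(row[0])
        self.cache[key] = body
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict least recently used
        return body
```

Because each server only invalidates its own cache on write, this is only safe if objects are never overwritten, or if briefly stale reads on the other servers are acceptable.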
If the app was written using the S3 API, it would be much faster/cheaper to migrate to a local system that provides the same API. Switching to local IO would mean (probably) rewriting a lot of code.
Surely "read object" and "write object" are not hard to migrate to a local file system. You can also use Apache OpenDAL, which provides the same interface for both.
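For a single machine, the "read object"/"write object" replacement can be as small as this sketch. The class and method names are hypothetical (this is not OpenDAL's API); keys are mapped straight onto paths under a root directory, with a guard against keys that would escape it.

```python
from pathlib import Path
from typing import Optional


class LocalObjectStore:
    """Hypothetical get/put over a directory tree, standing in for S3
    GetObject/PutObject when all requests hit one machine."""

    def __init__(self, root: Path) -> None:
        self.root = root.resolve()

    def _path(self, key: str) -> Path:
        path = (self.root / key).resolve()
        # Refuse keys such as "../x" or "/etc/passwd" that leave the root.
        if self.root not in path.parents:
            raise ValueError(f"key escapes store root: {key!r}")
        return path

    def put(self, key: str, body: bytes) -> None:
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(body)

    def get(self, key: str) -> Optional[bytes]:
        try:
            return self._path(key).read_bytes()
        except FileNotFoundError:
            return None
```

This only helps when one machine serves every request; with several web servers, as in this setup, each server would need the same directory shared, which is the gap the S3 gateway (or NFS, or a SAN) fills.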
Or a simple SAN
The S3 API doesn't work like normal filesystem APIs.
Part of it is that it follows the object storage model, and part of it is just to lock people into AWS once they start working with it.
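One concrete difference from a filesystem: S3 keys live in a flat namespace, and "directories" only appear when you list with a `Prefix` and `Delimiter`, which rolls matching keys up into `CommonPrefixes`. The sketch below imitates the shape of a ListObjectsV2 result over a plain dict; `list_objects` is a hypothetical helper, and it ignores pagination, `MaxKeys`, and continuation tokens.

```python
from typing import Dict, List, Tuple


def list_objects(
    store: Dict[str, bytes], prefix: str = "", delimiter: str = ""
) -> Tuple[List[str], List[str]]:
    """Return (keys, common_prefixes) for a flat key/value store.

    "/" has no special meaning in a key unless it is passed as the
    delimiter; then keys sharing the next segment collapse into one prefix.
    """
    keys: List[str] = []
    prefixes: List[str] = []
    for key in sorted(store):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Roll everything up to and including the delimiter into a prefix.
            common = prefix + rest[: rest.index(delimiter) + len(delimiter)]
            if common not in prefixes:
                prefixes.append(common)
        else:
            keys.append(key)
    return keys, prefixes
```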
I'm 100% aware of how S3 works. I was questioning why the S3 API is needed when the service is using local storage.
Sometimes API compatibility is an important detail.
I've worked at a few places where single-node K8s "clusters" were frequently used just because they wanted the same API everywhere.
Apart from all these other products that implement s3? MinIO, Ceph (RGW), Garage, SeaweedFS, Zenko CloudServer, OpenIO, LakeFS, Versity, Storj, Riak CS, JuiceFS, Rustfs, s3proxy.
Riak CS has been dead for over a decade, which makes me question the rest. Some of these also do not have the same behaviors when it comes to paths (MinIO is one of those IIRC).
What kind of vendor lock-in are you even talking about? Their API is public knowledge: AWS publishes the spec, there are multiple open source reference client implementations on GitHub, there are multiple alternatives supporting the protocol, and you can find writing about the internals from people as high in the AWS hierarchy as Werner Vogels. Maybe you could say that some S3 features with no implementation in alternative products are a lock-in. I would consider them a „competitive advantage”. YMMV.
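The path issue comes from keys that are distinct in S3 but not as filesystem paths. A naive gateway that maps each key onto a file (a hypothetical mapping, not a description of how MinIO or Versity actually store objects) runs into two problems: keys such as `a//b` and `a/b` (or `a/` and `a`) normalize to the same path, and an object named `a` cannot coexist with an object named `a/b`, because `a` would have to be both a file and a directory.

```python
import os
import posixpath


def keys_collide_as_paths(k1: str, k2: str) -> bool:
    """True if two distinct S3 keys normalize to the same POSIX path."""
    return k1 != k2 and posixpath.normpath(k1) == posixpath.normpath(k2)


def try_store_both(root: str, first_key: str, second_key: str) -> bool:
    """Naively store two keys as files under root; False on a conflict."""
    try:
        for key in (first_key, second_key):
            path = os.path.join(root, *key.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as f:
                f.write(b"x")
        return True
    except OSError:  # e.g. "a" already exists as a file, so "a/" can't be a dir
        return False
```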
> part of it is just to lock people into AWS once they start working with it.
This is some next-level conspiracy theory stuff. What exactly would the alternative have been in 2006? S3 is one of the most commonly implemented object storage APIs around, so if the goal is lock-in, they're really bad at it.
> What exactly would the alternative have been in 2006?
Well, WebDAV (Web Distributed Authoring and Versioning) had been around for 8 years when AWS decided they needed a custom API. And what service provider wasn't trying to lock you into its service by providing a custom API (especially pre-GPT) when a standard one already existed? Assuming they made the choice for a business benefit doesn't require anything close to a conspiracy theory.
And it worked as a moat until other companies and open source projects started cloning the API. See also: Microsoft.
WebDAV is kinda bad, and back then it was a big deal that corporate proxies wouldn't forward custom HTTP methods. You could barely trust PUT to work, let alone PROPFIND.
WebDAV is ass tho. I don't remember a single positive experience with anything using it.
And you'd still need a redundant backend exposing it as an API.
Separate machine, I think, given the quoted point at the end:
> The costs have increased: renting an additional dedicated server costs more than storing ~100GB at a managed object storage service. But the improved performance and reliability are worth it.
So you don't need to refactor your code?
And when/if you decide to head back to a 3rd party it requires no refactoring again.
yeah, sure, those 5-10 different API calls would surely be a huge toll to refactor... I'd rather run an additional service to reimplement the S3 API mapping to my local drive /s
I'm sure it's a lot better now, but every time I see btrfs I get PTSD.
I hit a panic in btrfs using an Ubuntu 24 LTS kernel. The trauma is still alive and well.
Same here. Had a production node running btrfs under heavy write load (lots of small files, frequent creates) and spent two days debugging what turned out to be filesystem-level corruption. Switched to ext4 and never looked back. The article doesn't mention what filesystem sits under Versitygw here, which seems like a pretty relevant omission for anyone thinking of replicating the setup.
I'd worry about file create, write, then fsync performance with btrfs, but not about reliability or data-loss.
But a quick grep across versitygw tells me they don't use Sync()/fsync, so not a problem... Any data loss occurring from that is obviously not btrfs's fault.
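For reference, the create, write, then fsync sequence mentioned above usually looks like this sketch when a file has to be replaced durably: write to a temporary file in the same directory, fsync it, rename it into place, then fsync the directory so the rename is persisted too. This is the generic POSIX pattern with a hypothetical helper name, not code taken from versitygw, and the directory fsync does not work on Windows.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Atomically replace `path` with `data`, surviving a crash or power loss.

    Without the fsync calls, recently written data can sit in the page cache
    and be lost on a crash, on any filesystem, not only btrfs.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)  # same dir, so rename is atomic
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push file contents to stable storage
        os.replace(tmp, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp)
        raise
    # fsync the directory so the new directory entry itself is durable (POSIX).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```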
Care to elaborate? I've heard good things about it, but am personally a ZFS user.
Years of serious corruption bugs.
Gluster was that for me
Ah, another one! Yep, same here, at least before the Ceph days (although I've had my own, albeit self-inflicted, nightmare there too).
Yup, I still get nightmares about GlusterFS... and I still have one customer running on it.
I heard it got better, but we ran into the BOTF (billions of tiny files) issue around 2016. (For a genealogy startup this was a serious issue)