Looking at the RFC, I'm not sure I understand the motivation, as it suggests multiple times that a client or intermediary will have to read external documentation:
> Servers MAY choose to return partition keys that distinguish between quota allocated to different consumers or different resources. There are a wide range of strategies for partitioning server capacity, including per user, per application, per HTTP method, per resource, or some combination of those values. The server SHOULD document how the partition key is generated so that clients can predict the key value for a future request and determine if there is sufficient quota remaining to execute the request.
If external documentation is required, why send the header? It seems as though having it in the documentation is generally preferable, rather than something to avoid.
The server would be telling the client the rate-limiting values active/effective for it. As such, the client doesn't actually need to know what "its partition" is. As far as the client is concerned, "its partition" is the whole of the rate-limiting domain.
The partitioning strategy, and partition chosen using it, would never — should never — be relevant to any automated logic inside the client. (The only way in which it could be would be if you were trying to make a client that aims to defeat the server's rate-limiting logic by using multiple accounts or IP addresses to jump between partitions, and that's... not okay.)
The point of sending the partitioning info to the client is that it enables a human developing a client, or operating a tool that embeds a client, to debug why rate-limiting is happening when, by their understanding, it shouldn't be — especially when they have multiple clients across multiple threads / machines, each making multiple concurrent requests to the API. These HTTP-429-response heisenbugs get much easier to reason about when the server sends the client enough information for the developer to see which of the requests they sent got rate-limiting-bucketed together, and which didn't.
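To make that concrete, here's a minimal sketch of the kind of debug logging I mean, using the Fetch API. I'm assuming the consolidated `RateLimit` field from recent drafts and just logging it raw, since the exact structured-field syntax (and whether a partition key shows up at all) is server-defined.

```ts
// Fire several concurrent requests and log which rate-limit bucket
// each response reports, so you can see which requests the server
// counted against the same quota.
async function probe(url: string, id: number): Promise<void> {
  const res = await fetch(url);
  // Recent drafts use a structured-field dictionary, along the lines
  // of: RateLimit: "default";r=50;t=30 (a partition key is included
  // at the server's discretion). Logging the raw value is enough to
  // compare requests against each other.
  const rateLimit = res.headers.get("ratelimit") ?? "(no RateLimit header)";
  console.log(`request #${id} -> status ${res.status}, RateLimit: ${rateLimit}`);
}

// Hypothetical endpoint; substitute your own API.
const url = "https://api.example.com/v1/things";
await Promise.all([1, 2, 3, 4, 5].map((id) => probe(url, id)));
```

Two requests logging the same key shared a bucket; different keys didn't. That's exactly the information a bare 429 doesn't give you.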
The relevant word here is MAY[1]
It's true that if an API requires the devs of its consumers to have consulted documentation in order to respect the RateLimit header, they can just as easily include custom API logic for traffic control, but this does provide a nice standardized way to do so nevertheless.
And since the word is "MAY", APIs may also use standard responses that don't require an custom handling. As an example a CLI-builder library which parses OpenAPI spec can adopt changes to handle the RateLimit header automatically, in the situations where consulting docs is not required.
[1] https://datatracker.ietf.org/doc/html/rfc2119
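Here's a minimal sketch of that kind of automatic handling: a wrapper that pauses when the server reports zero remaining quota. The `r=` (remaining) and `t=` (seconds until reset) parameters are my reading of the draft's examples, and the regex parsing is deliberately naive; a real implementation should use a structured-fields parser and verify the syntax against the current draft.

```ts
// Naive sketch: wait out the reported reset window whenever the
// server says remaining quota has hit zero.
async function fetchRespectingRateLimit(url: string): Promise<Response> {
  const res = await fetch(url);
  const field = res.headers.get("ratelimit");
  if (field) {
    // Pull r= (remaining) and t= (seconds until reset) out of a value
    // shaped like: "default";r=0;t=30 (assumed form, not a spec-grade
    // parser).
    const remaining = Number(/;\s*r=(\d+)/.exec(field)?.[1] ?? NaN);
    const reset = Number(/;\s*t=(\d+)/.exec(field)?.[1] ?? NaN);
    if (remaining === 0 && Number.isFinite(reset)) {
      console.log(`quota exhausted, sleeping ${reset}s until the window resets`);
      await new Promise((resolve) => setTimeout(resolve, reset * 1000));
    }
  }
  return res;
}
```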
Maintainer on the Ky library team here, a popular HTTP client for JavaScript.
We support these headers, but unfortunately there’s a mess of different implementations out there. The names aren’t consistent. The number/date formats aren’t consistent. We occasionally discover new edge cases. The standard is very late to the party. Of course, better late than never. I just hope it can actually gain traction given the inertia of some incompatible implementations.
If you are designing an API, I strongly recommend using `Retry-After` for as long as you can get away with it and only implementing the rate limit headers when it really becomes necessary. Good clients will add jitter and exponential backoff to prevent the thundering herd problem.
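For what it's worth, the client side of that advice fits in a few lines. This is a sketch, not Ky's actual retry logic: it honors a delta-seconds `Retry-After` on 429/503 responses and otherwise falls back to exponential backoff with full jitter. (`Retry-After` may also be an HTTP-date, which this skips for brevity.)

```ts
// Retry on 429/503, preferring the server's Retry-After hint and
// falling back to exponential backoff with full jitter.
async function fetchWithBackoff(url: string, maxRetries = 5): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url);
    if ((res.status !== 429 && res.status !== 503) || attempt >= maxRetries) {
      return res;
    }
    // Delta-seconds form only; an HTTP-date parses to NaN here and
    // falls through to the jittered backoff below.
    const header = res.headers.get("retry-after");
    const retryAfterSec = header === null ? NaN : Number(header);
    const delayMs = Number.isFinite(retryAfterSec)
      ? retryAfterSec * 1000
      // Full jitter: uniform in [0, min(30s, 1s * 2^attempt)).
      : Math.random() * Math.min(30_000, 1000 * 2 ** attempt);
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```

Full jitter (a random delay up to the cap, rather than a fixed doubled delay) is what actually breaks up the thundering herd; otherwise synchronized clients all retry at the same instants.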
Yup, seems both overengineered and undercooked at the same time, as is unfortunately common for newer headers.
As you said, 429 + Retry-After is plenty good already.
It is nice to see some actual progress on this because handling rate limits has always been kind of a mess. I really hope the major gateways pick this up quickly so we do not have to write custom logic for every integration.
It really irks me that the de facto rate limiting headers mix camel case with the more standard dashes, e.g. RateLimit-Remaining instead of Rate-Limit-Remaining.
At least it's not misspelled.
https://en.wikipedia.org/wiki/HTTP_referer
it's all lowercase anyway at parse time.
rate-limit-remaining would be nicer than ratelimit-remaining
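For what it's worth, fetch's `Headers` class demonstrates the parse-time point: field names are normalized to lowercase on the way in, so whatever casing the server (or the spec) picks, your code sees the same thing.

```ts
// Header names are case-insensitive; Headers normalizes them to
// lowercase, so any casing retrieves the same value.
const h = new Headers({ "RateLimit-Remaining": "5" });
console.log(h.get("ratelimit-remaining")); // "5"
console.log([...h.keys()]); // ["ratelimit-remaining"]
```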