I wouldn't decode them like this, it's fragile, and global node IDs are supposed to be opaque in GraphQL.
I see that GitHub exposes a `databaseId` field on many of their types (like PullRequest) - is that what you're looking for?
Most GraphQL APIs that serve objects that implement the Node interface just base-64-encode the type name and the database ID, but I definitely wouldn't rely on that always being the case. You can read more about global IDs in GraphQL in the spec in [2].
> GitHub's migration guide tells developers to treat the new IDs as opaque strings and treat them as references. However it was clear that there was some underlying structure to these IDs as we just saw with the bitmasking
Great, so now GitHub can't change the structure of their IDs without breaking this person's code. The lesson is that if you're designing an API and want an ID to be opaque you have to literally encrypt it. I find it really demoralizing as an API designer that I have to treat my API's consumers as adversaries who will knowingly and intentionally ignore guidance in the documentation like this.
> Great, so now GitHub can't change the structure of their IDs without breaking this person's code.
And that is all the fault of the person who treated a documented opaque value as if it has some specific structure.
> The lesson is that if you're designing an API and want an ID to be opaque you have to literally encrypt it.
The lesson is that you should stop caring about breaking people’s code who go against the documentation this way. When it breaks you shrug. Their code was always buggy and it just happened to be working for them until then. You are not their dad. You are not responsible for their misfortune.
> I find it really demoralizing as an API designer that I have to treat my API's consumers as adversaries who will knowingly and intentionally ignore guidance in the documentation like this.
You could also say, if I tell you something is an opaque identifier, and you introspect it, it's your problem if your code breaks. I told you not to do that.
I think more important than worrying about people treating an opaque value as structured data, is wondering _why_ they're doing so. In the case of this blog post, all they wanted to do was construct a URL, which required the in teger database ID. Just make sure you expose what people need, so they don't need to go digging.
Other than that, I agree with what others are saying. If people rely on some undocumented aspect of your IDs, it's on them if that breaks.
Can GitHub change their API response rate? Can they increase it? If they do, they’ll break my code ‘cause it expects to receive responses at least after 1200ms. Any faster than that and I get race conditions. I selected the 1200ms number by measuring response rates.
No, you would call me a moron and tell me to go pound sand.
You could but you would lose the performance benefits you were seeking by encoding information into the ID. But you could also use a randomized, proprietary base64 alphabet rather than properly encrypting the ID.
XOR encryption is cheap and effective. Make the key the static string "IfYouCanReadThisYourCodeWillBreak" or something akin to that. That way, the key itself will serve as a final warning when (not if) the key gets cracked.
Symmetric encryption is computationally ~free, but most of them are conceptually complex. The purpose of encryption here isn't security, it's obfuscation in the service of dissuading people from depending on something they shouldn't, so using the absolutely simplest thing that could possibly work is a positive.
Yes, XOR is a real and fundamental primitive in cryptography, but a cryptographer may view the scheme you described as violating Kerckhoffs's second principle of "secrecy in key only" (sometimes phrased, "if you don't pass in a key, it is encoding and not encryption"). You could view your obscure phrase as a key, or you could view it as a constant in a proprietary, obscure algorithm (which would make it an encoding). There's room for interpretation there.
Note that this is not a one-time pad because it we are using the same key material many times.
But this is somewhat pedantic on my part, it's a distinction without a difference in this specific case where we don't actually need secrecy.
Encoding a type name into an ID is never really something I've viewed as being about performance. Think of it more like an area code, it's an essential part of the identifier that tells you how to interpret the rest of it.
> That repository ID (010:Repository2325298) had a clear structure: 010 is some type enum, followed by a colon, the word Repository, and then the database ID 2325298.
It's a classic length prefix. Repository has 10 chars, Tree has 4.
In database design typically it recommends giving out opaque natural keys, and keeping your monotonically increasing integer IDs secret and used internally.
Maybe. Until your natural key changes. Which happens. A lot.
Exposing a surrogate / generated key that is effectively meaningless seems to be wise. Maybe internally Youtube has an index number for all their videos, but they expose a reasonably meaningless coded value to their consumers.
This makes no sense. I am developing a product in this space (https://codeinput.com) and GitHub API and GraphQl is a badly entangled mess but you don’t trick your way around ids.
Same for urls, you are supposed to get them directly from GitHub not construct them yourself as format can change and then you find yourself playing a refactor cat-and-mouse game.
Best you can do is an hourly/daily cache for the values.
> Somewhere in GitHub's codebase, there's an if-statement checking when a repository was created to decide which ID format to return.
I doubt it. That's the beauty of GraphQL — each object can store its ID however it wants, and the GraphQL layer encodes it in base64. Then when someone sends a request with a base64-encoded ID, there _might_ be an if-statement (or maybe it just does a lookup on the ID). If anything, the if-statement happens _after_ decoding the ID, not before encoding it.
There was never any if-statement that checked the time — before the migration, IDs were created only in the old format. After the migration, they were created in the new format.
I wouldn't decode them like this, it's fragile, and global node IDs are supposed to be opaque in GraphQL.
I see that GitHub exposes a `databaseId` field on many of their types (like PullRequest) - is that what you're looking for?
Most GraphQL APIs that serve objects that implement the Node interface just base-64-encode the type name and the database ID, but I definitely wouldn't rely on that always being the case. You can read more about global IDs in GraphQL in the spec in [2].
[1] https://docs.github.com/en/graphql/reference/objects#pullreq... [2] https://graphql.org/learn/global-object-identification/
> GitHub's migration guide tells developers to treat the new IDs as opaque strings and treat them as references. However it was clear that there was some underlying structure to these IDs as we just saw with the bitmasking
Great, so now GitHub can't change the structure of their IDs without breaking this person's code. The lesson is that if you're designing an API and want an ID to be opaque you have to literally encrypt it. I find it really demoralizing as an API designer that I have to treat my API's consumers as adversaries who will knowingly and intentionally ignore guidance in the documentation like this.
> Great, so now GitHub can't change the structure of their IDs without breaking this person's code.
And that is all the fault of the person who treated a documented opaque value as if it has some specific structure.
> The lesson is that if you're designing an API and want an ID to be opaque you have to literally encrypt it.
The lesson is that you should stop caring about breaking people’s code who go against the documentation this way. When it breaks you shrug. Their code was always buggy and it just happened to be working for them until then. You are not their dad. You are not responsible for their misfortune.
> I find it really demoralizing as an API designer that I have to treat my API's consumers as adversaries who will knowingly and intentionally ignore guidance in the documentation like this.
You don’t have to.
You could also say, if I tell you something is an opaque identifier, and you introspect it, it's your problem if your code breaks. I told you not to do that.
Once "you" becomes a big enough "them" it becomes a problem again.
I think more important than worrying about people treating an opaque value as structured data, is wondering _why_ they're doing so. In the case of this blog post, all they wanted to do was construct a URL, which required the in teger database ID. Just make sure you expose what people need, so they don't need to go digging.
Other than that, I agree with what others are saying. If people rely on some undocumented aspect of your IDs, it's on them if that breaks.
Can GitHub change their API response rate? Can they increase it? If they do, they’ll break my code ‘cause it expects to receive responses at least after 1200ms. Any faster than that and I get race conditions. I selected the 1200ms number by measuring response rates.
No, you would call me a moron and tell me to go pound sand.
Weird systems were never supported to begin with.
This is well understood - Hyrum's law.
You don't need encryption, a global_id database column with a randomly generated ID will do.
You could but you would lose the performance benefits you were seeking by encoding information into the ID. But you could also use a randomized, proprietary base64 alphabet rather than properly encrypting the ID.
XOR encryption is cheap and effective. Make the key the static string "IfYouCanReadThisYourCodeWillBreak" or something akin to that. That way, the key itself will serve as a final warning when (not if) the key gets cracked.
Any symmetric encryption is ~free compared to the cost of a network request or db query.
In this particular instance, Speck would be ideal since it supports a 96-bit block size https://en.wikipedia.org/wiki/Speck_(cipher)
Symmetric encryption is computationally ~free, but most of them are conceptually complex. The purpose of encryption here isn't security, it's obfuscation in the service of dissuading people from depending on something they shouldn't, so using the absolutely simplest thing that could possibly work is a positive.
A cryptographer may quibble and call that an encoding but I agree.
A cryptographer would say that XOR ciphers are a fundamental cryptography primitive, and e.g. the basic building blocks for one-time pads.
Yes, XOR is a real and fundamental primitive in cryptography, but a cryptographer may view the scheme you described as violating Kerckhoffs's second principle of "secrecy in key only" (sometimes phrased, "if you don't pass in a key, it is encoding and not encryption"). You could view your obscure phrase as a key, or you could view it as a constant in a proprietary, obscure algorithm (which would make it an encoding). There's room for interpretation there.
Note that this is not a one-time pad because it we are using the same key material many times.
But this is somewhat pedantic on my part, it's a distinction without a difference in this specific case where we don't actually need secrecy.
Encoding a type name into an ID is never really something I've viewed as being about performance. Think of it more like an area code, it's an essential part of the identifier that tells you how to interpret the rest of it.
That's fair, and you could definitely put a prefix and a UUID (or whatever), I failed to consider that.
Hyrum's law is a real sonuvabitch.
The API contract doesn’t stipulate the behavior so GitHub is free to change as they please.
> That repository ID (010:Repository2325298) had a clear structure: 010 is some type enum, followed by a colon, the word Repository, and then the database ID 2325298.
It's a classic length prefix. Repository has 10 chars, Tree has 4.
I just want to point out that Opus 4.5 actually knows this trick and will write the code to decode the IDs if it is working with GitHub's API lol
In database design typically it recommends giving out opaque natural keys, and keeping your monotonically increasing integer IDs secret and used internally.
Maybe. Until your natural key changes. Which happens. A lot.
Exposing a surrogate / generated key that is effectively meaningless seems to be wise. Maybe internally Youtube has an index number for all their videos, but they expose a reasonably meaningless coded value to their consumers.
This makes no sense. I am developing a product in this space (https://codeinput.com) and GitHub API and GraphQl is a badly entangled mess but you don’t trick your way around ids.
There is actually a documented way to do it: https://docs.github.com/en/graphql/guides/using-global-node-...
Same for urls, you are supposed to get them directly from GitHub not construct them yourself as format can change and then you find yourself playing a refactor cat-and-mouse game.
Best you can do is an hourly/daily cache for the values.
> Somewhere in GitHub's codebase, there's an if-statement checking when a repository was created to decide which ID format to return.
I doubt it. That's the beauty of GraphQL — each object can store its ID however it wants, and the GraphQL layer encodes it in base64. Then when someone sends a request with a base64-encoded ID, there _might_ be an if-statement (or maybe it just does a lookup on the ID). If anything, the if-statement happens _after_ decoding the ID, not before encoding it.
There was never any if-statement that checked the time — before the migration, IDs were created only in the old format. After the migration, they were created in the new format.