Personally I'd append the Nonce as a git trailer, not to the message body.
And would keep the date constant rather than use the time of each attempt (such that the only thing that actually varies is the Nonce)
And just for more fun... Nonces should only be prime numbers. Probably won't run out :)
Could you explain what a git trailer is if not appended to the message body? My understanding is that trailers are just key-value pairs in a particular format at the end of the message; there's not an alternative storage mechanism.
Even so, trailers or message body might be moot - rerolling the committed-at timestamp should be sufficient!
Trailers are part of the commit message, but are separated from the body by a blank line:
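For example (illustrative; the trailer name matches the "git-prime Nonce: 167" example quoted elsewhere in the thread):

    Fix critical bug

    A couple of sentences explaining
    the change go here in the body.

    Signed-off-by: A. Developer <dev@example.com>
    git-prime Nonce: 167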
This is all one commit message, but it is understood by convention as having several parts: subject line, body, and trailers.

It is a trailer; see the source code, line 100:
I'm not sure whether that's a valid header name, with the space and all, but I remarked on that in another comment already.

It is now a proper git trailer, I think.
Finally, a tool optimized for creating Git commit hash collisions
Even with git-prime reducing the address space by a few orders of magnitude, there's still (effectively) zero chance for collision. The difference between 10^-29 and 10^-27 isn't that great in practice.
I came here to write that :-)
Actually there are π(N) ~ N/ln(N) primes less than N per the Prime Number Theorem, so π(2^160) ~ 2^153.2 - this only drops 7 bits. So that does increase the odds of collision, but much less than I expected!
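A quick sanity check of that, as a Python sketch:

    import math

    k = 160
    ln_N = k * math.log(2)             # ln(2^160) ~ 110.9
    approx_pi = 2**k / ln_N            # Prime Number Theorem: pi(N) ~ N / ln(N)
    print(f"pi(2^{k}) ~ 2^{math.log2(approx_pi):.1f}")   # ~2^153.2, only ~7 bits fewer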
Maths saved the day again!
I added a section to the README and pages site noting your logic.
It’s ok, you can still assign a unique hash for more than half of the atoms in the universe.
> Hash as int
Should be "Hash as decimal". The hexadecimal hash is already the same integer.
> Message: "Fix critical bug" + git-prime Nonce: 167
In the actual code it looks like:
So it is like a trailer. However, can trailer names have spaces in them?

A more conservative choice for the trailer header seems wiser, like:

would be a safer choice for the trailer. (The word "git" is not required; we know we are in Git.)

Another subtlety is that if the message already has trailers, then you don't need to separate it from them by a blank line.
Git has a command for manipulating trailers; that could be used.
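A minimal sketch, assuming a hypothetical Prime-Nonce trailer name (git interpret-trailers reads the message on stdin and handles the blank-line and existing-trailers subtleties itself):

    import subprocess

    def add_nonce_trailer(message: str, nonce: int) -> str:
        # Let git place the trailer correctly instead of string-appending.
        out = subprocess.run(
            ["git", "interpret-trailers", "--trailer", f"Prime-Nonce: {nonce}"],
            input=message, capture_output=True, text=True, check=True,
        )
        return out.stdout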
(I see the developer doesn't really believe in this because I don't see the nonces in the commit messages of the project itself.)
I added the trailer syntax, and rewrote git-prime's history to ensure all commits are now number-theoretically certified.
If you wish to do the same in your own repo, I added a script "make-whole.sh" to do this - but I don't recommend it, as force pushes and history rewrites could break stuff.
Also added a new tool

to show which commits are already prime.

    Attempt 168: cb80ebbd975f0028... not prime
    [PRIME] Found after 168 attempts! Commit: cb80ebbd975f00288dca70d8fa735c688755f947
Why does it say not prime then prime?
Wolfram alpha thinks it's prime: https://www.wolframalpha.com/input?i=factor+0xcb80ebbd975f00...
Heisenprime.
Nice work. We should probably go looking for such a category
Bug, or just an example-text bug. Now fixed.
Tangentially, I love how easy it is to add subcommands to git. Just put an executable named git-<something> in your $PATH and it will get called by git when invoked like that.
Nice. I think it would be even more æsthetically pointless if it fuzzed the commit date, message whitespace, etc instead of adding a blob...
Pointless yes, but not as aesthetically pleasing at least for me.
This way you can have a choice of ordered primes based on nonce. Good mood? I’ll go with nonce 773 today.
Whenever you amend a commit, the commit timestamp changes; that ought to be enough, so the nonce is not required. However, I think it only has second precision, so if you stick to honest wall time, 100 attempts require 100 seconds.
This is amazing. A true proof of work. Have you considered finding hashes with leading zeros or making sure each hash starts with 1337?
No, but perhaps you can create a git-coin command?
I added three new options
Be warned tho, this makes it excessively hard. It takes a long, long time.

What about changing the committer timestamp slightly until you find a match, like https://github.com/mattbaker/git-vanity-sha? That would make it entirely invisible.
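A rough sketch of that approach (not git-vanity-sha's actual code; it relies on git's documented GIT_COMMITTER_DATE override and the internal "<unix-timestamp> <tz>" date format):

    import os, subprocess

    def amend_with_offset(offset_seconds: int) -> str:
        """Re-hash HEAD by nudging only the committer date; message stays intact."""
        # Note: reads the current HEAD date each call; a real search loop
        # would fix a base timestamp once and vary the offset.
        base = int(subprocess.check_output(
            ["git", "log", "-1", "--format=%ct"], text=True).strip())
        env = dict(os.environ,
                   GIT_COMMITTER_DATE=f"{base + offset_seconds} +0000")
        subprocess.run(["git", "commit", "--amend", "--no-edit"],
                       env=env, check=True, capture_output=True)
        return subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True).strip()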
Wow, that's fast. Do you know if the committer's email also goes into the calculation of a git hash? If it does, it should be possible to manipulate the git hash in a very discreet way through the email address, like this: user+<nonce>@example.com
Yes, the author is hashed, I believe.
This already exists btw https://github.com/tochev/git-vanity
Finally
I made it for you
Why? Fun. Now every commit is a certified 160-bit prime number.
- Miller-Rabin primality test (40 rounds, ~10^-24 false positive rate)
- Fuzzes commit messages with nonces until finding a prime hash (sketched below)
- Average ~368 attempts to find a prime (based on prime density at 2^160)
- Actual performance: 30-120 seconds depending on luck
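Not git-prime's actual code, but a minimal sketch of the loop the bullets describe: random-base Miller-Rabin plus a nonce search over a toy message (real git hashes the full commit object, "commit <len>\0<headers+message>", not just the message):

    import hashlib, random

    def is_probable_prime(n: int, rounds: int = 40) -> bool:
        """Miller-Rabin with random bases; error rate < 4^-rounds."""
        if n < 2:
            return False
        for p in (2, 3, 5, 7, 11, 13):
            if n % p == 0:
                return n == p
        d, s = n - 1, 0
        while d % 2 == 0:        # write n - 1 = d * 2^s with d odd
            d //= 2
            s += 1
        for _ in range(rounds):
            x = pow(random.randrange(2, n - 1), d, n)
            if x in (1, n - 1):
                continue
            for _ in range(s - 1):
                x = pow(x, 2, n)
                if x == n - 1:
                    break
            else:
                return False     # composite witness found
        return True

    message, nonce = "Fix critical bug", 0
    while True:
        candidate = f"{message}\n\ngit-prime Nonce: {nonce}".encode()
        h = int(hashlib.sha1(candidate).hexdigest(), 16)
        if is_probable_prime(h):
            break
        nonce += 1
    print(f"Nonce {nonce} gives (probably) prime hash {h:040x}")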
The philosophy: shouldn't the global distributed compute grid be used to further number-theoretic non-goals like primality?

Every developer running git-prime contributes cycles to finding 160-bit primes hidden in SHA-1 space. Corporately pointless, but mathematically & aesthetically satisfying.
Install:
or on Windows:

Then just run:
Side note: disappointed that this Show's item ID is NOT prime. 46454369 = 13 × 3573413. Would've been perfect meta-content, ahah

I think item IDs here are sequential, including comments, so you could have timed the submission to attempt to get one. More likely to get it when the site's quieter.
> Miller-Rabin primality test (40 rounds, ~10^-24 false positive rate)
It's way better than that. You are using the simplest upper bound for the false positive rate, which is 1/4^t where t is the number of rounds. More sophisticated analysis can give better bounds.
See the paper "Average Case Error Estimates for the Strong Probable Prime Test" by Ivan Damgård, Peter Landrock and Carl Pomerance, in Mathematics of Computation Vol. 61, No. 203, Special Issue Dedicated to Derrick Henry Lehmer (Jul., 1993), pp. 177-194. Here's a PDF: https://math.dartmouth.edu/~carlp/PDF/paper88.pdf
Here are the bounds given there for t rounds testing a candidate of k bits. I'll give them as Mathematica function definitions because I happen to have them in a Mathematica notebook.
1. This one is valid for k >= 2.

    p1[k_, t_] := k^2 4^(2 - Sqrt[k])

Note this one does not depend on t, and for small k does not give a very useful bound. For 160 the bound is 0.00992742. For large k the story is different. Testing an 8192 bit number this gives a bound of 3.45661 x 10^-46. That's good enough for almost all applications, so in most cases if you want an 8192 bit prime one round is good enough.

2. This one is for t = 2, k >= 88 or 3 <= t <= k/9, k >= 21.

For k = 160 this is valid for 2 <= t <= 17. For t = 17 it gives 4.1 x 10^-23.

3. This one is for t >= k/9, k >= 21.

For k = 160 this is valid for t >= 18. At 18 it gives 9.75 x 10^-26. At 40 it gives 1.80 x 10^-41.

4. This one is for t >= k/4, k >= 21.

    p4[k_, t_] := 1/7 k^(15/4) 2^(-k/2 - 2 t)

For k = 160 this is valid for t >= 40. At 40 it gives the same bound as p3.

So bottom line is that with your current 40 rounds your false positive rate is under 1.80 x 10^-41, considerably better than 10^-24.
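That 40-round figure is easy to reproduce from the p4 definition above; in Python:

    k, t = 160, 40
    p4 = (1/7) * k**(15/4) * 2.0**(-k/2 - 2*t)
    print(f"{p4:.2e}")   # ~1.80e-41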
If 10^-24 is an acceptable rate for this application, 18 rounds is sufficient giving a rate under 9.7 x 10^-25.
BTW, the larger the k the lower the rate. I've often seen people looking for 1024+ bit primes doing 64 or more rounds. The simplest 1/4^t bound gives 2.9 x 10^-39. OpenSSL for example does 64 for k up to 2048, and 128 for larger k.
For k = 1024 a mere 6 rounds beats that with a bound of 8.8 x 10^-41.
For k = 2048 it only takes 3 rounds to get 4.4 x 10^-41.
For k = 4096 a mere 2 rounds gives 3.8 x 10^-48.
If we had a population of 1 trillion people, each using 1000 things that needed a 4096 bit prime, and that frequently rekeyed so they needed 1000 new primes per second, and every star in the observable universe also had such a civilization consuming 4096 bit primes at that rate, and they were all using 2 rounds of Miller-Rabin, there would be around 24 false positives a year across the whole universe.
If everyone upped it to 3 rounds there would, across the whole universe, be a false positive approximately every 44 billion years.
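For anyone who wants to redo that arithmetic, a sketch (the ~2 x 10^23 star count is an assumption):

    people, things, primes_per_sec = 1e12, 1e3, 1e3
    stars = 2e23                  # assumed observable-universe star count
    seconds_per_year = 3.15e7
    error_bound = 3.8e-48         # the k = 4096, t = 2 bound quoted above
    tests_per_year = people * things * primes_per_sec * stars * seconds_per_year
    print(tests_per_year * error_bound)   # ~24 false positives per year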
There is a formula to calculate the number of rounds in https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.186-5.pdf Appendix C.1. OpenSSL uses it: https://github.com/openssl/openssl/blob/ee8772e3565a84fde9e2...
For 160-bit prime and security level 2^-80, 19 rounds is enough.
For those that want more details on this fun rabbit hole of math, see the theory of Strong Pseudoprimes: https://en.wikipedia.org/wiki/Strong_pseudoprime
Miller-Rabin is considered a simplistic/antiquated primality algorithm in computational number theory. For a much better algorithm see the Baillie–PSW primality test: https://en.wikipedia.org/wiki/Baillie%E2%80%93PSW_primality_... . For an implementation see Math::Prime::Util on CPAN: https://metacpan.org/pod/Math::Prime::Util or one of the many others.
The P in BPSW is for Carl Pomerance, mentioned in the academic paper above. According to the Wikipedia article, "No composite number below 2^64 (approximately 1.845 * 10^19) passes the strong or standard Baillie–PSW test", which means if git-prime used this algorithm and the nonce was below 2^64 (very likely) then it would be provably prime instead of a probabilistic prime, and hence have much more "mathematical rigor".
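For a quick taste, sympy's isprime runs a strong BPSW-style test for large candidates (still probabilistic above 2^64, but with no known counterexamples):

    from sympy import isprime

    # The prime commit hash found upthread (per the Wolfram Alpha check).
    print(isprime(0xCB80EBBD975F00288DCA70D8FA735C688755F947))   # True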
I also agree with other comments here that git-prime would be much cleaner if it put the nonce in git commit header data instead of the message. Something like this was done Long Ago in a "for fun" git patch by Jeff King: https://lore.kernel.org/git/CACBZZX5PqYa0uWiGgs952rk2cy+QRCU...
And he even made a multi-threaded version: https://lore.kernel.org/git/20111024204737.GA25574@sigill.in...
Thank you. This is the HN I’m here for. Probabilistic tests for definite primes still being this close to definitive is amazing to me. What structure are they approximating?
30-120 seconds sounds surprisingly long for ~368 attempts, do you know which part(s) the slowness comes from?
From doing MR rounds in pure Python: https://github.com/textonly/git-prime/blob/main/git-prime-co....
Should be under 5 seconds in C or C++ using gmp
No, MR in pure python is ~instantaneous for numbers of this magnitude.
From looking at the code, the overhead will be from repeatedly invoking git as a subprocess.
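Easy to measure the per-call cost (rough sketch; numbers vary by machine and repo):

    import subprocess, time

    start = time.perf_counter()
    for _ in range(100):
        subprocess.run(["git", "rev-parse", "HEAD"], capture_output=True)
    per_call = (time.perf_counter() - start) / 100
    print(f"~{per_call * 1000:.1f} ms per git invocation")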
Have not flame graphed or even really considered optimization
Sure hope the first line of that bash script isn’t rm -rf $HOME/*
Please don’t ever suggest to anyone ever to curl a script and pipe it to bash. I’m sure this one is fine (I haven’t looked) but it’s a pretty awful idea. Only way to make it worse is to suggest slapping sudo in front.
Damn, I forgot to include that. As well as exfil of all ssh keys and env files. Oh well, you can wait for the update, right?
^^
that ship has sailed
Why not just change the nonce instead of appending it, and save some space?
We do. Unless I’m mistaken?
Claude, copy git-prime but make it git-hexspeak instead https://en.wikipedia.org/wiki/Hexspeak
Just be aware that there are more prime hashes than there are hashes with a specific 2 hex-digit prefix, so even relatively short messages will be much harder to find.
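The densities, as a quick sketch:

    import math

    p_prime = 1 / (160 * math.log(2))   # ~1/111: a random 160-bit hash is prime
    p_2hex  = 16 ** -2                  # 1/256: hash starts with a fixed 2-hex-digit prefix
    p_8hex  = 16 ** -8                  # ~2.3e-10: a full 8-digit hexspeak word
    print(p_prime, p_2hex, p_8hex)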
worth it. plus my room is cold so I need my CPU to heat it up.
https://github.com/prasmussen/git-vanity-hash
damn. there's nothing new under the sun...
Cute. 8008CAFE
But if my git hashes are indivisible, how will people fork my repo? /s