git bisect works great for tracking down regressions, but relies on the bug presenting deterministically. But what if the bug is non-deterministic? Or worse, your behaviour was always non-deterministic, but something has changed, e.g. your tests went from somewhat flaky to very flaky.
This is really cool! Is there an alternative way of thinking about it involving a hidden markov model, looking for a change in value of an unknown latent P(fail)? Or does your approach end up being similar to whatever the appropriate Bayesian approach to the HMM would be?
I hope this comment is not out of place, but I am wondering what the application for all this is? How can this help us or what does it teach us or help us prove? I am asking out of genuine curiosity as I barely understand it but I believe it has something to do with probability.
Bayesian inference is, to be overly simple, a way to write probabilistic if-statements and fit them from data. The "if" statement in this case is "if the bug is there...", and of course it's often the case that in actual software that if statement is probabilistic in nature. This thing does git bisect with a flaky bug with bayesian inference handling the flakiness to get you a decent estimate of where the bug is in the git history. It seems to be usable, or at least as usable as a Show HN thingy is expected to be.
Okay this is really fun and mathematically satisfying. Could even be useful for tough bugs that are technically deterministic, but you might not have precise reproduction steps.
Does it support running a test multiple times to get a probability for a single commit instead of just pass/fail? I guess you’d also need to take into account the number of trials to update the Beta properly.
IIUC the way you'd do that right now is just repeatedly recording the individual observations on a single commit, which effectively gives it a probability + the number of trials to do the Beta update. I don't yet have a CLI entrypoint to record a batch observation of (probability, num_trials), but it would be easy to add one
But ofc part of the magic is that git_bayesect's commit selection tells you how to be maximally sample efficient, so you'd only want to do a batch record if your test has high constant overhead
A related situation I was in recently was where I was trying to bisect a perf regression, but the benchmarks themselves were quite noisy, making it hard to tell whether I was looking at a "good" vs "bad" commit without repeated trials (in practice I just did repeats).
I could pick a threshold and use bayesect as described, but that involves throwing away information. How hard would it be to generalize this to let me plug in a raw benchmark score at each step?
git bisect works great for tracking down regressions, but relies on the bug presenting deterministically. But what if the bug is non-deterministic? Or worse, your behaviour was always non-deterministic, but something has changed, e.g. your tests went from somewhat flaky to very flaky.
In addition to the repo linked in the title, I also wrote up a little bit of the math behind it here: https://hauntsaninja.github.io/git_bayesect.html
This is really cool! Is there an alternative way of thinking about it involving a hidden markov model, looking for a change in value of an unknown latent P(fail)? Or does your approach end up being similar to whatever the appropriate Bayesian approach to the HMM would be?
I hope this comment is not out of place, but I am wondering what the application for all this is? How can this help us or what does it teach us or help us prove? I am asking out of genuine curiosity as I barely understand it but I believe it has something to do with probability.
Bayesian inference is, to be overly simple, a way to write probabilistic if-statements and fit them from data. The "if" statement in this case is "if the bug is there...", and of course it's often the case that in actual software that if statement is probabilistic in nature. This thing does git bisect with a flaky bug with bayesian inference handling the flakiness to get you a decent estimate of where the bug is in the git history. It seems to be usable, or at least as usable as a Show HN thingy is expected to be.
The writeup [1] linked on the README has examples and a better explanation
[1]: https://hauntsaninja.github.io/git_bayesect.html
Okay this is really fun and mathematically satisfying. Could even be useful for tough bugs that are technically deterministic, but you might not have precise reproduction steps.
Does it support running a test multiple times to get a probability for a single commit instead of just pass/fail? I guess you’d also need to take into account the number of trials to update the Beta properly.
Yay, I had fun with it too!
IIUC the way you'd do that right now is just repeatedly recording the individual observations on a single commit, which effectively gives it a probability + the number of trials to do the Beta update. I don't yet have a CLI entrypoint to record a batch observation of (probability, num_trials), but it would be easy to add one
But ofc part of the magic is that git_bayesect's commit selection tells you how to be maximally sample efficient, so you'd only want to do a batch record if your test has high constant overhead
recompiling can be high constant overhead
Super cool!
A related situation I was in recently was where I was trying to bisect a perf regression, but the benchmarks themselves were quite noisy, making it hard to tell whether I was looking at a "good" vs "bad" commit without repeated trials (in practice I just did repeats).
I could pick a threshold and use bayesect as described, but that involves throwing away information. How hard would it be to generalize this to let me plug in a raw benchmark score at each step?
Useful for tests with LLM interactions.