Also, people using AI coding tools to submit patches to open-source projects are weirdly hesitant to disclose that.
This is only true if an LLM session produced deterministic output, which is not the case. This whole “LLMs are the new compiler” argument doesn’t hold water.
"Deterministic" is not the issue either, it's that small changes of the input will cause unknown changes in the output. You might theoretically achieve determinism and reproducibility for the exact same input (seeding the random number generators etc.), but the issue is that even if you formulate your request just a little differently, by changing punctuation for example, you'll get an entirely different output.
With compilers, the rules are clear, e.g. if you replace variable names with different ones, the program will still do the same thing. If you add spaces in places where whitespace doesn't matter, like around operators, the resulting behavior will still be the same. You change one function's definition, it doesn't impact another function's definition. (I'm sure you can nitpick this with some edge case, but that's not the point, it overwhelmingly can be relied upon in this way in day to day work.)
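A small illustration of the compiler side of that claim, using Python's own bytecode compiler as a stand-in (the comment above does not name a language, so that choice is an assumption of this sketch). It checks that whitespace around operators does not change the compiled bytecode, and that renaming a function's parameters does not change what it returns. The snippets and the `area` function are made up for the example.

```python
def bytecode(source: str) -> bytes:
    """Compile a snippet and return its raw bytecode."""
    return compile(source, "<example>", "exec").co_code

# Whitespace around operators: same instructions either way.
spaced = "total = 1 + 2\n"
dense = "total=1+2\n"
same_bytecode = bytecode(spaced) == bytecode(dense)

def call_defined(source: str, name: str, *args):
    """Execute a snippet, then call the function it defines."""
    namespace = {}
    exec(source, namespace)
    return namespace[name](*args)

# Renamed parameters: different text, same behavior.
original = "def area(width, height):\n    return width * height\n"
renamed = "def area(w, h):\n    return w * h\n"
same_behavior = call_defined(original, "area", 3, 4) == call_defined(renamed, "area", 3, 4)

print(same_bytecode, same_behavior)
```

There is no comparable guarantee for a prompt: nothing says which edits to the wording leave the generated code unchanged.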
LLMs are non-deterministic, you would end up with a different output even if you paste the same conversation in. Even if the model was identical at the time you tried to reproduce it. Which gets less likely as time passes.
Also, why would you need to reproduce it? You have the code. Almost any modification to said code would benefit from a fresh context and refined prompt.
The actual full context of a thinking agent is asinine and full of busywork. At best, if you want to preserve the "reason" for the commit's contents, maybe you could summarise the context.
Other than that I see no reason to store the whole context per commit.
In our (small) team, we’ve taken to documenting/disclosing what part(s) of the process an LLM tool played in the proposed changes. We’ve all agreed that we like this better, both as submitters and reviewers. And though we’ve discussed why, none of us has pinned down exactly WHY we like this model better.
I’ve had the same thought, but after playing around with it, it just seems like adding noise. I never find myself looking at generated code and wondering “what prompt led to that?” There’s no point, I won’t get any kind of useful response - I’m better off talking to the developer who committed it, that’s how code review works.
One of the use cases I see for this tool is helping companies understand the output coming from the LLM black box, and the process the employee followed to complete a certain task.
Except it doesn't capture the majority of uses of AI, in my experience. In my current practice, the vast majority of AI use is autocompletions, or small inline prompts. ("Fix this error."; "Open an ALSA midi connection" (things that avoid a trip into awful documentation); "if (one of the query parameters is "gear='ir') ..." (things that break flow by forcing a trip into excellent but overly verbose JavaScript URL API documentation)). Only very occasionally will I prompt for a big chunk of code.
Maybe Git isn't the right tool to track the sessions. Some kind of new Semi-Human Intelligence Tracking tool. It will need a clever and shorter name though.
I floated that idea a week ago: https://news.ycombinator.com/item?id=47096202, although I used the word "prompts" which users pointed out was obsolete. "Session" seems better for now.
The objections I heard, which seemed solid, are (1) there's no single input to the AI (i.e. no single session or prompt) from which such a project is generated,
(2) the back-and-forth between human and AI isn't exactly like working with a compiler (the loop of source code -> object code) - it's also like a conversation between two engineers [1]. In the former case, you can make the source code into an artifact and treat that as "the project", but you can't really do that in the latter case, and
(3) even if you could, the resulting artifact would be so noisy and complicated that saving it as part of the project wouldn't add much value.
At the same time, people have been submitting so many Show HNs of generated projects, often with nothing more than a generated repo with a generated readme. We need a better way of processing these because treating them like old-fashioned Show HNs is overwhelming the system with noise right now [2].
I don't want to exclude these projects, because (1) some of them are good, (2) there's nothing wrong with more people being able to create and share things, (3) it's foolish to fight the future, and (4) there's no obvious way to exclude them anyhow.
But the status quo isn't great because these projects, at the moment, are mostly not that interesting. What's needed is some kind of support to make them more interesting.
So, community: what should we do?
[1] this point came from seldrige at https://news.ycombinator.com/item?id=47096903 and https://news.ycombinator.com/item?id=47108653.
YoumuChan makes a similar point at https://news.ycombinator.com/item?id=47213296, comparing it to Google search history. The analogy is different but the issue (signal/noise ratio) is the same.
[2] Is Show HN dead? No, but it's drowning - https://news.ycombinator.com/item?id=47045804 - Feb 2026 (422 comments)
Unfortunately Codex doesn’t seem to be able to export the entire session as markdown, otherwise I’d suggest encouraging people to include that in their Show HNs. It’s kind of nuts that it’s so difficult to export what’s now a part of the engineering process.
I don’t have anything against vibe coded apps, but what makes them interesting is to see the vibe coding session and all the false starts along the way. You learn with them as they explore the problem space.
mthurman pointed me to https://static.simonwillison.net/static/2025/claude-code-mic... - is that what you have in mind?
Yeah! That’s great. Having those alongside vibe coded apps would make them way more interesting.
I don't think it's hard to export. On the contrary, it's all already saved in your ~/.claude, so you could write a tool to convert the data there to markdown.
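A minimal sketch of such a converter, in Python. The on-disk format is an assumption here: it treats a session as JSONL text, one JSON object per line, with hypothetical `role` and `content` string fields. The real files under ~/.claude may be laid out differently, in which case the field names below would need adjusting.

```python
import json

def session_to_markdown(jsonl_text: str) -> str:
    """Render a JSONL transcript as markdown.

    Assumes (hypothetically) that each line is a JSON object with
    'role' and 'content' string fields. Lines that do not parse, or
    that lack those fields, are skipped rather than guessed at.
    """
    sections = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        role = record.get("role")
        content = record.get("content")
        if isinstance(role, str) and isinstance(content, str):
            # One heading per turn, followed by the turn's text.
            sections.append(f"## {role}\n\n{content.strip()}")
    return "\n\n".join(sections) + "\n"
```

Usage would be along the lines of `session_to_markdown(path.read_text())` for each session file, writing the result next to the project's README.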
Why does the regular voting system fail here? Are there just too many Show HNs for people to process the new ones, so the good ones get lost in the noise?
> the resulting artifact would be so noisy and complicated that saving it as part of the project wouldn't really add that much value.
This is the major blocker for me. However, there might be value in saving a summary - basically the same as what you would get from taking meeting notes and then summarizing the important points.
Plenty of commits link to mailing list discussions about the proposed change, maybe something like that, with an archive of LLM sessions?
Also the models change all the time and are not deterministic
> (2) there's nothing wrong with more people being able to create and share things
There are very clearly many things wrong with this when the things being shown require very little skill or effort.
That is by no means all of these projects. I'm not interested in a circle-the-wagons crackdown because it won't work (see "it's foolish to fight the future" above), and because we should be welcoming and educating new users in how to contribute substantively to HN.
Why should it be? The agent session is a messy intermediate output, not an artifact that should be part of the final product. If the "why" of a code change is important, have your agent write a commit message or a documentation file that is polished and intended for consumption.
In my setup, the agent is the repo. The repo texts compose the agent’s memory. Changes to the repo require the agent to approve.
Repos also message each other and coordinate plans and changes with each other and make feature requests which the repo agent then manages.
So I keep the agents’ semantically compressed memories as part of the repo, as well as the original transcripts, because often they lose coherence, and reviewing every user-submitted prompt realigns the specs, stories, and requirements.
post mortems / bug hunting -- pinpointing what part of the logic was to blame for a certain problem.
this is what granular commits are for, the kilobytes long log of claude running in circles over bullshit isn't going to help anyone
I think the parent comment is saying “why did the agent produce this bug, and why wasn't it caught”, which is a separate problem from what granular commits solve, of finding the bug in the first place.
but that takes more tokens and time. if you just save the raw log, you can always do that later if you want to consume it. plus, having the full log allows asking many different questions later.
How’s it any different than a diff log?
Better question: how is it in any way similar?
If you read the history of both, and assuming that there are good comments and documentation, it shows you the reasoning that went into the decision-making.
No? For the same reason I don't want to work 8 hours a day with the boss looking over my shoulder.
I hope people start doing that. Not that it has any practical usage for the repo itself, but if everyone does that, it'd probably make it much easier for open weight models to catch up to the proprietary ones. It'd be like a huge crowdsourced project to collect proprietary models' output for future training.
I don't think it should be. I think a distilled summary of what the agent did should be committed. This requires some dev discipline. But for example:
Make a button that does X when clicked.
Agent makes the button.
I tell it to make the button red.
Agent makes it red.
I test it, it is missing an edge case. I tell it to fix it.
It fixes it.
I don't like where the button is. I tell it to put it in the sidebar.
It does that.
I can go on and on. But we don't need to know all those intermediaries. We just need to know: "Red button that does X by Y mechanism is in the sidebar. Tests that include edge cases here. All tests passing. 2026-03-01"
And that document is persisted.
If later the button gets deleted or moved again or something, we can instruct the agent to say why: "Button deleted because not used and was noisy. 2026-03-02"
This can be made trivial via skills, but I find it a good way to understand a bit more deeply than commit messages would allow me to do.
Of course, we can also just write (or instruct agents to write) better PRs, but AFAICT there's no easy way to know which PR added or deleted the button unless you spelunk in git blame.
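A sketch of what persisting such a document could look like, assuming the summaries live in a single append-only markdown file. The file name `DECISIONS.md` and the helper `record_decision` are hypothetical, not something described in the comment above.

```python
from datetime import date
from pathlib import Path

def record_decision(log_path: Path, summary: str, when: date) -> str:
    """Append one dated, distilled summary line to the decision log.

    The log is append-only: earlier entries are never rewritten, so a
    later deletion or move of a feature shows up as a new line.
    """
    entry = f"- {when.isoformat()}: {summary.strip()}\n"
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(entry)
    return entry
```

For the button example, that would be two calls, e.g. `record_decision(Path("DECISIONS.md"), "Red button that does X by Y mechanism is in the sidebar.", date(2026, 3, 1))` and a second one the next day recording the deletion. Searching that one file for "button" then answers the question without git blame.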
Obviously yes, at least if not the prompts in the session, some simple / automated distillation of those prompts. Code generated by AI is already clearly not going to be reviewed as carefully as code produced by humans, and intentions / assumptions will only be documented in AI-generated comments to some limited degree, completely contingent on the prompt(s).
Otherwise, when fixing a bug, you just risk starting from scratch and wasting time using the same prompts and/or assumptions that led to the issue in the first place.
Much of the reason code review was/is worth the time is because it can teach people to improve, and prevent future mistakes. Code review is not really about "correctness", beyond basic issues, because subtle logic errors are in general very hard to spot; that is covered by testing.
With AI, at least as it is currently implemented, there is no learning, as such, so this removes much of the value of code review. But, if the goal is to prevent future mistakes, having some info about the prompts that led to the code at least brings some value back to the review process.
EDIT: Also, from a business standpoint, you still need to select for competent/incompetent prompters/AI users. It is hard to do so when you have no evidence of what the session looked like.
Should my google search history be part of the commit? To that question my answer is no.
I was looking for an analogy and this is a good one.
The noise to signal ratio seems so bad. You’d have to sift through every little “thought”. If I could record my thought stream would I add it to the commit? Hell no.
Now, a summary of the reasoning, assumptions made and what alternatives were considered? Sure, that makes for a great message.
Heck no. I don't even read the vast majority of the cack that my AI spits out for my own prompts. Why would I inflict that on anyone else?
This seems wrong, like committing debug logs to the repo. There's also lots of research showing that models regularly produce incorrect trace tokens even with a correct solution, so there's questionable value even from a debugging perspective.
If a car is used to get you somewhere, should you put the exhaust in bags to bring with you?
We use flight data recorders on airplanes, though.
Is session context car exhaust? Or is it the Event logs and code of the CPU/car's brains?
I've thought about this, and I do save the sessions for educational purposes. But what I ended up doing is exactly what I ask developers to do: update the bug report with the analysis, plan, notes etc. In the case there's a single PR fixing one bug, GitHub and Claude tend to prefer this information go in the PR description. That's ok for me since it's one click from the bug.
I’ve been thinking about a simple problem: We’re increasingly merging AI-assisted code into production, but we rarely preserve the thing that actually produced it — the session. Six months later, when debugging or reviewing history, the only artifact left is the diff. So I built git-memento. It attaches AI session transcripts to commits using Git notes.
> the only artifact left is the diff
You also have code comments, docs in the repo, the commit message, the description and comments on the PR, the description and comments on your Issue tracker.
Providing context for a change is a solved problem, and there is relatively mature MCP for all common tooling.
Not to mention AI's predilection for copious and overly abundant comments.
A better solution would be to read and understand the code before committing it.
People won’t do that, unfortunately. We are a dying breed (I hate it). I went against my own instincts and vibe coded this; it works as a proof of concept.
You can see the session (including my typos) and compare what was asked for and what you got.
Your starting point is that people won’t read code, and you expect them to read someone’s llm session from git?
Another LLM will read it of course.
Sounds like we've got an Ape Coder here!
https://rsaksida.com/blog/ape-coding/
Related ongoing thread:
Ape Coding [fiction] - https://news.ycombinator.com/item?id=47206798 - March 2026 (93 comments)
Personally, I'm not going to be complicit in reshaping the field around the lazy and undisciplined.
I already invented this in my head, thanks for not making me code it.
Excellent idea, I just wish GitHub would show notes. You also risk losing those notes if you rebase the commit they are attached to, so make sure you only attach the notes to a commit on main.
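For reference, a rough sketch of the plain Git commands involved, wrapped in Python. This is not how git-memento itself works (the thread does not show its internals). The ref name `refs/notes/sessions` is made up for the example. The `notes.rewriteRef` setting is the standard Git config option for copying notes to the rewritten commits when you amend or rebase. It has no default value, which is why notes appear to get lost.

```python
import subprocess

# Hypothetical notes ref, chosen for this sketch only.
NOTES_REF = "refs/notes/sessions"

def attach_cmd(commit: str, transcript_path: str) -> list[str]:
    """Command to attach a transcript file to a commit as a note (-f overwrites)."""
    return ["git", "notes", "--ref", NOTES_REF, "add", "-f", "-F", transcript_path, commit]

def show_cmd(commit: str) -> list[str]:
    """Command to print the note attached to a commit."""
    return ["git", "notes", "--ref", NOTES_REF, "show", commit]

def keep_on_rewrite_cmd() -> list[str]:
    """Command to make amend/rebase copy notes from this ref to the new commits."""
    return ["git", "config", "notes.rewriteRef", NOTES_REF]

def run(cmd: list[str]) -> None:
    """Run one of the commands above in the current repository."""
    subprocess.run(cmd, check=True)
```

Notes refs are not pushed or fetched by default, so sharing them still needs an explicit `git push origin refs/notes/sessions` (or a matching refspec in the remote config).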
I added an action that will add a comment with the notes in GitHub so that you can see them directly.
I also worked around squash merges by collecting all the sessions and concatenating them into a single one.
Well done.
There is so much undefined in how agentic coding is going to mature. Something like what you're doing will need to be a part of it. Hopefully this makes some impressions and pushes things forward.
I did this in the beginning and realized I never went back to it. I think we have to learn to embrace the chaos. We can try to place a couple of anchors in the search space by having Claude summarize the code base every once in a while, but I am not sure if even that is necessary. The code it writes is git versioned and is probably enough to go on.
Just get it to write more comments about reasoning as you go.
I think so. If nothing else, when you deploy and see a bug, you can have a script that revives the LLMs of the last N commits and ask "would your change have caused this?" Probably wouldn't work or be any more efficient than a new debugging agent most of the time, but it might sometimes and you'd have a fix PR ready before you even answered the pager, and a postmortem that includes WHY it did so, and a prompt to prevent that behavior in the future. And it's cheap, so why not.
Maybe not a permanent part of the commit, but something stored on the side for a few weeks at a time. Or even permanently, it could be useful to go back and ask, "why did you do it that way?", and realize that the reason is no longer relevant and you can simplify the design without worrying you're breaking something.
If you can, run several agents. They document their process: tradeoffs considered, reasoning, etc. It’s not a full log of the session but a reasonable history of how the code came to be. Commit it with the code. Namespace it however you want.
We think so as well with emphasis on "why" for commits (i.e. intent provenance of all decisions).
https://github.com/eqtylab/y just a prototype, built at codex hackathon
The barrier for entry is just including the complete sessions. It gets a little nuanced because of the sheer size and workflows around squash merging and whatnot, and deciding where you actually want to store the sessions. For instance, git notes is intuitive; however, there are complexities around it. A less elegant approach is to just keep all sessions in separate branches.
Beyond this, you could have agents summarize into an intuitive data structure why certain commits exist and how the code arrived there. I think this would be of general utility for human and AI code reviewers alike. That is what we built. Cost/utility needs to make sense. Research needs to determine if this is all actually better than proper comments in code.
If the model in use is managed by a 3rd party, can be updated at will, and also gives different output each time it is interacted with, what is the main benefit?
If I chat with an agent and give an initial prompt, and it gets "aspect A" (some arbitrary aspect of the expected code) wrong, I'll iterate to get "aspect A" corrected. Other aspects of the output may have exactly matched my (potentially unstated) expectation.
If I feed the initial prompt into the agent at some later date, should I expect exactly "aspect A" to be incorrect again? It seems more likely the result will be different, maybe with some other aspects being "unexpected". Maybe these new problems weren't even discussed in the initial archived chat log, since at that time they happened to be generated in a way that aligned with the original engineer's expectation.
reproducibility isn't really the goal imo. more like a decision audit trail -- same reason code comments have value even though you can't regenerate the code from them. six months later when you're debugging you want to know 'why did we choose this approach' not 'replay the exact conversation.'
Because intent matters. When it's 6 months or 3 years down the line and it's time to refactor, and the original human author is long gone, there's a difference if the prompt was "I need a login screen" vs "I need a login screen, it should support magic link login and nothing else".
Isn’t that the point of design docs and not the commit log?
Isn’t that what entire.io, founded by former GitHub CEO, is doing?
YES! The session becomes the source code.
Back in the dark ages, you'd "cc -S hello.c" to check the assembler source. With time we stopped doing that and hello.c became the originating artefact. On the same basis the session becomes the originating artefact.
I'm not sure this analogy holds, for two reasons. First, even in the best case, chain-of-thought transcripts don't reliably tell you what the agent is doing and why it's doing it. Second, if you're dealing with a malicious actor, the transcript may have no relation to the code they're submitting.
The reason you don't have to look at assembly is that the .c file is essentially a 100% reliable and unambiguous spec of what the assembly will look like, and you will be generating the assembly from that .c file as a part of the build process anyway. I don't see how this works here. It adds a lengthy artifact without lessening the need for a code review. It may be useful for investigations in enterprise settings, but in the OSS ecosystem?...
Also, people using AI coding tools to submit patches to open-source projects are weirdly hesitant to disclose that.
This is only true if an LLM session would produce a deterministic output, which is not the case. This whole “LLMs are the new compiler” argument doesn’t hold water.
"Deterministic" is not the issue either, it's that small changes of the input will cause unknown changes in the output. You might theoretically achieve determinism and reproducibility for the exact same input (seeding the random number generators etc.), but the issue is that even if you formulate your request just a little differently, by changing punctuation for example, you'll get an entirely different output.
With compilers, the rules are clear, e.g. if you replace variable names with different ones, the program will still do the same thing. If you add spaces in places where whitespace doesn't matter, like around operators, the resulting behavior will still be the same. You change one function's definition, it doesn't impact another function's definition. (I'm sure you can nitpick this with some edge case, but that's not the point, it overwhelmingly can be relied upon in this way in day to day work.)
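A small sketch of those invariants, using Python's own parser as a stand-in for a C compiler (an assumption of this example; the helper names `same_structure` and `run` are made up). Whitespace around operators leaves the syntax tree unchanged, and renaming a variable changes the tree but not the behavior.

```python
import ast


def same_structure(src_a: str, src_b: str) -> bool:
    """True if two snippets parse to the same syntax tree (positions ignored)."""
    return ast.dump(ast.parse(src_a)) == ast.dump(ast.parse(src_b))


def run(src: str, arg: int) -> int:
    """Define f from source text in a fresh namespace and call it."""
    namespace: dict = {}
    exec(src, namespace)
    return namespace["f"](arg)


spaced = "def f(x):\n    return x*2+1\n"
respaced = "def f(x):\n    return x * 2 + 1\n"
renamed = "def f(value):\n    return value * 2 + 1\n"

print(same_structure(spaced, respaced))   # whitespace only: same tree
print(same_structure(spaced, renamed))    # a rename changes the tree...
print(run(spaced, 5) == run(renamed, 5))  # ...but not what the function computes
```

There is no comparable rule that says which edits to a prompt leave the generated code unchanged.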
cc was deterministic, you could be confident that the same code produced the same assembly each time you ran it
That is very much not the case with LLMs
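To make the seeding point from upthread concrete, here is a toy sampler. It is not an LLM: the token list and weights are invented, and `sample` is a hypothetical helper. It only shows that drawing from a distribution repeats exactly when the random seed is fixed, and generally does not otherwise.

```python
import random
from typing import List, Optional

# Made-up next-token distribution; a real model's vocabulary is far larger.
TOKENS = ["foo", "bar", "baz", "qux"]
WEIGHTS = [0.4, 0.3, 0.2, 0.1]


def sample(n: int, seed: Optional[int] = None) -> List[str]:
    """Draw n tokens; with seed=None the generator is seeded from system entropy."""
    rng = random.Random(seed)
    return rng.choices(TOKENS, weights=WEIGHTS, k=n)


# Same seed, same input: the draw repeats exactly.
print(sample(8, seed=1) == sample(8, seed=1))

# No seed: two runs will usually differ, though nothing guarantees it.
print(sample(8), sample(8))
```

Fixing the seed only covers the identical-input case; it says nothing about how the output shifts when the prompt is worded slightly differently.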
LLMs are non-deterministic, you would end up with a different output even if you paste the same conversation in. Even if the model was identical at the time you tried to reproduce it. Which gets less likely as time passes.
Also, why would you need to reproduce it? You have the code. Almost any modification to said code would benefit from a fresh context and refined prompt.
An actual full context of a thinking agent is asinine, full of busy work. At best, if you want to preserve the "reason" for the commit's contents, maybe you could summarise the context.
Other than that I see no reason to store the whole context per commit.
I haven't adopted this yet, but have a feeling that something like this is the right level of recording the llm contribution / session https://blog.bryanl.dev/posts/change-intent-records
I like it, but it seems tests could capture some of these "behaviors". Still, having it in a single document is helpful for context.
In our (small) team, we’ve taken to documenting/disclosing what part(s) of the process an LLM tool played in the proposed changes. We’ve all agreed that we like this better, both as submitters and reviewers. And though we’ve discussed why, none of us has coined exactly WHY we like this model better.
I’ve had the same thought, but after playing around with it, it just seems like adding noise. I never find myself looking at generated code and wondering “what prompt led to that?” There’s no point, I won’t get any kind of useful response - I’m better off talking to the developer who committed it, that’s how code review works.
The developer that committed won’t know because they didn’t write it…
You can avoid the noise with git notes. Add the session as a note on the commit. No one has to read them if they’re not interested.
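A rough sketch of that workflow, wrapping the git commands in Python. The `sessions` notes namespace, the `origin` remote default, and the helper names are assumptions for the example; the underlying `git notes add`, `git notes show`, and `git push` subcommands are standard git.

```python
import subprocess
from typing import List

NOTES_REF = "sessions"  # hypothetical namespace, stored under refs/notes/sessions


def attach_cmd(session_file: str, commit: str = "HEAD") -> List[str]:
    """Command that attaches a session transcript file to a commit as a note."""
    # -f overwrites an existing note on that commit; -F reads the note text from a file.
    return ["git", "notes", f"--ref={NOTES_REF}", "add", "-f", "-F", session_file, commit]


def show_cmd(commit: str = "HEAD") -> List[str]:
    """Command that prints the session note for a commit."""
    return ["git", "notes", f"--ref={NOTES_REF}", "show", commit]


def push_cmd(remote: str = "origin") -> List[str]:
    """Notes refs are not pushed by default, so the ref is pushed explicitly."""
    return ["git", "push", remote, f"refs/notes/{NOTES_REF}"]


def run_git(cmd: List[str]) -> str:
    """Run one of the commands above inside the current repository."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


# Example, inside a repository with a transcript saved as session.md:
#   run_git(attach_cmd("session.md"))
#   print(run_git(show_cmd()))
```

Because the notes live under their own ref, a plain `git log` stays unchanged unless someone opts in to displaying them.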
One of the use cases I see for this tool is helping companies understand the output coming from the LLM black box and the process the employee took to complete a certain task.
Except it doesn't capture the majority of uses of AI, in my experience. In my current practice, the vast majority of AI use is autocompletions, or small inline prompts. ("Fix this error."; "Open an ALSA midi connection" (things that avoid a trip into awful documentation); "if (one of the query parameters is "gear='ir') ..." (things that break flow by forcing a trip into excellent but overly verbose JavaScript URL API documentation)). Only very occasionally will I prompt for a big chunk of code.
Why do that? Just let them deal with it.
https://entire.io thinks so
Thank you! Was looking for this company. Founder was high up at GitHub. Really an interesting proposition
A summary of the session should be part of the commit message.
Proof sketch is not proof
nope. Someone's going to leak important private data using something like this.
Consider:
"I got a bug report from this user:
... bunch of user PII ..."
The LLM will do the right thing with the code, the developer reviewed the code and didn't see any mention of the original user or bug report data.
Now the notes thing they forgot about goes and makes this all public.
No. Prompt-like document is enough. (e.g. skills, AGENTS.md)
I must say that would certainly show some funny conversations in a log.
Maybe Git isn't the right tool to track the sessions. Some kind of new Semi-Human Intelligence Tracking tool. It will need a clever and shorter name though.
I don't think git is the right tool for much of modern software, where things like blobs aren't even properly supported.
I like to have a cup of coffee before my morning commit.
Germans are much more diligent about staging before they commit.
How about Entire?
https://techcrunch.com/2026/02/10/former-github-ceo-raises-r...
https://news.ycombinator.com/item?id=46961345