We did something similar — wrote markdown skill files that teach agents our coding patterns. Naming conventions, which libraries to use, how we structure components. Basically onboarding docs but for agents.
One thing we learned the hard way: shorter rules work better. We started with a 600-line comprehensive guide and the agent actually got worse. Every token in the skill competes for context window space with your actual conversation. Once we cut to under 200 lines per skill, consistency went up significantly.
The semantic vs pragmatic function split in this post is a good frame. I am not sure agents need that level of abstraction explained to them though — what they actually need is concrete examples. "Use pdfplumber not PyPDF2" beats "prefer minimal semantic functions" every time.
Wrestled with this a bit. The struggle with this one in particular is its as much for people to read as it is for agents, and the agents are secondary in its case.
I generally agree on this as best practice today, though I think it will become irrelevant in the next 2 generations of models.
it’s not that shorter rules are intrinsically better, it’s that longer rules tend to have irrelevant junk in them. ceteris paribus, longer rules are better. it’s just most of the time the longer rules fall under the Blaise Pascal-ian “i regret i didn’t have time to make this shorter”.
I haven't really extensively evaluated this, but my instinct is to really aggressively trim any 'instructions' files. I try to keep mine at a mid-double-digit linecount and leave out anything that's not critically important. You should also be skeptical of any instructions that basically boil down to "please follow this guideline that's generally accepted to be best practice" - most current models are probably already aware - stick to things that are unique to your project, or value decisions that aren't universally agreed upon.
For any sufficiently large codebase, the agent only ever has a very % of the code loaded into context. Context engineering strategies like "skills" allow the agent to more efficiently discover the key information required to produce consistent code.
mostly because reading the code base fills up the context window; as you aggregate context, you then need to synthesize the basics; these things arnt intelligence; they dont know whats useless and whats useful. They're as accurate as the structureyou surround them with.
I don't think testing the product alone is good enough, because when you give it tests it has to pass it prioritizes passing them at the expense of everything else — including code quality. I've seen it pull in random variables, break semantic functions, etc.
I've seen a lot of people talking about how AI is making codebases worse. I reject that, people are making codebases worse by not being intentional about how their AI writes code.
Agreed. When you submit code you must take responsibility for its quality. Blaming AI for low quality code is like blaming hammers for giant holes in the drywall. If you don't know how to use AI tools without confidence that your code is high quality, you need to re-assess how you use those tools. I'm not saying AI tools are bad. They're great. But the prevalence of people pushing the tools beyond their limits is not a failure of the tools. Vibe coding may be fun but tight-leash high-oversight AI usage is underrated in my opinion.
The intention part is right but the bottleneck is review. AI is really good at turning your clean semantic functions into pragmatic ones without you noticing. You ask for a feature, it slips a side effect into something that was pure, tests still pass. By the time you catch it you've got three more PRs built on top.
In my experience trying to push the onus of filtering out slop onto reviewers is both ineffective and unfair to the reviewer. When you submit code for review you are saying "I believe to the best of my ability that this code is high quality and adequate but it's best to have another person verify that." If the AI has done things without you noticing, you haven't reviewed its output well enough yet and shouldn't be submitting it to another person yet.
I think there’s just a lot of people who would love to push lower quality code for a variety of legitimate and illegitimate reasons (time pressure, cost, laziness, skill issues, bad management, etc). AI becomes a perfect scapegoat for lowered code quality.
And you’re completely right, humans are still the ones in control here. It’s entirely possible to use AI without lowering your standards.
We did something similar — wrote markdown skill files that teach agents our coding patterns. Naming conventions, which libraries to use, how we structure components. Basically onboarding docs but for agents.
One thing we learned the hard way: shorter rules work better. We started with a 600-line comprehensive guide and the agent actually got worse. Every token in the skill competes for context window space with your actual conversation. Once we cut to under 200 lines per skill, consistency went up significantly.
The semantic vs pragmatic function split in this post is a good frame. I am not sure agents need that level of abstraction explained to them though — what they actually need is concrete examples. "Use pdfplumber not PyPDF2" beats "prefer minimal semantic functions" every time.
Wrestled with this a bit. The struggle with this one in particular is its as much for people to read as it is for agents, and the agents are secondary in its case.
I generally agree on this as best practice today, though I think it will become irrelevant in the next 2 generations of models.
it’s not that shorter rules are intrinsically better, it’s that longer rules tend to have irrelevant junk in them. ceteris paribus, longer rules are better. it’s just most of the time the longer rules fall under the Blaise Pascal-ian “i regret i didn’t have time to make this shorter”.
I haven't really extensively evaluated this, but my instinct is to really aggressively trim any 'instructions' files. I try to keep mine at a mid-double-digit linecount and leave out anything that's not critically important. You should also be skeptical of any instructions that basically boil down to "please follow this guideline that's generally accepted to be best practice" - most current models are probably already aware - stick to things that are unique to your project, or value decisions that aren't universally agreed upon.
Shouldn't all of this be implicit from the codebase? Why do I have to write a file telling it these things?
For any sufficiently large codebase, the agent only ever has a very % of the code loaded into context. Context engineering strategies like "skills" allow the agent to more efficiently discover the key information required to produce consistent code.
mostly because reading the code base fills up the context window; as you aggregate context, you then need to synthesize the basics; these things arnt intelligence; they dont know whats useless and whats useful. They're as accurate as the structureyou surround them with.
AI comments are against the rules. Fuck off, bot.
Because of the way that I use AI, I am constantly looking at the code. I usually leave it alone, if I can; even if I don't really like it.
I will, often go back, after the fact, and ask for refactors and documentation.
It works. Probably a lot slower than using agents, but I test every step, and it is a lot faster than I would do it, unassisted.
I don't think testing the product alone is good enough, because when you give it tests it has to pass it prioritizes passing them at the expense of everything else — including code quality. I've seen it pull in random variables, break semantic functions, etc.
Oh, no. I test. Each. and. Every. Step.
I use a test harness, and step through the code, look at debug logs, and abuse the code, as much as possible.
Kind of a pain, but I find unit tests are a bit of a "false hope" kind of thing: https://littlegreenviper.com/testing-harness-vs-unit/
Page not rendering well on iPhone Safari.
Good content tho!
I've seen a lot of people talking about how AI is making codebases worse. I reject that, people are making codebases worse by not being intentional about how their AI writes code.
This is my take on how to not write slop.
Agreed. When you submit code you must take responsibility for its quality. Blaming AI for low quality code is like blaming hammers for giant holes in the drywall. If you don't know how to use AI tools without confidence that your code is high quality, you need to re-assess how you use those tools. I'm not saying AI tools are bad. They're great. But the prevalence of people pushing the tools beyond their limits is not a failure of the tools. Vibe coding may be fun but tight-leash high-oversight AI usage is underrated in my opinion.
The intention part is right but the bottleneck is review. AI is really good at turning your clean semantic functions into pragmatic ones without you noticing. You ask for a feature, it slips a side effect into something that was pure, tests still pass. By the time you catch it you've got three more PRs built on top.
In my experience trying to push the onus of filtering out slop onto reviewers is both ineffective and unfair to the reviewer. When you submit code for review you are saying "I believe to the best of my ability that this code is high quality and adequate but it's best to have another person verify that." If the AI has done things without you noticing, you haven't reviewed its output well enough yet and shouldn't be submitting it to another person yet.
I think there’s just a lot of people who would love to push lower quality code for a variety of legitimate and illegitimate reasons (time pressure, cost, laziness, skill issues, bad management, etc). AI becomes a perfect scapegoat for lowered code quality.
And you’re completely right, humans are still the ones in control here. It’s entirely possible to use AI without lowering your standards.
..but unintentional AI (aka Modern Chaos Monkey) is so much more fun!
LOL fr. I've been talking with some friends about RL on chaos monkeying the codebase to benchmark on feature isolation for measuring good code.