Still halfway through reading, but what you've made can unlock a lot of use cases.
> I tried SQLite first, but its extension API is limited and write performance with custom storage was painfully slow
For many use cases, write performance does not matter much. Other than the initial import, in many cases we don't change text that fast. But the simpler logistics of having a SQLite database, plus the dual (git+SQL) access to text, is huge.
That said, for the specific use case I have in mind, Postgres is perfectly fine.
SQLite is fine right up until you want concurrent writers. Once you need multiple users, cross-host access, or anything that looks like shared infra instead of a local cache, the file-locking model stops being cute and starts setting the rules for the whole design. For collaborative versioning, Postgres makes more sense.
Also, SQLite in WAL/WAL2 mode is definitely not any slower for writing than Postgres.
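For anyone who wants to poke at the locking behaviour discussed here, this is a minimal sketch using only Python's stdlib `sqlite3`. The `notes` table and the `demo` function are made up for the example. It only shows what WAL does and does not change about locking: readers keep working during a write, but there is still one writer at a time. It does not benchmark anything, so it says nothing either way about write speed versus Postgres.

```python
import os
import sqlite3
import tempfile


def demo(path):
    """Show WAL locking behaviour with two connections to one database file."""
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN/COMMIT statements below are the only transaction boundaries.
    # timeout=0 makes a blocked connection fail immediately instead of waiting.
    a = sqlite3.connect(path, timeout=0, isolation_level=None)
    b = sqlite3.connect(path, timeout=0, isolation_level=None)
    try:
        # The pragma returns the journal mode that is now in effect.
        mode = a.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        a.execute("CREATE TABLE notes (body TEXT)")
        a.execute("INSERT INTO notes VALUES ('first')")

        a.execute("BEGIN IMMEDIATE")  # connection a takes the write lock
        a.execute("INSERT INTO notes VALUES ('second')")

        # Under WAL a reader is not blocked by the open write transaction;
        # it sees the last committed state (one row).
        visible = b.execute("SELECT count(*) FROM notes").fetchone()[0]

        # A second writer is refused while a still holds the write lock.
        try:
            b.execute("BEGIN IMMEDIATE")
            second_writer_blocked = False
            b.execute("ROLLBACK")
        except sqlite3.OperationalError:
            second_writer_blocked = True

        a.execute("COMMIT")
        total = b.execute("SELECT count(*) FROM notes").fetchone()[0]
        return mode, visible, second_writer_blocked, total
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(demo(os.path.join(d, "demo.db")))
```

With a non-zero `timeout` the second writer would wait for the lock instead of failing right away, which is usually what you want for a handful of local writers.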
sounds great yes. maybe an SQLite version will come in the future
why do agents need to know this kind of metadata about git history to do their coding work, though?
even humans don't do this unless there's a crazy bug that makes them search from every possible angle.
that said, this sounds like a great and fun project to work on.
but the difference between you and an agent is that you naturally know the history of the project if you have worked on it. the AI doesn't.
Of course, we can’t leave out a mention of Fossil here — the SCM system built by and for SQLite.
https://fossil-scm.org/
How much does it take advantage of being a DB underneath?
And fossil itself is an SQLite database!
yeah fossil is great, but can fossil import the linux kernel? (already working on the next post)
This is incredibly neat and might actually become a part of my toolbox.
thanks! but it might still need some releases until it's really good. just don't rely on it ;)
Wouldn't duckdb be better suited for this? Forgive the stupid question. I just connected "csv as sql" to "git as sql" and duckdb comes to mind
I did actually look into writing the extension for duckdb. But similar to SQLite the extension possibilities are not great for what I needed. Though duckdb is a great database.
I love it. I love having agents write SQL. It's a very efficient use of context, and it doesn't try to reinvent the information retrieval part of following the context.
Did you find you needed to give agents the schema produced by this, or do they just query it themselves from Postgres?
so most analyses already have a CLI function you can just call with parameters. for those that don't, in my case, the agent just looked at the --help of the commands and was able to perform the queries.
Interesting... could this be used to store multiple git repos and do a full-text search across all of them?
in theory yes. you just need to do the full text search across the databases. pgit doesn't support it, but in the end it's just postgres under the hood.