Don't use a vector database for code; embeddings are slow and a poor fit for code. Code search likes BM25 + trigrams, which gets better results while keeping responses snappy.
static embedding models I'm finding quite fast
lee101/gobed https://github.com/lee101/gobed is ~1ms on GPU :) It would need to be trained for code, though; the bigger code-LLM embeddings can be high quality too, so it's really a question of where the ideal point is on the Pareto frontier. You're right that in practice it often ends up being BM25 or rg even for code, but more complex setups are possible if search quality really matters.
lee101/gobed https://github.com/lee101/gobed uses static embedding models, so documents are embedded in milliseconds, and search runs on GPU with a CAGRA-style index. There are a few tricks for speed, like int8 quantization on the embeddings and fusing the embedding and search into the same kernel, since the embedding really is just a trained map of per-token embeddings plus averaging.
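For what it's worth, a minimal sketch of that combo using SQLite's FTS5, which ships both a built-in bm25() ranking function and a trigram tokenizer (SQLite 3.34+); the table and data here are just illustrative:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# FTS5 with the trigram tokenizer handles substring-ish code queries;
# bm25() is FTS5's built-in relevance score (more negative = better match).
db.execute("CREATE VIRTUAL TABLE code USING fts5(path, content, tokenize='trigram')")
db.executemany(
    "INSERT INTO code (path, content) VALUES (?, ?)",
    [
        ("search.py", "def bm25_search(query): ..."),
        ("index.py", "class TrigramIndex: ..."),
    ],
)
rows = db.execute(
    "SELECT path, bm25(code) AS score FROM code WHERE code MATCH ? ORDER BY score",
    ("bm25_search",),
).fetchall()
print(rows)  # best matches first
```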
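Not gobed's actual API, just a toy numpy sketch of the static-embedding idea described above: a per-token embedding table plus averaging, int8-quantized stored vectors, and brute-force scoring standing in for the CAGRA-style GPU index.

```python
import numpy as np

# Hypothetical toy vocabulary and "trained" per-token embedding table
# (in a real static model this table is learned; here it's random).
vocab = {"def": 0, "search": 1, "index": 2, "query": 3, "the": 4}
table = np.random.randn(len(vocab), 64).astype(np.float32)

def embed(text: str) -> np.ndarray:
    """Static embedding: look up each token's vector and average them."""
    ids = [vocab[t] for t in text.lower().split() if t in vocab]
    if not ids:
        return np.zeros(64, dtype=np.float32)
    v = table[ids].mean(axis=0)
    return v / (np.linalg.norm(v) + 1e-9)  # normalize for cosine scoring

def quantize_int8(vecs: np.ndarray):
    """int8-quantize stored embeddings to shrink the index."""
    scale = np.abs(vecs).max(axis=1, keepdims=True) / 127.0
    return (vecs / scale).round().astype(np.int8), scale

docs = ["def search index", "query the index", "def query"]
doc_vecs = np.stack([embed(d) for d in docs])
q_vecs, scales = quantize_int8(doc_vecs)

def search(query: str, k: int = 2):
    q = embed(query)
    scores = (q_vecs.astype(np.float32) * scales) @ q  # dequantize + dot product
    top = np.argsort(-scores)[:k]
    return [(docs[i], float(scores[i])) for i in top]

print(search("search the index"))
```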
I built a lib for myself https://pypi.org/project/piragi/
With AI needing more access to documentation, WDYT about using RAG for documentation retrieval?
If your data aren't too large, you can use faiss-cpu and pickle
https://pypi.org/project/faiss-cpu/
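Roughly like this, assuming everything fits in RAM; serialize_index/deserialize_index are real FAISS helpers, but the embeddings here are random stand-ins for whatever model you actually use:

```python
import pickle
import numpy as np
import faiss

dim = 384
docs = ["chunk one ...", "chunk two ...", "chunk three ..."]
vecs = np.random.rand(len(docs), dim).astype("float32")  # stand-in embeddings
faiss.normalize_L2(vecs)              # so inner product == cosine similarity

index = faiss.IndexFlatIP(dim)        # exact, in-RAM index
index.add(vecs)

# Persist both the index bytes and the doc texts with pickle.
with open("rag_index.pkl", "wb") as f:
    pickle.dump({"index": faiss.serialize_index(index), "docs": docs}, f)

with open("rag_index.pkl", "rb") as f:
    state = pickle.load(f)
index = faiss.deserialize_index(state["index"])

query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 2)
print([(state["docs"][i], float(s)) for i, s in zip(ids[0], scores[0])])
```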
For the uneducated, how large is too large? Curious.
FAISS runs in RAM. If your dataset can't fit into RAM, FAISS is not the right tool.
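Back-of-the-envelope for "how large is too large" with a flat float32 index (real usage also depends on the index type and your metadata; IVF/PQ variants use far less):

```python
# Flat float32 index: n_vectors * dim * 4 bytes, plus the original texts.
n, dim = 1_000_000, 768
print(f"{n * dim * 4 / 1e9:.1f} GB")  # ~3.1 GB; 10M vectors of this size is ~31 GB
```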
Sqlite-vec
Local LibreChat which bundles a vector db for docs.
LightRAG, with Archestra as a UI via the LightRAG MCP
sqlite's bm25
A little BM25 can get you quite a way with an LLM.
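E.g. a tiny retrieve-then-prompt loop with the rank-bm25 package; the docs and prompt assembly are just illustrative:

```python
from rank_bm25 import BM25Okapi

docs = [
    "To configure the API client, set API_KEY in your environment.",
    "The retry policy defaults to three attempts with exponential backoff.",
    "Deployment requires Python 3.10 or newer.",
]
bm25 = BM25Okapi([d.lower().split() for d in docs])  # tokenized corpus

question = "how do I set the api key?"
top_chunks = bm25.get_top_n(question.lower().split(), docs, n=2)

# Stuff the retrieved chunks into the LLM prompt however you like.
prompt = (
    "Answer using only this context:\n"
    + "\n".join(top_chunks)
    + f"\n\nQ: {question}\nA:"
)
print(prompt)
```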
try out Chroma, or better yet ask Opus to!
simple lil setup with qdrant
AnythingLLM is promising
undergrowth.io
SQLite with FTS5