Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA

(github.com)

24 points | by yu3zhou4 2 hours ago

2 comments