Hey HN! I built vLLM-MLX, a vLLM-like serving framework for macOS. vLLM itself is painfully slow on Apple Silicon machines, so vLLM-MLX brings native GPU acceleration using Apple's MLX framework.
Quick start:

  pip install -e .
  vllm-mlx serve mlx-community/Llama-3.2-3B-Instruct-4bit

It works with the standard OpenAI SDK. Happy to answer questions!
GitHub: https://github.com/waybarrios/vllm-mlx
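For anyone wondering what "works with the standard OpenAI SDK" looks like in practice, here's a minimal sketch. The host/port are my assumption, not from the post; use whatever vllm-mlx serve actually prints on startup:

  # Minimal sketch: querying the local server via the OpenAI Python SDK.
  # ASSUMPTION: an OpenAI-compatible /v1 endpoint on localhost:8000 --
  # check the server's startup output for the real host/port.
  from openai import OpenAI

  client = OpenAI(
      base_url="http://localhost:8000/v1",  # assumed default, not confirmed
      api_key="not-needed",                 # local servers usually ignore this
  )

  resp = client.chat.completions.create(
      model="mlx-community/Llama-3.2-3B-Instruct-4bit",
      messages=[{"role": "user", "content": "Hello from Apple Silicon!"}],
  )
  print(resp.choices[0].message.content)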
What's the recommended RAM for running some of these? There is a “Memory” section, but the numbers look low compared to what I was expecting. Maybe that's right, given the models are heavily quantised. Basically, I'm trying to work out what I can play with on my 16 GB M1.
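My back-of-envelope, in case it helps (my own numbers, not from the project's Memory section): 4-bit quantised weights take roughly 0.5 bytes per parameter, so

  3B params * 0.5 bytes ≈ 1.5 GB of weights
  7B params * 0.5 bytes ≈ 3.5 GB of weights

plus KV cache and OS overhead on top, which would make the low numbers plausible.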
Same here.