I see a lot of Whisper stuff out there. Are these the same old OpenAI Whisper models, or have they been updated heavily?
I've been using parakeet v3 which is fantastic (and tiny). Confused still seeing whisper out there.
That’s awesome! Do you know how it compares to Handy? Handy is open source and local-only too. It’s been around a while and it’s what I’ve been using.
https://github.com/cjpais/handy
Handy is fantastic.
Handy is awesome! I used it for quite a while before Claude Code added voice support. Solid software, very good linux and mac integration. Shoutout to Parakeet models as well, extremely fast and solid models for their relatively modest memory requirements.
I love Handy and have been using it for a while too. What we need is this for mobile apps: I don't think there are any free apps, and native dictation is not always fully local and not as good.
If anyone is interested, I built Hitoku Draft. It is a context-aware voice assistant, local models only. https://hitoku.me/draft/ I set up a code for people to download it (HITOKUHN2026), in case you want to compare or just give feedback!
Nice one! For Linux folks, I developed https://github.com/goodroot/hyprwhspr.
On Linux, there's access to the latest Cohere Transcribe model and it works very, very well. Requires a GPU though. Larger local models generally shouldn't require a subordinate model for clean up.
Have you compared WhisperKit to faster-whisper or similar? You might be able to run turbov3 successfully and negate the need for cleanup.
Incidentally, waiting for Apple to blow this all up with native STT any day now. :)
I've been running whisper large-v3 on an m2 max through a self-hosted endpoint and honestly the accuracy is good enough that i stopped bothering with cleanup models. The bigger annoyance for me was latency on longer chunks, like anything over 30 seconds starts feeling sluggish even with metal acceleration. Haven't tried whisperkit specifically but curious how it handles longer audio compared to the full model.
Ah yeah, longform is interesting.
Not sure how you're running it, via whichever "app thing", but...
On resource-limited machines, "Continuous recording" mode outputs when silence is detected, via a configurable threshold.
This outputs as you speak, in more reasonable chunks; in aggregate it's the same output, just chunked efficiently.
Maybe you can try hackin' that up?
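For illustration, the silence-threshold chunking idea boils down to something like this. This is a minimal sketch, not any app's actual implementation; the frame size, RMS threshold, and function names are all made up for the example, and it assumes mono float samples:

```python
import math

def rms(frame):
    """Root-mean-square energy of one frame of float samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def chunk_on_silence(samples, sample_rate=16000, frame_ms=30,
                     silence_rms=0.01, min_silence_ms=300):
    """Split audio into chunks wherever a run of quiet frames appears.

    A frame counts as "silent" when its RMS energy falls below
    silence_rms; once min_silence_ms of consecutive silence has been
    seen, the chunk accumulated so far is emitted and a new one begins.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    needed = max(1, min_silence_ms // frame_ms)
    chunks, current, silent = [], [], 0
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        if not frame:
            break
        current.extend(frame)
        silent = silent + 1 if rms(frame) < silence_rms else 0
        if silent >= needed:
            chunks.append(current)
            current, silent = [], 0
    if current:  # flush whatever is left at end of stream
        chunks.append(current)
    return chunks
```

Each chunk can then be handed to the transcription model as it completes, instead of waiting for the whole recording.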
Thanks for sharing! I was literally getting ready to build, essentially, this. Now it looks like I don't have to!
Have you ever considered using a foot-pedal for PTT?
Apple incidentally already has native STT, but for some reason they just don't use a decent model yet.
They do, and they even have that nice microphone F5 key for it, and an ideal OS-level API making the input experience >perfect<.
Apparently they do have a better model, they just haven't exposed it in their own OS yet!
https://developer.apple.com/documentation/speech/bringing-ad...
Wonder what's the hold up...
For the foot pedal:
Yes, conceptually it’s just another evdev-trigger source, assuming the pedal exposes usable key/button events.
Otherwise we’d bridge it into the existing external control interface. Either way, hooks are there. :)
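As a rough sketch of what "another evdev-trigger source" means: Linux input devices emit fixed-size `input_event` records you can decode and map to push-to-talk actions. This assumes the 64-bit `struct input_event` layout; the pedal key code and action names are illustrative, not any project's real API:

```python
import struct

# Layout of struct input_event on 64-bit Linux:
# struct timeval (two 64-bit ints), __u16 type, __u16 code, __s32 value.
EVENT_FORMAT = "qqHHi"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)  # 24 bytes

EV_KEY = 0x01                 # key/button event type
KEY_RELEASED, KEY_PRESSED = 0, 1

def decode_events(buf):
    """Yield (type, code, value) for each complete input_event in buf."""
    for off in range(0, len(buf) - EVENT_SIZE + 1, EVENT_SIZE):
        _sec, _usec, etype, code, value = struct.unpack_from(EVENT_FORMAT, buf, off)
        yield etype, code, value

def ptt_transitions(buf, pedal_code):
    """Map raw pedal key events to push-to-talk start/stop actions."""
    actions = []
    for etype, code, value in decode_events(buf):
        if etype == EV_KEY and code == pedal_code:
            if value == KEY_PRESSED:
                actions.append("start_recording")
            elif value == KEY_RELEASED:
                actions.append("stop_recording")
    return actions
```

In practice you'd read EVENT_SIZE-byte records from the pedal's /dev/input/eventN node (or use the python-evdev library), but the decode step looks the same.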
Thank you for sharing, I appreciate the emphasis on local speed and privacy. As a current user of Hex (https://github.com/kitlangton/Hex), which has similar goals, what are your thoughts on how they compare?
Sadly the app doesn't work. There is no popup asking for microphone permission.
EDIT: I see there is an open issue for that on GitHub.
Feature request, or plea: let me play a video of someone speaking and have it transcribed for me.
I like this idea and it should work -- whatever microphone you have on should be able to hear the speaker. LMK if not (e.g., are you wearing headphones? if so, the mic can't hear the speaker)
Parakeet is significantly more accurate and faster than Whisper if it supports your language.
Right, and if you're on MacOS you can use it for free with Hex: https://github.com/kitlangton/Hex
Are you running Parakeet with VoiceInk[0]?
[0]: https://github.com/beingpax/VoiceInk
I am, and it's been working great for a long time now.
I have been using Parakeet with MacWhisper's hold-to-talk on a MacBook Neo and it's been awesome.
If you don't feel like downloading a large model, you can also use `yap dictate`. Yap leverages the built-in models exposed through Speech.framework on macOS 26 (Tahoe).
Project repo: https://github.com/finnvoor/yap
Great job. What about supported languages? Are system languages recognised?
Thanks! We currently have 2 multilingual options available:
- Whisper small (multilingual) (~466 MB, supports many languages)
- Parakeet v3 (~1.4 GB, supports 25 languages via FluidAudio)