I see a lot of Whisper stuff out there. Are these the same old OpenAI Whisper models, or have they been updated heavily?
I've been using parakeet v3 which is fantastic (and tiny). Confused still seeing whisper out there.
That’s awesome! Do you know how it compares to Handy? Handy is open source and local-only too. It’s been around a while and it’s what I’ve been using.
https://github.com/cjpais/handy
Handy is fantastic.
Handy is awesome! I used it for quite a while before Claude Code added voice support. Solid software, very good linux and mac integration. Shoutout to Parakeet models as well, extremely fast and solid models for their relatively modest memory requirements.
I love Handy and have been using it for a while too. What we need is this for mobile apps: I don't think there are any free apps, and native dictation is not always fully local and not as good.
If anyone is interested, I built Hitoku Draft. It is a context-aware voice assistant, local models only. https://hitoku.me/draft/ I set up a code for people to download it (HITOKUHN2026), in case you want to compare or just give feedback!
Nice one! For Linux folks, I developed https://github.com/goodroot/hyprwhspr.
On Linux, there's access to the latest Cohere Transcribe model and it works very, very well. Requires a GPU though. Larger local models generally shouldn't require a subordinate model for clean up.
Have you compared WhisperKit to faster-whisper or similar? You might be able to run turbov3 successfully and negate the need for cleanup.
Incidentally, waiting for Apple to blow this all up with native STT any day now. :)
I've been running whisper large-v3 on an m2 max through a self-hosted endpoint and honestly the accuracy is good enough that i stopped bothering with cleanup models. The bigger annoyance for me was latency on longer chunks, like anything over 30 seconds starts feeling sluggish even with metal acceleration. Haven't tried whisperkit specifically but curious how it handles longer audio compared to the full model.
Ah yeah, longform is interesting.
Not sure how you're running it, via whichever "app thing", but...
On resource-limited machines, "Continuous recording" mode outputs when silence is detected, via a configurable threshold.
This outputs as you speak, in more reasonable chunks; in aggregate it's the same output, just chunked efficiently.
Maybe you can try hackin' that up?
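For illustration, the silence-threshold chunking idea boils down to something like this. This is a minimal sketch, not any app's actual implementation; the frame size, RMS threshold, and function names are all made up for the example, and it assumes mono float samples:

```python
import math

def rms(frame):
    """Root-mean-square energy of one frame of float samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def chunk_on_silence(samples, sample_rate=16000, frame_ms=30,
                     silence_rms=0.01, min_silence_ms=300):
    """Split audio into chunks wherever a run of quiet frames appears.

    A frame counts as "silent" when its RMS energy falls below
    silence_rms; once min_silence_ms of consecutive silence has been
    seen, the chunk accumulated so far is emitted and a new one begins.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    needed = max(1, min_silence_ms // frame_ms)
    chunks, current, silent = [], [], 0
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        if not frame:
            break
        current.extend(frame)
        silent = silent + 1 if rms(frame) < silence_rms else 0
        if silent >= needed:
            chunks.append(current)
            current, silent = [], 0
    if current:  # flush whatever is left at end of stream
        chunks.append(current)
    return chunks
```

Each chunk can then be handed to the transcription model as it completes, instead of waiting for the whole recording.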
Thanks for sharing! I was literally getting ready to build, essentially, this. Now it looks like I don't have to!
Have you ever considered using a foot-pedal for PTT?
Apple incidentally already has native STT, but for some reason they just don't use a decent model yet.
They do, and they even have that nice microphone F5 key for it, and an ideal OS-level API making the input experience >perfect<.
Apparently they do have a better model, they just haven't exposed it in their own OS yet!
https://developer.apple.com/documentation/speech/bringing-ad...
Wonder what's the hold up...
For the foot pedal:
Yes, conceptually it’s just another evdev-trigger source, assuming the pedal exposes usable key/button events.
Otherwise we’d bridge it into the existing external control interface. Either way, hooks are there. :)
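As a rough sketch of what "another evdev-trigger source" means: Linux input devices emit fixed-size `input_event` records you can decode and map to push-to-talk actions. This assumes the 64-bit `struct input_event` layout; the pedal key code and action names are illustrative, not any project's real API:

```python
import struct

# Layout of struct input_event on 64-bit Linux:
# struct timeval (two 64-bit ints), __u16 type, __u16 code, __s32 value.
EVENT_FORMAT = "qqHHi"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)  # 24 bytes

EV_KEY = 0x01                 # key/button event type
KEY_RELEASED, KEY_PRESSED = 0, 1

def decode_events(buf):
    """Yield (type, code, value) for each complete input_event in buf."""
    for off in range(0, len(buf) - EVENT_SIZE + 1, EVENT_SIZE):
        _sec, _usec, etype, code, value = struct.unpack_from(EVENT_FORMAT, buf, off)
        yield etype, code, value

def ptt_transitions(buf, pedal_code):
    """Map raw pedal key events to push-to-talk start/stop actions."""
    actions = []
    for etype, code, value in decode_events(buf):
        if etype == EV_KEY and code == pedal_code:
            if value == KEY_PRESSED:
                actions.append("start_recording")
            elif value == KEY_RELEASED:
                actions.append("stop_recording")
    return actions
```

In practice you'd read EVENT_SIZE-byte records from the pedal's /dev/input/eventN node (or use the python-evdev library), but the decode step looks the same.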
Thank you for sharing, I appreciate the emphasis on local speed and privacy. As a current user of Hex (https://github.com/kitlangton/Hex), which has similar goals, what are your thoughts on how they compare?
Sadly the app doesn't work. There is no popup asking for microphone permission.
EDIT: I see there is an open issue for that on GitHub.
Feature request, or plea: let me play a video of someone speaking and have it transcribed for me.
I like this idea and it should work -- whatever microphone you have on should be able to hear the speaker. LMK if not (e.g., are you wearing headphones? if so, the mic can't hear the speaker)
Parakeet is significantly more accurate and faster than Whisper if it supports your language.
Right, and if you're on MacOS you can use it for free with Hex: https://github.com/kitlangton/Hex
Are you running Parakeet with VoiceInk[0]?
[0]: https://github.com/beingpax/VoiceInk
I am, and it's been working great for a long time now.
I have been using Parakeet with MacWhisper's hold-to-talk on a MacBook Neo and it's been awesome.
If you don't feel like downloading a large model, you can also use `yap dictate`. Yap leverages the built-in models exposed through Speech.framework on macOS 26 (Tahoe).
Project repo: https://github.com/finnvoor/yap
Great job. What about supported languages? Are system languages recognised?
Thanks! We currently have 2 multilingual options available:
- Whisper small (multilingual) (~466 MB, supports many languages)
- Parakeet v3 (~1.4 GB, supports 25 languages via FluidAudio)