Really like the direction here, especially the focus on running fully on-device with no account. The Flow builder feels like the most interesting part to me, since it goes beyond simple dictation into actual workflows. I have been using Voibe, which is also local and Whisper-based, and it made me realize how big the UX difference is once everything runs on-device: no lag and no privacy concerns. Your approach seems more focused on automation while that one is more about quick dictation, which is an interesting tradeoff.
Curious what you are seeing from users so far. Are people actually building complex flows or mostly sticking to simple dictation?
I built Spoke because I was tired of paying $15/month for a dictation app
that sends my audio to a cloud server I don't control.
Spoke runs a 600M-parameter speech model (NVIDIA Parakeet TDT) entirely
on-device — no internet required, audio never leaves your Mac. On Apple
Silicon it transcribes 60 seconds of audio in ~400ms (150x realtime).
Word error rate is 6.34% vs Whisper large-v3's 7.4%, with a model
that's 2.6x smaller.
The part I'm most proud of is the Flow builder — a visual automation
engine on top of the transcription layer. Instead of just "speak → insert
text", you can chain 14 node types: AI Skills (with 5 provider options
including Ollama for fully local LLMs), webhooks, AppleScript, Shortcuts,
conditional routing by active app, text transforms, clipboard, file saves,
and more. So you can do things like: speak casually → rewrite to
professional tone → insert into the active app → send a webhook log →
save to a daily journal file. All triggered from a single keypress.
A few things I deliberately did differently:
- Native SwiftUI, not Electron. Under 50MB RAM at idle vs 500-800MB for
cloud alternatives
- No account required
- $9.99 one-time vs $180/year for competitors (50 free uses to try it)
- API keys stored in macOS Keychain, not their servers
- Per-app flow configuration (different behavior in VS Code vs Slack vs Mail)
- Voice ID — biometric speaker verification so it only responds to you
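The per-app configuration point is easy to picture as a lookup keyed on the frontmost app's bundle identifier. A minimal sketch, assuming a hypothetical `FlowConfig` type (the bundle IDs are real macOS identifiers; everything else is illustrative):

```swift
import Foundation

// Hypothetical per-app flow selection; FlowConfig is illustrative.
struct FlowConfig {
    let name: String
    let insertDirectly: Bool
}

// Map frontmost-app bundle IDs to flows, with a fallback default.
let perAppFlows: [String: FlowConfig] = [
    "com.microsoft.VSCode": FlowConfig(name: "code-comment", insertDirectly: true),
    "com.tinyspeck.slackmacgap": FlowConfig(name: "casual-tone", insertDirectly: true),
    "com.apple.mail": FlowConfig(name: "professional-tone", insertDirectly: false),
]
let defaultFlow = FlowConfig(name: "plain-dictation", insertDirectly: true)

func flow(forBundleID id: String) -> FlowConfig {
    perAppFlows[id] ?? defaultFlow
}
```

In a real app the bundle ID would come from `NSWorkspace.shared.frontmostApplication?.bundleIdentifier` at the moment the hotkey fires.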
I'm a solo developer, shipped this about two weeks ago. It's had its first
real users and I've been iterating fast based on feedback. Just shipped
v1.1.0 yesterday.
Would love honest feedback — especially from people who've tried Superwhisper,
Wispr Flow, or similar tools. What did I miss? What would make you switch?
https://usespoke.app
Are you using intents or grammars with the speech recognition engine?