Really like the direction here, especially the focus on running fully on-device with no account. The Flow builder feels like the most interesting part to me, since it goes beyond simple dictation into actual workflows. I have been using Voibe, which is also local and Whisper-based, and it made me realize how big the UX difference is once everything runs on-device: no lag and no privacy concerns. Your approach seems more focused on automation while that one is more about quick dictation, which is an interesting tradeoff.
Curious what you are seeing from users so far. Are people actually building complex flows or mostly sticking to simple dictation?
I built Spoke because I was tired of paying $15/month for a dictation app
that sends my audio to a cloud server I don't control.
Spoke runs a 600M-parameter speech model (NVIDIA Parakeet TDT) entirely
on-device — no internet required, audio never leaves your Mac. On Apple
Silicon it transcribes 60 seconds of audio in ~400ms (150x realtime).
Word error rate is 6.34% vs Whisper large-v3's 7.4%, with a model
that's 2.6x smaller.
The part I'm most proud of is the Flow builder — a visual automation
engine on top of the transcription layer. Instead of just "speak → insert
text", you can chain 14 node types: AI Skills (with 5 provider options
including Ollama for fully local LLMs), webhooks, AppleScript, Shortcuts,
conditional routing by active app, text transforms, clipboard, file saves,
and more. So you can do things like: speak casually → rewrite to
professional tone → insert into the active app → send a webhook log →
save to a daily journal file. All triggered from a single keypress.
A few things I deliberately did differently:
- Native SwiftUI, not Electron. Under 50MB RAM at idle vs 500-800MB for
cloud alternatives
- No account required
- $9.99 one-time vs $180/year for competitors (50 free uses to try it)
- API keys stored in macOS Keychain, not their servers
- Per-app flow configuration (different behavior in VS Code vs Slack vs Mail)
- Voice ID — biometric speaker verification so it only responds to you
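The per-app configuration point is easy to picture as a lookup keyed on the frontmost app's bundle identifier. A minimal sketch, assuming a hypothetical `FlowConfig` type (the bundle IDs are real macOS identifiers; everything else is illustrative):

```swift
import Foundation

// Hypothetical per-app flow selection; FlowConfig is illustrative.
struct FlowConfig {
    let name: String
    let insertDirectly: Bool
}

// Map frontmost-app bundle IDs to flows, with a fallback default.
let perAppFlows: [String: FlowConfig] = [
    "com.microsoft.VSCode": FlowConfig(name: "code-comment", insertDirectly: true),
    "com.tinyspeck.slackmacgap": FlowConfig(name: "casual-tone", insertDirectly: true),
    "com.apple.mail": FlowConfig(name: "professional-tone", insertDirectly: false),
]
let defaultFlow = FlowConfig(name: "plain-dictation", insertDirectly: true)

func flow(forBundleID id: String) -> FlowConfig {
    perAppFlows[id] ?? defaultFlow
}
```

In a real app the bundle ID would come from `NSWorkspace.shared.frontmostApplication?.bundleIdentifier` at the moment the hotkey fires.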
I'm a solo developer, shipped this about two weeks ago. It's had its first
real users and I've been iterating fast based on feedback. Just shipped
v1.1.0 yesterday.
Would love honest feedback — especially from people who've tried Superwhisper,
Wispr Flow, or similar tools. What did I miss? What would make you switch?
https://usespoke.app
Are you using intents or grammars with the speech recognition engine?