A macOS menu-bar voice input. Hold the hotkey, speak — text streams into your cursor in real time, and every sentence sediments into a searchable, private memory layer that lives only on your Mac.
Local engines and Bring-Your-Own-Key paths are free forever, no subscription. Pro removes all cloud quotas with zero configuration.
Three on-device ASR engines (SenseVoice / Paraformer / Apple). Nothing leaves your Mac.
Plug in any OpenAI-compatible API key (DeepSeek / Kimi / OpenAI / your local OpenAI-compatible server). Unlimited tidy, you control the cost.
Zero-config cloud ASR + cloud AI tidy. Removes the 60 min/mo + 50 tidy/day quota. For people who just want it to work.
Input on top, voice archive in the middle, AI memory underneath.
Hold, speak, release. Mixed-language, homophones, fillers handled silently. Under 1.4s.
Every line archives locally with source app, time, tags. Search, filter, export.
7 personas review your week. A weekly MBTI sketch. 3–5 quotes worth echoing.
7 built-in personas plus your own. Contrast itself is the memory.
Introspection up, decisions slower. Three returns to the "sunk cost" theme.
"Speed is iron law — no feature may add perceived latency."
Casing, punctuation, and unit spacing — handled locally in <5ms, no LLM call.
LLMs handle semantic judgment (homophones, fillers). The local engine handles format (brand casing, punctuation spacing, unit spacing). Two layers; both wins.
45+ brand names auto-corrected. Spacing after English commas/periods. Half-width parentheses kept for ASCII. Every rule toggleable.
Pin source app at record-start. Switch windows — still lands right, with clipboard fallback.
Stop talking, wait 1–2 seconds, polished text lands in one shot. You never see the raw.
Local pinyin injected into prompt. Homophone pairs no longer confused.
Your edits on AI output extract as candidate rules. Accept from menu bar.
AI models, dev tools, Apple products built-in. "Cursor" stays "Cursor".
More text, more transparent. Breathing glow hints AI is working. Original never disappears.
Audio, text and history all live on your Mac. One single number leaves — seconds per recording.
Audio and text land in the app's own directory, auto-backed up on launch. Uninstall takes it all with you.
Each recording sends only its length to the global pulse. No identity, IP, content, or context. Toggle off in Settings.
API keys sit in macOS Keychain, never on our servers. ASR runs directly against Volcengine, nothing persisted.
We don't pile features — we only ship what we believe earns its place.
Fixed the occasional stray filler word appearing in text — filler-word removal is cleaner now. New cross-device stats: words / time / streak now add up across devices when signed in to the same account on multiple Macs (content stays local, never uploaded). Fixed the invite page occasionally showing a false "network error".
Smoother onboarding: after granting permissions, one big button restarts the app so they take effect immediately; the onboarding window pops up more reliably on first launch. Launch at login is now on by default, so VoiceInput is still there after you reboot. Local direct-insert is smoother — text appears more continuously as you speak.
Cloud recognition / AI cleanup is faster: connection reuse makes text appear sooner after release in most cases. Translation / bilingual is more accurate: proper nouns (company / product names) are recognized better.
New AI Translation: speak and get the translation directly, or original + translation side by side, 50+ languages — switch "Tidy / Translate / Bilingual" from the menu bar. AI cleanup is more accurate: fixed the occasional case where it answered your question instead of just tidying what you said.
Cloud speech recognition starts up faster — pressing right Option begins recognition almost instantly. Fixed occasional response stalls.
AI cleanup speed massively improved — release-to-text feels generation-leap fast again. New-version updates are now harder to miss: after 24h the banner turns red, after 48h it auto-restarts to finish the update. Onboarding redesigned with a full-size keyboard and three-phase animation (press → speak → release & text appears) so first-time users know which key at first glance. Dashboard adds a one-click fix entry when Accessibility permission becomes stale.
AI cleanup is noticeably faster — release the key and polished text appears almost instantly, no configuration needed. An all-new onboarding flow helps first-time users get up and running right away. Long-sentence direct insert is more reliable and produces more complete results. Recording status is clearer and more polished, proper-noun recognition is sharper, and sign-in on certain networks is fixed.
Yes. ASR is Volcengine, LLM can be Doubao / DeepSeek / Kimi / OpenAI. Full control over account and bill. Typical: CNY 5–20/month.
On Chinese scenarios, much faster end-to-end (1.4s vs 3–10s). And it's not just input — everything you say becomes a searchable memory.
macOS 14.0+, Apple Silicon + Intel. 22.6 MB DMG, non-App-Store, Sparkle auto-update.
Microphone, Input Monitoring, Accessibility. Granted once via the onboarding page.
No. Prompt constrains LLM to three jobs: fix homophones, drop fillers, add punctuation. Confidence < 0.5 keeps the original. Double-tap right Option to bypass AI.
Yes. Markdown / JSON / CSV export. Copy the DB file to the same path on a new Mac.
Yes. Clear API config, all memory features stop. Local typography engine keeps running.
Honest, side-by-side comparisons against the tools you're probably also evaluating.
Faster on Chinese, plus a memory layer Superwhisper doesn't have. Read the side-by-side →
Different categories: Wispr rewrites tone, VoiceInput keeps memory. Compare →
No 60-second cap, AI cleanup, mixed-language handling, full archive. See full →
Download · grant three permissions · hold right Option. Thirty seconds to get it.
Download VoiceInput_v0.75.0.dmg · 22.7 MB