Dictation explained · 2026

What is AI dictation? (and how it differs from speech-to-text)

AI dictation is voice typing with a second step. One model turns your speech into text. A second model cleans it up - fixing punctuation, dropping filler, restoring code identifiers, shifting tone, even translating. Here is what that cleanup does, the topic profiles that control it, and why SnailText runs the whole thing on your machine.

By SnailText's founder · Published 2026-06-21

The short version

Plain speech-to-text turns audio into raw words. AI dictation adds a second model - a language model - that polishes the transcript: it removes "um" and "you know", fixes punctuation and grammar, restores code identifiers like snake_case, shifts tone, and can translate. SnailText runs that cleanup on-device with five topic profiles (general, development, writing, business, academic) and a choice of identifier styles, and you can turn it off for verbatim output. The detail most comparisons gloss over: in most apps the cleanup runs in the cloud (GPT, Claude, Gemini), so the transcript is uploaded even when the audio was processed locally. SnailText runs both models on your machine, so neither the audio nor the transcript leaves it.

Speech-to-text vs AI dictation at a glance (verified 2026-06-21)
Axis	Plain speech-to-text	AI dictation
What it produces	Raw transcript of what you said	Cleaned-up text, ready to send
Models involved	One (speech-to-text)	Two (speech-to-text, then a language model)
Filler words (um, uh)	Left in	Removed
Punctuation & grammar	Best-effort from the speech model	Corrected by the language model
Style / tone	Verbatim only	Can shift casual to formal, or to a code style
Translation	No	Yes - speak one language, get another
Where the cleanup runs	N/A (no cleanup step)	Cloud in most apps; on-device in SnailText

For most of the last decade, “dictation” meant one thing: a model listened to your voice and typed out what it heard, word for word. Filler and all. AI dictation adds a second step. After the transcript exists, a language model reads it and cleans it up - the way a careful editor would, but in a fraction of a second.

That second step is the whole difference. It is also where the privacy question hides, because in most apps the cleanup happens on someone else’s server.

Speech-to-text: the first model

Speech-to-text (also called speech recognition, or STT) is the foundational technology. You speak, and a model converts the audio into a string of words. Two open models widely used for desktop dictation in 2026 are OpenAI’s Whisper and NVIDIA’s Parakeet TDT. Both can run entirely on your own hardware.

What you get from this step is a faithful transcript. If you said “um, so I think we should, you know, ship it on Friday”, that is roughly what comes out. Accurate, but not something you would paste into an email without tidying it first.

That tidying used to be your job. Now a second model does it.

AI dictation: adding the language model

AI dictation runs the raw transcript through a language model (the same class of model behind ChatGPT, Claude, and Gemini). The language model does the editing pass:

Removes filler. “Um”, “uh”, “you know”, “like” - gone.
Fixes punctuation and grammar. Run-on speech becomes properly punctuated sentences.
Adjusts style. Casual speech can become a professional message, a formal note, or code-style text with the right identifier casing.
Translates. Speak in your native language, get the text in another.

So “um, so I think we should, you know, ship it on Friday” becomes “I think we should ship it on Friday.” Same meaning, ready to send.

This is why the category is called AI dictation rather than plain dictation: there are two models in the pipeline, and the second one is a language model. The speech model hears you; the language model edits you.

You said

so umm i pushed the fix to githab and the the latency droped on postgress

AI dictation gives you

So I pushed the fix to GitHub, and the latency dropped on Postgres.
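The two-stage shape described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not SnailText's implementation: `toy_cleanup` is a hypothetical rule-based stand-in for the language model (a real model edits from context rather than from a fixed filler list, which is why an ambiguous word like "like" is left out here), and `speech_to_text` is whatever function you supply.

```python
import re
from typing import Callable, Optional

# Hypothetical patterns for this sketch only.
FILLERS = re.compile(r",?\s*\b(?:um+|uh+|you know)\b,?", re.IGNORECASE)
REPEATS = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)  # "the the" -> "the"
LEADING_SO = re.compile(r"^so,?\s+", re.IGNORECASE)


def toy_cleanup(transcript: str) -> str:
    """Rule-based stand-in for the language-model editing pass."""
    text = FILLERS.sub("", transcript)        # drop filler and its commas
    text = REPEATS.sub(r"\1", text)           # collapse stuttered words
    text = re.sub(r"\s+", " ", text).strip()  # normalize whitespace
    text = LEADING_SO.sub("", text)           # drop a sentence-initial "so"
    if text and text[-1] not in ".!?":
        text += "."                           # close the sentence
    return text[:1].upper() + text[1:]        # capitalize the first letter


def dictate(
    audio: bytes,
    speech_to_text: Callable[[bytes], str],
    cleanup: Optional[Callable[[str], str]] = None,
) -> str:
    """Stage 1 always runs; stage 2 runs only when a cleanup step is given."""
    raw = speech_to_text(audio)
    return cleanup(raw) if cleanup else raw


# A fake speech model so the sketch runs without audio.
fake_stt = lambda audio: "um, so I think we should, you know, ship it on Friday"
print(dictate(b"", fake_stt))               # raw transcript, filler and all
print(dictate(b"", fake_stt, toy_cleanup))  # I think we should ship it on Friday.
```

Passing no cleanup function gives plain speech-to-text; passing one gives the two-model flow. Everything else in the pipeline stays the same.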

Speech-to-text vs AI dictation, side by side

Axis	Plain speech-to-text	AI dictation
What it produces	Raw transcript of what you said	Cleaned-up text, ready to send
Models involved	One (speech-to-text)	Two (speech, then language model)
Filler words	Left in	Removed
Punctuation & grammar	Best-effort from the speech model	Corrected
Style / tone	Verbatim only	Casual to formal, or code style
Translation	No	Speak one language, get another
Where cleanup runs	N/A	Cloud in most apps; on-device in SnailText

The part most comparisons skip: where the second model runs

Here is the question that decides whether AI dictation is private: where does the language-model step happen?

In most AI dictation apps, the speech-to-text step may run on your device, but the cleanup step calls a cloud language model - OpenAI, Anthropic, or Google. That means your transcript is uploaded on every dictation, even when your audio never left the machine. “Local speech recognition” and “local AI dictation” are not the same claim. The first can be true while the second is false.

For a Slack message about lunch, that may not matter. For a commit message that quotes proprietary code, a legal note about a client, or a clinical observation, it matters a lot. The transcript is the sensitive part, and the cleanup step is exactly where it gets sent away.
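The distinction between "local speech recognition" and "local AI dictation" can be made concrete by modeling each stage with the data it consumes and where it runs. The sketch below is illustrative only; the `Stage` and `Where` types and the `uploaded` helper are hypothetical names, not part of any app's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Set


class Where(Enum):
    DEVICE = "device"
    CLOUD = "cloud"


@dataclass(frozen=True)
class Stage:
    name: str
    consumes: str  # the data this stage reads: "audio" or "transcript"
    runs: Where


def uploaded(pipeline: Iterable[Stage]) -> Set[str]:
    """Return the kinds of data that leave the machine."""
    return {stage.consumes for stage in pipeline if stage.runs is Where.CLOUD}


# Local speech model, cloud cleanup: the audio stays, the transcript does not.
hybrid = [
    Stage("speech-to-text", "audio", Where.DEVICE),
    Stage("cleanup", "transcript", Where.CLOUD),
]

# Both models on-device: nothing is uploaded.
fully_local = [
    Stage("speech-to-text", "audio", Where.DEVICE),
    Stage("cleanup", "transcript", Where.DEVICE),
]

print(uploaded(hybrid))       # {'transcript'}
print(uploaded(fully_local))  # set()
```

A pipeline is only private end to end when this set is empty, which is the same condition that lets it work with no network connection.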

How SnailText’s AI dictation works

SnailText runs both models on your device. Whisper (or Parakeet TDT) handles speech-to-text locally, in RAM. Then a local language model - a compact Gemma model running on your own hardware - does the cleanup pass. No API key, no cloud call, nothing uploaded at either stage. Here is what that second model actually does for you.

Cleanup and correction

Every dictation gets the basic editing pass: filler words dropped, punctuation and capitalization repaired, obvious grammar slips fixed, and known brand and product names restored to their proper casing (so “github” becomes “GitHub” and “postgres” becomes “Postgres”). This is the difference between a transcript you have to fix and a sentence you can send.
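As a rough illustration of the name-casing part of that pass, here is a minimal lookup-table sketch. It is an assumption-laden simplification: the `CANONICAL` table and `restore_names` function are hypothetical, and a language model handles this from context rather than from a fixed list. Note that a plain lookup like this only repairs casing; it would not fix a misheard spelling.

```python
import re

# Hypothetical table of known names; the two entries come from the example above.
CANONICAL = {
    "github": "GitHub",
    "postgres": "Postgres",
}

# Match any known name as a whole word, ignoring case.
_NAME_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in CANONICAL) + r")\b",
    re.IGNORECASE,
)


def restore_names(text: str) -> str:
    """Replace each known name with its canonical casing."""
    return _NAME_PATTERN.sub(lambda m: CANONICAL[m.group(0).lower()], text)


print(restore_names("pushed the fix to github and restarted postgres"))
# pushed the fix to GitHub and restarted Postgres
```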

Topic profiles

Cleanup is not one-size-fits-all - a developer dictating code wants different handling than a novelist dictating prose. SnailText ships five topic profiles, and you pick the one that matches what you mostly dictate:

General - no topic bias, for dictations that cover many areas.
Development & IT - restores snake_case / camelCase identifiers and library names (Python, React, Docker, Postgres, and the like). The default for fresh installs.
Writing - articles, essays, prose. Preserves your voice and sentence rhythm, and skips identifier rewriting entirely so it never mangles a normal sentence into code.
Business - meetings, emails, project management. Knows KPI / OKR / ROI vocabulary and capitalizes brand names correctly.
Academic - scientific writing, formula references, Latin species names, preserved technical terminology.

The profile is the single biggest lever on how the cleanup behaves, because it tells the language model what kind of text you are producing before it touches a word.
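One way to picture a profile is as a small piece of configuration that is turned into an instruction for the language model before the transcript is processed. The sketch below is a guess at the general shape, not SnailText's actual configuration format: the `Profile` fields, the hint wording, and `build_instruction` are all hypothetical, and the `rewrite_identifiers` flag is assumed to be off wherever the descriptions above do not say otherwise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    hint: str                  # what kind of text the model should expect
    rewrite_identifiers: bool  # assumption: only Development enables this


# The five profile names come from the list above; the rest is illustrative.
PROFILES = {
    "general": Profile("General", "No topic bias.", False),
    "development": Profile(
        "Development & IT", "Restore code identifiers and library names.", True
    ),
    "writing": Profile(
        "Writing", "Preserve voice and rhythm; never rewrite identifiers.", False
    ),
    "business": Profile(
        "Business", "Expect KPI / OKR / ROI vocabulary and brand names.", False
    ),
    "academic": Profile(
        "Academic", "Preserve technical terms and Latin species names.", False
    ),
}


def build_instruction(profile_key: str, transcript: str) -> str:
    """Prefix the transcript with the profile's context for the model."""
    profile = PROFILES[profile_key]
    return f"Topic: {profile.name}. {profile.hint}\nTranscript: {transcript}"


print(build_instruction("writing", "the rain had stopped by morning"))
```

The point of the design is that the context arrives before the text, so the same transcript can be handled differently depending on the profile.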

Identifier styles for code

If you dictate code, you can set the convention the model restores symbols into: snake_case, camelCase, kebab-case, PascalCase, or Auto (let the model infer from context). Say “recording completed” with the Development profile active and snake_case selected, and it comes out as recording_completed rather than two plain words. This is the kind of thing that makes voice-driven coding usable instead of a constant cleanup chore.
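The four fixed conventions are simple string transformations once the words are known. The sketch below shows them in Python; `to_identifier` is a hypothetical helper written for illustration. Deciding which spoken words form an identifier in the first place, and the Auto style, both need context and are left to the language model.

```python
def to_identifier(phrase: str, style: str) -> str:
    """Join the words of a spoken phrase in the given identifier style."""
    words = [word.lower() for word in phrase.split()]
    if not words:
        return ""
    if style == "snake_case":
        return "_".join(words)
    if style == "kebab-case":
        return "-".join(words)
    if style == "camelCase":
        return words[0] + "".join(word.capitalize() for word in words[1:])
    if style == "PascalCase":
        return "".join(word.capitalize() for word in words)
    raise ValueError(f"unsupported style: {style}")


for style in ("snake_case", "camelCase", "kebab-case", "PascalCase"):
    print(style, "->", to_identifier("recording completed", style))
# snake_case -> recording_completed
# camelCase -> recordingCompleted
# kebab-case -> recording-completed
# PascalCase -> RecordingCompleted
```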

Style, tone, and translation

The same model can shift register - turning a casual spoken sentence into a professional message - and translate: speak in your native language and get the text in another, processed locally rather than sent to a translation API.

You stay in control

The cleanup is deliberately conservative. It is tuned to preserve your meaning rather than rewrite it, and it leaves text alone when it is already clean. If you want the raw transcript with no editing, you turn the step off and get plain verbatim speech-to-text. AI dictation is a mode you switch on, not a filter you are stuck with.
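To show what "conservative" and "switchable" could look like in code, here is a minimal sketch of a guard around the cleanup step. It is a hypothetical technique chosen for illustration, not a description of how SnailText is tuned: `guarded_cleanup` falls back to the raw transcript whenever the cleaned text contains a word the speaker never said, and returns the raw transcript untouched when the step is disabled. A real guard would need to be looser, since this one would also reject legitimate spelling corrections.

```python
import re
from typing import Callable, List


def _words(text: str) -> List[str]:
    """Lowercased word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def guarded_cleanup(
    raw: str, cleanup: Callable[[str], str], enabled: bool = True
) -> str:
    """Apply cleanup, but never accept text that introduces new words."""
    if not enabled:
        return raw  # verbatim mode: plain speech-to-text
    cleaned = cleanup(raw)
    introduced = set(_words(cleaned)) - set(_words(raw))
    return raw if introduced else cleaned


raw = "um ship it friday"
tidy = lambda text: "Ship it Friday."    # removes filler only
invent = lambda text: "Ship it Monday."  # changes the meaning

print(guarded_cleanup(raw, tidy))                 # Ship it Friday.
print(guarded_cleanup(raw, invent))               # um ship it friday
print(guarded_cleanup(raw, tidy, enabled=False))  # um ship it friday
```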

This is also why we can call SnailText AI dictation honestly. Before the local language-model step shipped, it was a fast, private speech-to-text app. With two models in the pipeline - both on-device - it is AI dictation that uploads nothing.

The local language-model cleanup is a Pro feature, currently in beta. The free tier gives you the full local speech-to-text engine with no account and no word limit; Pro ($7.49/mo or $89/yr, up to 3 devices) adds the on-device cleanup model, the topic profiles, and the identifier styles described above.

When you want plain speech-to-text instead

AI dictation is not always the right mode. If you are transcribing a quote and need the exact words, or you are dictating into a system that has its own formatting rules, the cleanup step can get in the way. That is what the off switch is for. The point is not that one replaces the other - AI dictation gives you a second mode, and a good app lets you pick per task.

The short version: speech-to-text writes down what you said. AI dictation hands you what you meant to send - in the style your work needs. The only thing left to check is whether that second step keeps your words on your machine.

SnailText is offline voice dictation for Mac and Windows - local, private, free to start.

Common questions

What is the difference between AI dictation and speech-to-text?

Speech-to-text is the underlying technology that turns audio into words. AI dictation is that, plus a second language-model step that cleans up the result - removing filler words, fixing punctuation and grammar, and optionally changing the style or translating. In short: speech-to-text gives you a raw transcript; AI dictation gives you text that is ready to send.

Does AI dictation send my voice to the cloud?

It depends on the app. The speech-to-text step may run locally, but in most apps the language-model cleanup step runs in the cloud (GPT, Claude, or Gemini), so your transcript is uploaded even when the audio was processed on-device. SnailText runs both steps locally, so neither your audio nor your transcript leaves the machine.

Is AI dictation more accurate than regular dictation?

The raw accuracy comes from the speech-to-text model and is the same either way. What AI dictation improves is readability: the language-model pass fixes the punctuation, grammar, stray filler, and miscased product names that even an accurate transcript contains. It does not make the speech recognition itself more accurate.

Can AI dictation work offline?

Only if both models run on your device. Most AI dictation tools need an internet connection for the cloud language-model step. SnailText is built so the speech model and the language model both run locally, which means the full AI dictation flow works on a plane or with no signal.

What does the language model actually change in my text?

It removes disfluencies (um, uh, you know), repairs punctuation and capitalization, fixes obvious grammar slips, and - if you ask it to - rewrites the tone (casual to professional), formats code identifiers, or translates. It is meant to be a light cleanup, not a rewrite that changes your meaning.

Does the language model change what I actually said?

A good AI dictation cleanup is conservative: it tidies wording without inventing content. SnailText's local cleanup is tuned to preserve meaning and leaves the text alone when it is already clean. If you want the raw transcript with no cleanup, you can turn the step off and use plain speech-to-text.

Can I control how AI dictation edits my text?

Yes. SnailText has five topic profiles - General, Development & IT, Writing, Business, and Academic - that tell the language model what kind of text you are producing, so a developer dictating code and a writer dictating prose get different handling. For code you can also choose the identifier convention it restores symbols into (snake_case, camelCase, kebab-case, PascalCase, or Auto). And you can turn the whole cleanup off for verbatim output.

Can AI dictation format code identifiers when I speak?

Yes. With the Development profile active, SnailText restores spoken phrases into code identifiers in your chosen style - say “recording completed” and get recording_completed (snake_case) or recordingCompleted (camelCase). This is what makes dictating into an editor or AI coding assistant practical rather than a constant cleanup chore.

Want AI dictation that never leaves your machine?

SnailText runs both the speech model and the language-model cleanup locally. No API key, no cloud round-trip, nothing uploaded. The free tier has unlimited local dictation with no account needed.
