claude-code - [FEATURE] Voice dictation: option to reduce prosody-driven punctuation insertion

GitHub stats
anthropics/claude-code#54571 · Fetched 2026-04-30 06:42:01
Comments: 0 · Participants: 1 · Timeline: 3 · Reactions: 0
Timeline (top): labeled ×3

Summary

Claude Code's voice dictation transcribes prosodic features (pauses, intonation) into syntactic punctuation such as periods, commas, and question marks, based on the model's interpretation of where sentence boundaries fall.

For users whose natural thinking-out-loud cadence includes long pauses within a thought rather than only between sentences, this consistently produces transcripts with punctuation the user did not intend. The dictation model effectively forces a speech pattern that mirrors written-sentence structure.

I'd like a setting that lets users opt out of (or dial down) prosody-driven punctuation, so transcripts more faithfully represent the spoken utterance and let the user, or Claude itself, handle punctuation downstream.

Use case

When dictating prompts, my speech includes long pauses while I gather a thought. These pauses don't correspond to sentence boundaries. The transcription model interprets them as periods or commas, producing fragmented sentences that don't match what I said or meant.

I'm not asking for more accurate transcription in the word-recognition sense; word recognition is working fine. The specific concern is that prosody is being over-interpreted as syntax for users whose natural speech doesn't follow written-text rhythm.

Proposed options (any one would help)

  1. A voice.punctuation setting with values like:
    • "auto": current behavior.
    • "minimal": only insert punctuation when the model has high confidence (for example, clear interrogative intonation for ?).
    • "none": emit raw text and let the user (or downstream model) handle punctuation.
  2. A way to opt out of pause-driven punctuation specifically, while keeping intonation-driven punctuation, so question marks still work but spurious periods and commas don't appear.
  3. Documented guidance on speech patterns that minimize prosody-driven punctuation, if a setting isn't feasible in the near term.

Option 1 is the most flexible; option 2 captures the core concern; option 3 is a no-cost stopgap.
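
A minimal sketch of how options 1 and 2 could behave, assuming the transcription model exposes each punctuation mark together with a confidence score and the cue that triggered it. That assumption is not confirmed by the issue, and every name here (Token, render, source, confidence, threshold, pause_punctuation) is hypothetical; nothing below is an existing Claude Code API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    """One recognized word plus the punctuation the model proposes after it."""
    word: str
    punct: Optional[str] = None    # ".", ",", "?" or None
    source: Optional[str] = None   # hypothetical cue label: "pause" or "intonation"
    confidence: float = 0.0        # hypothetical confidence in the proposed mark


def render(tokens: List[Token], mode: str = "auto",
           threshold: float = 0.9, pause_punctuation: bool = True) -> str:
    """Join tokens into text, filtering proposed punctuation by mode.

    "auto":    keep every proposed mark (current behavior).
    "minimal": keep a mark only when confidence >= threshold.
    "none":    drop all marks and emit raw words.
    pause_punctuation=False additionally drops marks triggered by a pause,
    which corresponds to option 2. Capitalization is not handled here.
    """
    if mode not in ("auto", "minimal", "none"):
        raise ValueError(f"unknown punctuation mode: {mode!r}")
    out = []
    for tok in tokens:
        keep = tok.punct is not None and mode != "none"
        if keep and mode == "minimal" and tok.confidence < threshold:
            keep = False
        if keep and not pause_punctuation and tok.source == "pause":
            keep = False
        out.append(tok.word + (tok.punct if keep else ""))
    return " ".join(out)


# Illustrative input only; the confidence values are made up.
demo = [
    Token("should"), Token("be"),
    Token("strict", ".", "pause", 0.55),
    Token("for"), Token("the"), Token("three"),
    Token("further"),
    Token("questions", "?", "intonation", 0.97),
]
```

With the demo tokens, render(demo, "auto") keeps the spurious period after "strict", render(demo, "minimal") and render(demo, "auto", pause_punctuation=False) both drop it while keeping the question mark, and render(demo, "none") emits only the words.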

Why this isn't a duplicate

Existing voice-related issues touch adjacent concerns, but none address prosody-as-punctuation specifically.

  • #35081 (closed as stale, locked). Framed as general "punctuation errors and reduced accuracy compared to competitors." Asked for the default model to be better, not for user-facing controls. The stale-bot's closing comment explicitly invites re-filing. This issue is specifically about giving users control when their speech pattern doesn't fit the model's prosody assumptions, which is a different ask than "improve quality."
  • #41654 (open, stale). Custom vocabulary, codebase-aware vocab hints, STT engine selection, microphone selection. Focused on word recognition and vocabulary, not punctuation.
  • #47392 (open). Tag dictated messages so Claude knows the input is transcription. An orthogonal approach (interpret around bad transcription rather than fix it). Complementary, not overlapping.
  • #40379 (closed as duplicate). Required unnaturally slow speech to get usable transcripts. About recognition robustness at conversational pace, not punctuation insertion.
  • #3412 (open). Edit pasted-text blocks before submission. Adjacent for tap-mode workflows, but doesn't address the underlying transcription behavior.

Concrete example

The following is a real dictation captured in tap mode while reviewing a design with the assistant. The first block is exactly what the transcription model produced. The second is what I actually said and would have typed.

As transcribed:

Location drop should be strict. For the three but I do like having the other option with a free text. Fallback. Let's table the free text option for now. We'll add it if we need it. Contact does not have to be acquired For number five, I like option a. I think that's simpler. And for number six, I I like your recommendation. Further questions?

As intended:

Location drop should be strict for the three, but I do like having the other option with a free text fallback. Let's table the free text option for now. We'll add it if we need it. Contact does not have to be required for number five. I like option A, I think that's simpler. And for number six, I like your recommendation. Any further questions?

The prosody-driven errors:

  • "strict. For the three but" should be "strict for the three, but". A pause was rendered as a period, and the actual comma boundary was missed.
  • "free text. Fallback." should be "free text fallback.". A pause inside a noun phrase was rendered as a sentence break.
  • "acquired For number five" should be "required for number five.". The capitalized word after a non-period suggests the model hesitated about the sentence boundary, then split it inconsistently. (Note: acquired vs required is a separate word-recognition issue, not the focus of this report; covered by #41654.)

In every case, the model interpreted a thinking-pause as a sentence boundary. None of these were sentence breaks in my actual speech.
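
Until a setting exists, a user-side approximation of the "none" behavior is to strip sentence punctuation from the transcript before submitting it. The sketch below is one assumed way to do that in plain Python; strip_punctuation is a hypothetical helper, not part of Claude Code, and it is a heuristic: it lowercases any word that follows a removed period, question mark, or exclamation mark (except "I"), so proper nouns in that position and abbreviations ending in a period will be altered.

```python
def strip_punctuation(text: str) -> str:
    """Remove trailing . , ? ! from each word and undo sentence-start capitals.

    A word is lowercased only when the previous word ended with a
    sentence-ending mark that was removed; "I" and contractions such as
    "I'd" are left alone. Apostrophes inside words are untouched.
    """
    out = []
    lower_next = False
    for word in text.split():
        core = word.rstrip(".,?!")
        removed = word[len(core):]
        if lower_next and core != "I" and not core.startswith("I'"):
            core = core[:1].lower() + core[1:]
        # Only sentence-ending marks imply the next word was capitalized.
        lower_next = any(ch in ".?!" for ch in removed)
        if core:
            out.append(core)
    return " ".join(out)


# Fragment taken from the "As transcribed" block above.
fragment = "free text. Fallback. Let's table the free text option for now."
cleaned = strip_punctuation(fragment)
```

Here cleaned is "free text fallback let's table the free text option for now". The helper cannot repair a case like "acquired For number five", because no punctuation mark precedes the stray capital.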

Environment

  • Claude Code v2.1.123
  • Linux (CachyOS)
  • voice.mode: "tap", language: en
  • Authenticated with Claude.ai account

extent analysis

TL;DR

Implementing a voice.punctuation setting with options like "auto", "minimal", or "none" could help mitigate prosody-driven punctuation issues.

Guidance

  • Consider adding a user-facing setting to control prosody-driven punctuation, allowing users to opt out of this feature or dial it down.
  • Evaluate the feasibility of implementing a "minimal" punctuation mode, where the model only inserts punctuation when it has high confidence.
  • Investigate the possibility of documenting speech patterns that minimize prosody-driven punctuation as a temporary workaround.
  • Review the existing codebase to determine the best approach for integrating a voice.punctuation setting, considering factors like model confidence thresholds and user interface updates.

Example

The issue itself includes no code snippet; it requests a new feature rather than a code correction.

Notes

The proposed solution relies on the assumption that the transcription model can be modified to accommodate user-controlled punctuation settings. The effectiveness of this approach may depend on the model's architecture and the complexity of implementing such a feature.

Recommendation

As a no-cost stopgap, document the speech patterns that minimize prosody-driven punctuation. This gives users some relief while a more permanent solution, such as the voice.punctuation setting, is developed.
