claude-code - 💡(How to fix) Fix [FEATURE] Voice dictation: option to reduce prosody-driven punctuation insertion [1 participants]

Claude Code's voice dictation transcribes prosodic features (pauses, intonation) into syntactic punctuation such as periods, commas, and question marks, based on the model's interpretation of where sentence boundaries fall.

For users whose natural thinking-out-loud cadence includes long pauses within a thought rather than only between sentences, this consistently produces transcripts with punctuation the user did not intend. The dictation model effectively forces a speech pattern that mirrors written-sentence structure.

I'd like a setting that lets users opt out of (or dial down) prosody-driven punctuation, so transcripts more faithfully represent the spoken utterance and let the user, or Claude itself, handle punctuation downstream.

Use case

When dictating prompts, my speech includes long pauses while I gather a thought. These pauses don't correspond to sentence boundaries. The transcription model interprets them as periods or commas, producing fragmented sentences that don't match what I said or meant.

I'm not asking for more accurate transcription in the word-recognition sense; word recognition is working fine. The specific concern is that prosody is being over-interpreted as syntax for users whose natural speech doesn't follow written-text rhythm.

Proposed options (any one would help)

1. A voice.punctuation setting with values like:
   - "auto": current behavior.
   - "minimal": only insert punctuation when the model has high confidence (for example, clear interrogative intonation for ?).
   - "none": emit raw text and let the user (or downstream model) handle punctuation.
2. A way to opt out of pause-driven punctuation specifically, while keeping intonation-driven punctuation, so question marks still work but spurious periods and commas don't appear.
3. Documented guidance on speech patterns that minimize prosody-driven punctuation, if a setting isn't feasible in the near term.

Option 1 is the most flexible; option 2 captures the core concern; option 3 is a no-cost stopgap.
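
To make the voice.punctuation values and the pause-specific opt-out more concrete, here is a minimal Python sketch of how such a setting might filter punctuation that a recognizer has already proposed. Everything in it is an assumption for illustration: the Token fields (cue, confidence), the render function, the pause_punctuation flag, the 0.9 threshold, and the sample confidence numbers are hypothetical and do not describe Claude Code's actual dictation pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    """One recognized word plus the punctuation proposed after it (hypothetical shape)."""
    word: str
    punct: Optional[str] = None   # e.g. ".", ",", "?"
    cue: Optional[str] = None     # assumed label: "pause" or "intonation"
    confidence: float = 0.0       # assumed recognizer confidence in punct


def render(tokens: list[Token], mode: str = "auto",
           pause_punctuation: bool = True, threshold: float = 0.9) -> str:
    """Join tokens into text, keeping or dropping proposed punctuation.

    mode "auto" keeps everything, "minimal" keeps only high-confidence
    marks, "none" drops all marks. pause_punctuation=False additionally
    drops any mark that was triggered by a pause.
    """
    out = []
    for tok in tokens:
        keep = tok.punct is not None and mode != "none"
        if keep and mode == "minimal":
            keep = tok.confidence >= threshold
        if keep and not pause_punctuation:
            keep = tok.cue != "pause"
        out.append(tok.word + (tok.punct if keep else ""))
    return " ".join(out)


# Illustrative input only; the confidence values are invented.
SAMPLE = [
    Token("free"),
    Token("text", ".", "pause", 0.55),
    Token("fallback", ".", "pause", 0.95),
    Token("further"),
    Token("questions", "?", "intonation", 0.97),
]

for mode in ("auto", "minimal", "none"):
    print(mode, "->", render(SAMPLE, mode))
print("no pause punctuation ->", render(SAMPLE, "auto", pause_punctuation=False))
```

In this sketch, "none" and the pause opt-out are pure filters, so they would not require retraining a model, only access to which cue produced each mark. Capitalization after a dropped period is not handled here and would need its own rule.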

Why this isn't a duplicate

Existing voice-related issues touch adjacent concerns, but none address prosody-as-punctuation specifically.

#35081 (closed as stale, locked). Framed as general "punctuation errors and reduced accuracy compared to competitors." Asked for the default model to be better, not for user-facing controls. The stale-bot's closing comment explicitly invites re-filing. This issue is specifically about giving users control when their speech pattern doesn't fit the model's prosody assumptions, which is a different ask than "improve quality."
#41654 (open, stale). Custom vocabulary, codebase-aware vocab hints, STT engine selection, microphone selection. Focused on word recognition and vocabulary, not punctuation.
#47392 (open). Tag dictated messages so Claude knows the input is transcription. An orthogonal approach (interpret around bad transcription rather than fix it). Complementary, not overlapping.
#40379 (closed as duplicate). Required unnaturally slow speech to get usable transcripts. About recognition robustness at conversational pace, not punctuation insertion.
#3412 (open). Edit pasted-text blocks before submission. Adjacent for tap-mode workflows, but doesn't address the underlying transcription behavior.

Concrete example

The following is a real dictation captured in tap mode while reviewing a design with the assistant. The first block is exactly what the transcription model produced. The second is what I actually said and would have typed.

As transcribed:

Location drop should be strict. For the three but I do like having the other option with a free text. Fallback. Let's table the free text option for now. We'll add it if we need it. Contact does not have to be acquired For number five, I like option a. I think that's simpler. And for number six, I I like your recommendation. Further questions?

As intended:

Location drop should be strict for the three, but I do like having the other option with a free text fallback. Let's table the free text option for now. We'll add it if we need it. Contact does not have to be required for number five. I like option A, I think that's simpler. And for number six, I like your recommendation. Any further questions?

The prosody-driven errors:

"strict. For the three but" should be "strict for the three, but". A pause was rendered as a period, and the actual comma boundary was missed.
"free text. Fallback." should be "free text fallback.". A pause inside a noun phrase was rendered as a sentence break.
"acquired For number five" should be "required for number five.". The capitalized "For" with no period before it suggests the model hesitated about the sentence boundary, then split it inconsistently. (Note: acquired vs required is a separate word-recognition issue, not the focus of this report; covered by #41654.)

In every case, the model interpreted a thinking-pause as a sentence boundary. None of these were sentence breaks in my actual speech.
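
For anyone who wants to reproduce the comparison, the following Python sketch lists where two versions of a transcript differ only in punctuation. It uses the first sentence of the example above, copied verbatim; the helper names split_punct and punctuation_diffs are my own and are not part of any tool. It assumes both texts contain the same words in the same order, which holds for this excerpt but not for the full example (because of "acquired" vs "required").

```python
import string

TRANSCRIBED = ("Location drop should be strict. For the three but I do like "
               "having the other option with a free text. Fallback.")
INTENDED = ("Location drop should be strict for the three, but I do like "
            "having the other option with a free text fallback.")


def split_punct(token):
    """Return (lowercased word, trailing punctuation) for one whitespace token."""
    core = token.rstrip(string.punctuation)
    return core.lower(), token[len(core):]


def punctuation_diffs(a, b):
    """List (word, punct_in_a, punct_in_b) wherever trailing punctuation differs."""
    pa = [split_punct(t) for t in a.split()]
    pb = [split_punct(t) for t in b.split()]
    if [w for w, _ in pa] != [w for w, _ in pb]:
        raise ValueError("word sequences differ; align them first")
    return [(w, x, y) for (w, x), (_, y) in zip(pa, pb) if x != y]


for word, got, wanted in punctuation_diffs(TRANSCRIBED, INTENDED):
    print(f"after {word!r}: transcribed {got!r}, intended {wanted!r}")
```

On this excerpt the words match exactly once case and punctuation are ignored, and the only differences are the marks after "strict", "three", and "text".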

Environment

Claude Code v2.1.123
Linux (CachyOS)
voice.mode: "tap", language: en
Authenticated with Claude.ai account

extent analysis

TL;DR

Implementing a voice.punctuation setting with options like "auto", "minimal", or "none" could help mitigate prosody-driven punctuation issues.

Guidance

Consider adding a user-facing setting to control prosody-driven punctuation, allowing users to opt out of or dial down this behavior.
Evaluate the feasibility of implementing a "minimal" punctuation mode, where the model only inserts punctuation when it has high confidence.
Investigate the possibility of documenting speech patterns that minimize prosody-driven punctuation as a temporary workaround.
Review the existing codebase to determine the best approach for integrating a voice.punctuation setting, considering factors like model confidence thresholds and user interface updates.

Example

No code snippet is provided, as the issue focuses on feature implementation rather than code correction.

Notes

The proposed solution relies on the assumption that the transcription model can be modified to accommodate user-controlled punctuation settings. The effectiveness of this approach may depend on the model's architecture and the complexity of implementing such a feature.

Recommendation

Apply a workaround by documenting speech patterns that minimize prosody-driven punctuation, as this is a no-cost stopgap that can provide immediate relief to users while a more permanent solution is developed.
