openclaw - 💡 (How to fix) feat(typing): show typing indicator immediately on voice message receipt, before STT transcription [1 comment, 1 participant]

richardhxwang · 2026-04-01T09:00:17Z

## Fix / Workaround

In the inbound voice message handler, send `sendChatAction(typing)` immediately upon receiving the message, before dispatching to the STT pipeline. This is purely a UX change with no functional impact.

## Problem

When a user sends a voice message, the typing indicator only appears **after** STT transcription completes, not when the message is received. This creates a silent wait of 3–6 seconds before any feedback is shown:

| Step | Time |
|------|------|
| Telegram polling + message received | ~1s |
| Audio file download from Telegram CDN | ~2–3s |
| STT transcription (e.g. OpenAI gpt-4o-mini-transcribe) | ~1–2s |
| **Typing indicator finally appears** | **~4–6s after send** |

For text messages, typing shows in ~1s. The voice UX feels broken by comparison.

## Expected Behavior

Typing indicator should fire **as soon as the voice message is received**, before the transcription pipeline starts, similar to how `typingMode: "instant"` works for text messages.

## Suggested Implementation

In the inbound voice message handler, send `sendChatAction(typing)` immediately upon receiving the message, before dispatching to the STT pipeline. This is purely a UX change with no functional impact.

## Context

- Channel: Telegram (likely affects WhatsApp, WeChat, and other channels with voice support)
- `typingMode: "instant"` is already configured but has no effect on voice messages because the agent loop doesn't start until after transcription
- Related: #39052 (parallelize audio preflight), #39075 (optimize Telegram pipeline)


GitHub stats

openclaw/openclaw#58887 • Fetched 2026-04-08 02:31:27

Root Cause

`typingMode: "instant"` is already configured but has no effect on voice messages, because the agent loop doesn't start until after transcription. Until the audio has been downloaded and transcribed, nothing sends the typing indicator.

Fix Action

Fix / Workaround

In the inbound voice message handler, send `sendChatAction(typing)` immediately upon receiving the message, before dispatching to the STT pipeline. This is purely a UX change with no functional impact.
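For Telegram, `sendChatAction` is a Bot API method that takes a `chat_id` and an `action` such as `typing`. A minimal sketch of the request a handler could issue on receipt is shown here; the helper name `buildTypingRequest`, the `HttpRequest` shape, and the token placeholder are illustrative assumptions and are not taken from the openclaw codebase.

```typescript
// Hypothetical helper: builds (but does not send) a Telegram Bot API
// sendChatAction request. Names here are illustrative, not openclaw's.
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildTypingRequest(botToken: string, chatId: number | string): HttpRequest {
  return {
    url: `https://api.telegram.org/bot${botToken}/sendChatAction`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // `action: "typing"` is the Bot API value for the typing indicator.
    body: JSON.stringify({ chat_id: chatId, action: "typing" }),
  };
}
```

A handler would issue this request as its first step, before the audio download and STT begin, and would not wait for the response.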

Problem

When a user sends a voice message, the typing indicator only appears after STT transcription completes — not when the message is received. This creates a silent wait of 3–6 seconds before any feedback is shown:

| Step | Time |
|------|------|
| Telegram polling + message received | ~1s |
| Audio file download from Telegram CDN | ~2–3s |
| STT transcription (e.g. OpenAI gpt-4o-mini-transcribe) | ~1–2s |
| **Typing indicator finally appears** | **~4–6s after send** |
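
To make the arithmetic behind the last row explicit, the sketch below sums the per-step ranges. The durations are the approximate figures reported in the issue; the `Range` type and the helper names are illustrative only.

```typescript
type Range = { min: number; max: number }; // seconds

// Approximate per-step timings as reported in the issue.
const steps = {
  pollAndReceive: { min: 1, max: 1 },
  downloadAudio: { min: 2, max: 3 },
  transcribe: { min: 1, max: 2 },
};

function sumRanges(ranges: Range[]): Range {
  return ranges.reduce(
    (acc, r) => ({ min: acc.min + r.min, max: acc.max + r.max }),
    { min: 0, max: 0 },
  );
}

// Current behaviour: typing appears only after all three steps.
const current = sumRanges([steps.pollAndReceive, steps.downloadAudio, steps.transcribe]);
// Proposed behaviour: typing is sent as soon as the message is received.
const proposed = sumRanges([steps.pollAndReceive]);
// current  -> { min: 4, max: 6 }
// proposed -> { min: 1, max: 1 }
```

Under these figures, sending the indicator on receipt would bring voice messages in line with the ~1s seen for text messages.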

For text messages, typing shows in ~1s. The voice UX feels broken by comparison.

Expected Behavior

Typing indicator should fire as soon as the voice message is received, before the transcription pipeline starts, similar to how `typingMode: "instant"` works for text messages.

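Suggested Implementation

In the inbound voice message handler, send `sendChatAction(typing)` immediately upon receiving the message, before dispatching to the STT pipeline. This is purely a UX change with no functional impact.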

Context

- Channel: Telegram (likely affects WhatsApp, WeChat, and other channels with voice support)
- `typingMode: "instant"` is already configured but has no effect on voice messages because the agent loop doesn't start until after transcription
- Related: #39052 (parallelize audio preflight), #39075 (optimize Telegram pipeline)
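
Because the issue likely affects several channels, one way to keep the early typing call channel-agnostic is a small capability interface that each channel adapter implements. Everything in this sketch (`TypingCapable`, `InboundVoice`, `typingAdapters`, `onVoiceReceived`) is a hypothetical shape for illustration, not openclaw's actual adapter API.

```typescript
interface InboundVoice {
  channel: string;
  chatId: string;
}

interface TypingCapable {
  // Shows a typing indicator in the given chat; may reject on network errors.
  sendTyping(chatId: string): Promise<void>;
}

// Hypothetical registry of channel adapters that can show a typing indicator.
const typingAdapters = new Map<string, TypingCapable>();

// Called on receipt, before download/STT. Returns true if an indicator was requested.
function onVoiceReceived(msg: InboundVoice): boolean {
  const adapter = typingAdapters.get(msg.channel);
  if (!adapter) return false; // channel has no typing support registered
  // Fire and forget: a failed indicator must not block transcription.
  adapter.sendTyping(msg.chatId).catch(() => {});
  return true;
}
```

Channels without a registered adapter simply skip the indicator, so the transcription path behaves the same everywhere.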

extent analysis

TL;DR

Send the typing indicator immediately upon receiving the voice message, before starting the STT transcription pipeline.

Guidance

- Modify the inbound voice message handler to send `sendChatAction(typing)` as soon as the message is received, without waiting for transcription to complete.
- Verify that the typing indicator appears promptly after sending a voice message, ideally within 1–2 seconds.
- Consider reviewing related issues #39052 and #39075 for potential optimizations to the audio preflight and Telegram pipeline.
- Test the change on different channels, such as WhatsApp and WeChat, to ensure the fix applies broadly.
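
The verification step can be approximated offline with fakes that record call order. This is a sketch under assumptions: `VoiceDeps`, `handleVoice`, and the fake functions are hypothetical stand-ins, not openclaw's real handler or test utilities.

```typescript
interface VoiceDeps {
  sendTyping(chatId: string): Promise<void>;
  transcribe(fileId: string): Promise<string>;
}

async function handleVoice(deps: VoiceDeps, chatId: string, fileId: string): Promise<string> {
  // Request the indicator first, without awaiting, so STT is not delayed.
  deps.sendTyping(chatId).catch(() => {});
  return deps.transcribe(fileId);
}

// Fakes that log the order in which the handler invokes them.
const callLog: string[] = [];
const fakeDeps: VoiceDeps = {
  sendTyping: async () => { callLog.push("typing"); },
  transcribe: async () => { callLog.push("transcribe"); return "hello"; },
};

// Both invocations happen synchronously inside the handler,
// so the log is already populated when this call returns.
void handleVoice(fakeDeps, "chat-1", "file-1");
```

A check on `callLog` then confirms that the typing request precedes transcription; measuring real latency on a live channel would still be needed to confirm the 1–2 second target.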

Example

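// Inbound voice message handler (sketch; `message.chatId` and the injected
// helpers are illustrative names, not confirmed openclaw APIs)
async function handleVoiceMessage(message, { sendChatAction, transcribeVoiceMessage }) {
  // Send the typing indicator immediately. It is not awaited, and a failed
  // indicator is ignored so it cannot block transcription.
  sendChatAction(message.chatId, 'typing').catch(() => {});

  // Dispatch to the STT pipeline
  return transcribeVoiceMessage(message);
}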

Notes

This fix assumes that the sendChatAction function is available and correctly implemented for the Telegram channel. Additionally, the effectiveness of this change may depend on the specific STT transcription pipeline and its performance characteristics.
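
One performance-related caveat: per the Telegram Bot API, a chat action is shown for 5 seconds or less, so a single call may expire if download plus transcription runs longer than that. A hedged sketch of a keep-alive wrapper follows; `withTyping` and the 4-second refresh interval are assumptions for illustration, not part of the original proposal.

```typescript
// Hypothetical helper: re-sends the typing action until `work` settles.
async function withTyping<T>(
  sendTyping: () => Promise<void>,
  work: () => Promise<T>,
  intervalMs = 4000, // assumed refresh period, chosen to stay under the ~5s expiry
): Promise<T> {
  // Errors from the indicator are swallowed so they never block the work.
  const ping = () => { sendTyping().catch(() => {}); };
  ping(); // immediate first indicator, before any download or STT
  const timer = setInterval(ping, intervalMs);
  try {
    return await work();
  } finally {
    clearInterval(timer); // stop refreshing once the work finishes or fails
  }
}
```

Here `work` would wrap the download and transcription steps, and `sendTyping` would wrap the channel's typing call for the current chat.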

Recommendation

Apply the suggested implementation: it addresses the UX issue without requiring functional changes to the STT pipeline or other components. This should make the typing indicator for voice messages as responsive as it already is for text messages.
