openclaw - 💡(How to fix) Telegram audio replies can use stale/incorrect content even when the transcript is correct [1 participant]

GitHub stats: openclaw/openclaw#70064 (fetched 2026-04-23 07:29:42)
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0

In Telegram direct chat, OpenClaw sometimes responds to a new audio message using content from a previous audio or otherwise produces a reply that does not reflect the current audio transcript.

After extensive debugging, the key finding is:

  • the inbound audio file is correct
  • the transcript at message:preprocessed is correct
  • the problem appears later in the reply pipeline, where the model-facing turn or response logic becomes contaminated or stale

Root Cause

Not yet identified. The inbound audio file and the transcript at message:preprocessed are both correct, so the fault most likely lies later in the reply pipeline, where the model-facing turn or response logic picks up stale or contaminated content.

Fix / Workaround

No robust local workaround was found.

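Code Example

Config fragment that removed the SSRF block on the local STT endpoint (the issue does not say where it nests in the OpenClaw config):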

"network": {
  "dangerouslyAllowPrivateNetwork": true
}

Bug report: Telegram audio turns can reply using stale/incorrect content even when transcript is correct


Environment

  • OpenClaw before update: 2026.4.15 (041266a)
  • OpenClaw after update: 2026.4.21 (f788c88)
  • Channel: Telegram direct chat
  • Surface/provider: telegram
  • Host OS: Linux
  • Local STT endpoint in use previously: http://127.0.0.1:8765/v1/audio/transcriptions

Original symptoms

  1. Sending a new audio could lead to a reply that reflected the previous audio.
  2. Later, the behavior narrowed to this: OpenClaw had the correct current transcript, but the assistant reply still did not faithfully reflect that transcript and could appear stale or contaminated by prior turn context.

Important findings

1) The audio file itself is correct

Manual transcription of specific Telegram audio files via a local script confirmed each file contained the expected new utterance.

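Examples from manual file transcription during debugging (Spanish, with English in parentheses):

  • one file transcribed as: Ok, esto es una prueba del número 9 ("OK, this is a test of number 9")
  • the next file transcribed as: te envío una prueba número 10 ("I'm sending you test number 10")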

This showed the audio files arriving from Telegram were not stale.
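Another way to confirm that each inbound file is new, without transcribing anything, is to compare content hashes. A minimal sketch, assuming the inbound files sit in a local directory such as the media/inbound path shown in the logs; the helper name `find_duplicate_audio` is hypothetical and not part of OpenClaw:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_audio(paths: list[Path]) -> dict[str, list[Path]]:
    """Group files by content hash and keep only groups with more than one file.

    An empty result means every file has distinct bytes, i.e. no recording
    was delivered twice.
    """
    groups: dict[str, list[Path]] = {}
    for p in paths:
        groups.setdefault(sha256_of(p), []).append(p)
    return {h: ps for h, ps in groups.items() if len(ps) > 1}
```

For example: `find_duplicate_audio(sorted(Path("~/.openclaw/media/inbound").expanduser().glob("*.ogg")))`. Identical bytes would point at delivery; distinct bytes, as found here, push the search downstream.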

2) STT script and local Whisper work correctly

A local transcription script correctly transcribed different audio files independently.

Script used:

  • /home/acanoa/.openclaw/workspace/scripts/transcribe_telegram_audio.py

This ruled out Telegram delivery and the local STT script as the primary cause.

3) On older setup, internal media-understanding hit SSRF blocking

Logs showed OpenClaw attempting:

  • http://127.0.0.1:8765/v1/audio/transcriptions

and failing with:

  • SsrFBlockedError
  • blocked URL fetch
  • Blocked hostname or private/internal/special-use IP address

Enabling:

"network": {
  "dangerouslyAllowPrivateNetwork": true
}

removed that blockage and fixed one earlier failure mode.
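The SsrFBlockedError above is the expected outcome of a guard that refuses loopback and private targets. The sketch below illustrates that general classification with Python's standard `ipaddress` module; it is not OpenClaw's actual SSRF check, `is_blocked_target` is a hypothetical name, and the `allow_private_network` flag only mirrors the intent of dangerouslyAllowPrivateNetwork:

```python
import ipaddress
from urllib.parse import urlsplit


def is_blocked_target(url: str, allow_private_network: bool = False) -> bool:
    """Return True if the URL's host is loopback, private, or otherwise special-use."""
    host = urlsplit(url).hostname
    if host is None:
        return True  # no host component: refuse rather than guess
    if host == "localhost":
        return not allow_private_network
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: a real guard would resolve it and check every address.
        return False
    special = (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_unspecified
        or addr.is_multicast
    )
    return special and not allow_private_network
```

Under this model the local STT URL is blocked by default and allowed only when the private-network opt-in is set, which matches the before/after behavior the reporter saw.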

4) Even after update and hook instrumentation, transcript is still correct at preprocessed stage

After updating to 2026.4.21, debug hooks showed that in message:preprocessed for Telegram audio, the context contains the correct data:

  • mediaPath
  • mediaType
  • transcript
  • body

Observed examples from logs ("prueba número 27" is Spanish for "test number 27"):

  • transcript: "prueba número 27"
  • body: "prueba número 27"
  • mediaPath: "/home/acanoa/.openclaw/media/inbound/file_81---7b73d349-2d9f-434a-a686-df58460e609e.ogg"

and:

  • transcript: "Prueba número 27"
  • body: "Prueba número 27"
  • mediaPath: "/home/acanoa/.openclaw/media/inbound/file_82---e6016ca3-0f67-419b-b234-09956eb9b13c.ogg"

This strongly suggests the bug happens after message:preprocessed, not in transcription.
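The check the hooks effectively performed can be written as a small predicate over the logged context. A sketch, assuming the context is available as a plain dict with the mediaPath, transcript, and body keys seen in the logs; the function name and the case-insensitive comparison are choices made here, not OpenClaw behavior:

```python
def preprocessed_looks_correct(ctx: dict, seen_media_paths: set) -> list:
    """Return the problems found in one message:preprocessed context (empty list = OK).

    seen_media_paths accumulates paths across calls so a reused file is flagged.
    """
    problems = []
    transcript = (ctx.get("transcript") or "").strip()
    body = (ctx.get("body") or "").strip()
    media_path = ctx.get("mediaPath")
    if not transcript:
        problems.append("missing transcript")
    if transcript.casefold() not in body.casefold():
        problems.append("body does not contain transcript")
    if not media_path:
        problems.append("missing mediaPath")
    elif media_path in seen_media_paths:
        problems.append("mediaPath reused from an earlier turn")
    else:
        seen_media_paths.add(media_path)
    return problems
```

Both logged examples (file_81 and file_82) pass this check, which is consistent with the conclusion that the fault lies after this stage.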

Hook experiments performed

The following workspace internal hooks were loaded successfully:

  • hooks/telegram-audio-bridge.js
  • hooks/telegram-audio-inspect.js
  • hooks/telegram-audio-force-transcript.js

The gateway logs confirmed loading, e.g.:

  • Loading legacy internal hook module from workspace path hooks/telegram-audio-bridge.js
  • Loading legacy internal hook module from workspace path hooks/telegram-audio-inspect.js
  • Loading legacy internal hook module from workspace path hooks/telegram-audio-force-transcript.js

Hook purposes

  1. telegram-audio-bridge.js

    • attempted to transcribe Telegram audio from mediaPath
    • was meant to overwrite body, bodyForAgent, content, and transcript with that result
  2. telegram-audio-inspect.js

    • logged the actual hook context to identify where the audio and transcript appear
  3. telegram-audio-force-transcript.js

    • ran on message:transcribed
    • attempted to force the current transcript into body, bodyForAgent, and content
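The intent of telegram-audio-force-transcript.js can be expressed as a pure function over the hook context. This is a Python model of that logic, not the reporter's JavaScript hook and not OpenClaw's hook API; the field names come from the issue, and `force_transcript` is a hypothetical name:

```python
from copy import deepcopy

# Text fields the hook tried to keep in sync with the transcript (names from the issue).
FIELDS_TO_OVERWRITE = ("body", "bodyForAgent", "content")


def force_transcript(ctx: dict) -> dict:
    """Return a copy of ctx whose text fields all equal the current transcript.

    The input is not mutated, so the original context can still be logged and
    compared against the result.
    """
    transcript = (ctx.get("transcript") or "").strip()
    forced = deepcopy(ctx)
    if not transcript:
        return forced  # nothing to force; leave the turn untouched
    for field in FIELDS_TO_OVERWRITE:
        forced[field] = transcript
    return forced
```

Since the reporter's hooks already did the equivalent of this and replies were still stale, a correct result here only proves the hook-visible context; it says nothing about what the model ultimately receives.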

Result of hook experiments

Even though the logs showed the correct transcript and body during message:preprocessed, user-visible replies remained unreliable. The issue therefore likely lies further downstream: in the reply pipeline, in model-input construction, or in a later stage that rewrites the turn or body.

Current conclusion

This appears to be an OpenClaw pipeline bug affecting Telegram audio turn handling, where the final assistant response can diverge from the current audio transcript despite correct upstream transcription and correct hook-visible body/transcript values.

Expected behavior

When a Telegram audio message is received and transcribed, the assistant reply should reflect the current audio transcript only.

Actual behavior

The assistant can respond with content reflecting:

  • a previous audio
  • stale turn content
  • or a contaminated interpretation not matching the current transcript

even though the current transcript is correct in the hook-visible preprocessed context.

Reproduction outline

  1. Use Telegram direct chat with OpenClaw.
  2. Send a short text message.
  3. Send an audio message with clearly different content.
  4. Observe that hook logs show the correct current transcript/body.
  5. Observe that the assistant reply may still fail to reflect that transcript.
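The reproduction outline can be automated as a regression check once the final model input can be captured. The sketch assumes a hypothetical `pipeline(turn) -> str` callable that returns the user text the model actually receives; the two toy pipelines only stand in for correct and stale behavior and are not OpenClaw code:

```python
from typing import Callable

Turn = dict  # {"kind": "text", "text": ...} or {"kind": "audio", "transcript": ...}


def reflects_current_audio(pipeline: Callable[[Turn], str]) -> bool:
    """Send a text turn, then audio turns with distinct content.

    True only if every audio turn's model input contains its own transcript
    and none of the earlier ones.
    """
    pipeline({"kind": "text", "text": "hola"})
    earlier: list = []
    for transcript in ("prueba número 9", "prueba número 10"):
        model_input = pipeline({"kind": "audio", "transcript": transcript})
        if transcript not in model_input:
            return False
        if any(old in model_input for old in earlier):
            return False
        earlier.append(transcript)
    return True


def fresh_pipeline() -> Callable[[Turn], str]:
    """Toy pipeline that always forwards the current turn's text."""
    def run(turn: Turn) -> str:
        return turn.get("transcript") or turn.get("text", "")
    return run


def stale_pipeline() -> Callable[[Turn], str]:
    """Toy stand-in for the suspected bug: keeps reusing the first audio body it saw."""
    cache: dict = {}

    def run(turn: Turn) -> str:
        if turn["kind"] == "audio":
            cache.setdefault("audio_body", turn["transcript"])
            return cache["audio_body"]
        return turn["text"]
    return run
```

`reflects_current_audio(fresh_pipeline())` returns True and `reflects_current_audio(stale_pipeline())` returns False; pointing the harness at a real capture point would turn the manual outline into a repeatable test.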

Suggested investigation areas for OpenClaw maintainers

  • downstream mutation after message:preprocessed
  • divergence between hook context and actual model input
  • stale turn/body reuse for Telegram audio sessions
  • reply construction path after message:transcribed
  • any hidden media-understanding / enriched-body / canonical-body overwrite after hooks run

Local workaround status

No robust local workaround was found.

Things that were tried:

  • allowing private network to unblock local STT endpoint
  • disabling echoTranscript
  • disabling internal audio path and experimenting with alternative config
  • forcing transcript via hooks at message:preprocessed
  • forcing transcript via hooks at message:transcribed
  • upgrading OpenClaw from 2026.4.15 to 2026.4.21

None of these fully resolved the final user-visible reply contamination.

Extent analysis

TL;DR

The most likely fix is to find and remove whatever makes the actual model input diverge from the hook context after the message:preprocessed stage; that divergence would explain why OpenClaw replies to Telegram audio messages with stale or incorrect content.

Guidance

  1. Investigate downstream mutation: After the message:preprocessed stage, inspect the code for any mutations or overwrites of the transcript, body, or other relevant fields that could cause the model input to diverge from the correct transcript.
  2. Verify model input construction: Check how the model input is constructed after the message:transcribed stage and ensure it accurately reflects the current transcript.
  3. Check for stale turn/body reuse: Investigate if there's any reuse of previous turn or body content in the reply construction path that could lead to stale or contaminated responses.
  4. Inspect media-understanding and enriched-body logic: Review the media-understanding and enriched-body logic to ensure it's not overwriting the correct transcript or body after the hooks have run.
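Steps 1 and 4 amount to diffing two snapshots of the same turn: the context seen at message:preprocessed and the payload handed to the model. A minimal sketch, assuming both can be captured as flat dicts; where to take the second snapshot inside OpenClaw is exactly what needs investigating, and `diff_snapshots` is a hypothetical helper:

```python
DEFAULT_FIELDS = ("body", "bodyForAgent", "content", "transcript")


def diff_snapshots(preprocessed: dict, model_input: dict,
                   fields: tuple = DEFAULT_FIELDS) -> dict:
    """Return {field: (preprocessed_value, model_input_value)} for fields that changed.

    A field missing from both snapshots compares equal (None == None) and is skipped.
    """
    changed = {}
    for field in fields:
        before = preprocessed.get(field)
        after = model_input.get(field)
        if before != after:
            changed[field] = (before, after)
    return changed
```

A non-empty result for a Telegram audio turn would localize the overwrite to some stage between the two capture points.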

Example

The issue does not include OpenClaw source code, so no concrete patch can be given; the investigation should focus on the areas listed under Guidance.

Notes

The issue appears specific to the OpenClaw pipeline's handling of Telegram audio messages. Because the transcript is correct at the message:preprocessed stage but the final response is not, the problem most likely lies in downstream processing.

Recommendation

Apply a workaround that forces the correct transcript into the model input at the message:transcribed stage, similar to what the telegram-audio-force-transcript.js hook attempted but targeted at the model input itself rather than at hook-visible fields. The reporter already tried forcing the transcript via hooks at message:transcribed without resolving the problem, so this is at best a stopgap until the downstream cause is fixed.
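A more targeted version would bind each transcript to the message it came from and check it at the last point before the model call, instead of trusting fields that later stages may rewrite. The sketch is hypothetical throughout: `TranscriptLedger`, the message-id keys, and a pre-model-call checkpoint are assumptions, not existing OpenClaw APIs:

```python
class TranscriptLedger:
    """Remembers the transcript recorded for each inbound message id (hypothetical)."""

    def __init__(self) -> None:
        self._by_message: dict = {}

    def record(self, message_id: str, transcript: str) -> None:
        # Call where the transcript is known to be correct, e.g. message:transcribed.
        self._by_message[message_id] = transcript

    def reconcile(self, message_id: str, outgoing_user_text: str) -> str:
        """Return the text to send to the model for this message.

        If a transcript was recorded for message_id and the outgoing text does not
        contain it, the recorded transcript wins; otherwise the text passes through.
        """
        expected = self._by_message.get(message_id)
        if expected and expected not in outgoing_user_text:
            return expected
        return outgoing_user_text
```

Replacing the text only hides the symptom; logging each time `reconcile` overrides the input would also show where the stale value comes from.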
