openclaw - 💡 How to fix: Telegram audio replies can use stale/incorrect content even when transcript is correct

openclaw · 2026-04-22 08:33:22

openclaw/openclaw#70064 • Fetched 2026-04-23 07:29:42

Author: albertocano-ui

Bug report: Telegram audio turns can reply using stale/incorrect content even when transcript is correct

Summary

In Telegram direct chat, OpenClaw sometimes responds to a new audio message using content from a previous audio message, or otherwise produces a reply that does not reflect the current audio transcript.

After extensive debugging, the key findings are:

- the inbound audio file is correct
- the transcript at message:preprocessed is correct
- the problem appears later in the reply pipeline, where the model-facing turn or response logic becomes contaminated or stale

Environment

OpenClaw before update: 2026.4.15 (041266a)
OpenClaw after update: 2026.4.21 (f788c88)
Channel: Telegram direct chat
Surface/provider: telegram
Host OS: Linux
Local STT endpoint in use previously: http://127.0.0.1:8765/v1/audio/transcriptions

Original symptoms

Sending a new audio message could produce a reply that reflected the previous audio message.
Later, the behavior narrowed: OpenClaw had the correct current transcript, but the assistant reply still did not faithfully reflect it and could appear stale or contaminated by prior-turn context.

Important findings

1) The audio file itself is correct

Manual transcription of specific Telegram audio files via a local script confirmed each file contained the expected new utterance.

Examples from manual file transcription during debugging:

one file transcribed as: Ok, esto es una prueba del número 9
next file transcribed as: te envío una prueba número 10

This showed the audio files arriving from Telegram were not stale.

2) STT script and local Whisper work correctly

A local transcription script correctly transcribed different audio files independently.

Script used:

/home/acanoa/.openclaw/workspace/scripts/transcribe_telegram_audio.py

This ruled out Telegram delivery and the local STT script as the primary cause.
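
The manual check described above can be approximated with a short standalone script. This is a minimal sketch, not the reporter's transcribe_telegram_audio.py: it assumes the local server accepts an OpenAI-style multipart upload (a `file` part plus a `model` field) and returns JSON with a `text` key. The model name `whisper-1` and the helper names are placeholders.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Endpoint taken from the report; the request shape below is an assumption.
STT_URL = "http://127.0.0.1:8765/v1/audio/transcriptions"


def build_multipart(filename: str, data: bytes, fields: dict) -> tuple:
    """Encode one file plus text fields as a multipart/form-data body.

    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: {ctype}\r\n\r\n'.encode()
        + data
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, model: str = "whisper-1") -> str:
    """POST one audio file and return the 'text' field of the JSON response."""
    p = Path(path)
    body, content_type = build_multipart(p.name, p.read_bytes(), {"model": model})
    req = urllib.request.Request(
        STT_URL, data=body, headers={"Content-Type": content_type}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["text"]
```

Running `transcribe()` on each inbound .ogg file in turn and comparing the outputs is one way to confirm that consecutive files carry different utterances, independently of the OpenClaw pipeline.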

3) On older setup, internal media-understanding hit SSRF blocking

Logs showed OpenClaw attempting:

http://127.0.0.1:8765/v1/audio/transcriptions

and failing with:

SsrFBlockedError
blocked URL fetch
Blocked hostname or private/internal/special-use IP address

Enabling:

"network": {
  "dangerouslyAllowPrivateNetwork": true
}

removed that blockage and fixed one earlier failure mode.
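
To illustrate why a loopback STT endpoint trips this kind of guard, here is a minimal sketch of an SSRF-style address check using Python's standard ipaddress module. It is not OpenClaw's implementation; the function name and the exact set of blocked ranges are assumptions chosen to mirror the error text ("private/internal/special-use IP address").

```python
import ipaddress
from urllib.parse import urlsplit


def is_private_target(url: str) -> bool:
    """Return True if the URL's host is localhost or a literal
    loopback/private/link-local/reserved/unspecified IP address."""
    host = urlsplit(url).hostname or ""
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. A real guard would also resolve DNS and
        # re-check the resolved addresses; this sketch does not.
        return False
    return (
        addr.is_loopback
        or addr.is_private
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_unspecified
    )
```

Under this check the reported endpoint, http://127.0.0.1:8765/v1/audio/transcriptions, is classified as a private target, which matches the observed block until private-network access is explicitly allowed.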

4) Even after update and hook instrumentation, transcript is still correct at preprocessed stage

After updating to 2026.4.21, debug hooks showed that in message:preprocessed for Telegram audio, the context contains the correct data:

mediaPath
mediaType
transcript
body

Observed examples from logs:

transcript: "prueba número 27"
body: "prueba número 27"
mediaPath: "/home/acanoa/.openclaw/media/inbound/file_81---7b73d349-2d9f-434a-a686-df58460e609e.ogg"

and:

transcript: "Prueba número 27"
body: "Prueba número 27"
mediaPath: "/home/acanoa/.openclaw/media/inbound/file_82---e6016ca3-0f67-419b-b234-09956eb9b13c.ogg"

This strongly suggests the bug happens after message:preprocessed, not in transcription.
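
The check on these log excerpts can be automated. The sketch below assumes the debug hook prints fields as `key: "value"` lines, as in the excerpts above; the real log format may differ, and the function names are hypothetical.

```python
import re

# Matches lines such as: transcript: "prueba número 27"
FIELD_RE = re.compile(r'^\s*(transcript|body|mediaPath):\s*"(.*)"\s*$')


def parse_records(log_text: str) -> list:
    """Group key: "value" lines into dict records.

    A repeated key is taken as the start of the next record.
    """
    records = []
    current = {}
    for line in log_text.splitlines():
        m = FIELD_RE.match(line)
        if not m:
            continue
        key, value = m.groups()
        if key in current:
            records.append(current)
            current = {}
        current[key] = value
    if current:
        records.append(current)
    return records


def body_matches_transcript(record: dict) -> bool:
    """True when the hook-visible body equals the transcript of the same message."""
    return record.get("body") == record.get("transcript")
```

If every record passes `body_matches_transcript` while replies are still wrong, that supports the conclusion that the divergence is introduced after this stage.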

Hook experiments performed

The following workspace internal hooks were loaded successfully:

hooks/telegram-audio-bridge.js
hooks/telegram-audio-inspect.js
hooks/telegram-audio-force-transcript.js

The gateway logs confirmed loading, e.g.:

Loading legacy internal hook module from workspace path hooks/telegram-audio-bridge.js
Loading legacy internal hook module from workspace path hooks/telegram-audio-inspect.js
Loading legacy internal hook module from workspace path hooks/telegram-audio-force-transcript.js

Hook purposes

telegram-audio-bridge.js
- attempted to transcribe Telegram audio from mediaPath
- overwrite body, bodyForAgent, content, transcript
telegram-audio-inspect.js
- logged actual hook context to identify where audio and transcript appear
telegram-audio-force-transcript.js
- ran on message:transcribed
- attempted to force the current transcript into body, bodyForAgent, content

Result of hook experiments

Even with the correct transcript and body visible in logs during message:preprocessed, user-visible behavior remained unreliable. The issue therefore likely lies deeper in the downstream reply pipeline, in model-input construction, or in some later turn/body rewriting stage.

Current conclusion

This appears to be an OpenClaw pipeline bug affecting Telegram audio turn handling, where the final assistant response can diverge from the current audio transcript despite correct upstream transcription and correct hook-visible body/transcript values.

Expected behavior

When a Telegram audio message is received and transcribed, the assistant reply should reflect the current audio transcript only.

Actual behavior

The assistant can respond with content reflecting:

- a previous audio message
- stale turn content
- a contaminated interpretation that does not match the current transcript

even though the current transcript is correct in the hook-visible preprocessed context.

Reproduction outline

Use Telegram direct chat with OpenClaw.
Send a short text message.
Send an audio message with clearly different content.
Observe that hook logs show the correct current transcript/body.
Observe that the assistant response may still not reliably reflect the current transcript.
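
Because the test utterances in this report carry an incrementing number ("prueba número 9", "prueba número 10"), a reply can be classified mechanically when repeating the reproduction many times. The sketch below is one way to do that; it assumes the assistant's reply quotes the number it heard, and the function names are hypothetical.

```python
import re
from typing import Optional

NUM_RE = re.compile(r"\d+")


def first_number(text: str) -> Optional[int]:
    """Return the first integer found in the text, or None."""
    m = NUM_RE.search(text)
    return int(m.group()) if m else None


def classify_reply(current_transcript: str, previous_transcripts: list, reply: str) -> str:
    """Return 'current', 'stale', or 'unknown' based on the marker number in the reply.

    'current' wins if the reply mentions the current marker at all.
    """
    reply_numbers = {int(n) for n in NUM_RE.findall(reply)}
    current = first_number(current_transcript)
    if current is not None and current in reply_numbers:
        return "current"
    previous = {first_number(t) for t in previous_transcripts} - {None}
    if reply_numbers & previous:
        return "stale"
    return "unknown"
```

Counting "stale" results over a run of numbered audio messages gives a rough failure rate, which is more useful to maintainers than a single anecdotal mismatch.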

Suggested investigation areas for OpenClaw maintainers

downstream mutation after message:preprocessed
divergence between hook context and actual model input
stale turn/body reuse for Telegram audio sessions
reply construction path after message:transcribed
any hidden media-understanding / enriched-body / canonical-body overwrite after hooks run
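
The second item, divergence between hook context and actual model input, can be tested directly if the text finally sent to the model for the user turn can be captured. How to capture it is left open here, since the report does not identify a hook or log that exposes it. The following sketch only shows the comparison; all names are hypothetical.

```python
import unicodedata


def normalize(text: str) -> str:
    """Apply Unicode NFC, casefold, and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).casefold().split())


def contains(transcript: str, model_text: str) -> bool:
    """True if the non-empty normalized transcript occurs in the model-facing text."""
    needle = normalize(transcript)
    return bool(needle) and needle in normalize(model_text)


def audit_turn(hook_body: str, model_user_text: str, previous_bodies: list) -> str:
    """Compare the hook-visible body with the captured model-facing user text.

    Returns 'ok' if the current body is present, 'stale' if only an earlier
    body is present, and 'missing' if neither is found.
    """
    if contains(hook_body, model_user_text):
        return "ok"
    if any(contains(p, model_user_text) for p in previous_bodies):
        return "stale"
    return "missing"
```

A "stale" result would point at turn/body reuse, while "missing" would point at the body being dropped or rewritten after the hooks run.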

Local workaround status

No robust local workaround was found.

Things that were tried:

allowing private network to unblock local STT endpoint
disabling echoTranscript
disabling internal audio path and experimenting with alternative config
forcing transcript via hooks at message:preprocessed
forcing transcript via hooks at message:transcribed
upgrading OpenClaw from 2026.4.15 to 2026.4.21

None of these fully resolved the final user-visible reply contamination.

Extent analysis

TL;DR

The most likely path to a fix is to find and resolve the divergence between the hook-visible context and the actual model input after the message:preprocessed stage, since that is where replies to Telegram audio messages appear to become stale or incorrect.

Guidance

Investigate downstream mutation: After the message:preprocessed stage, inspect the code for any mutations or overwrites of the transcript, body, or other relevant fields that could cause the model input to diverge from the correct transcript.
Verify model input construction: Check how the model input is constructed after the message:transcribed stage and ensure it accurately reflects the current transcript.
Check for stale turn/body reuse: Investigate if there's any reuse of previous turn or body content in the reply construction path that could lead to stale or contaminated responses.
Inspect media-understanding and enriched-body logic: Review the media-understanding and enriched-body logic to ensure it's not overwriting the correct transcript or body after the hooks have run.

Example

No specific code example can be provided without more context, but the investigation should focus on the areas mentioned in the guidance section.

Notes

The issue seems to be specific to the OpenClaw pipeline and its handling of Telegram audio messages. The transcript is correct at the message:preprocessed stage but the final response is not, which suggests a problem in downstream processing.

Recommendation

As a stopgap, a hook at the message:transcribed stage can force the current transcript into the model-facing fields, as telegram-audio-force-transcript.js attempted. The report found that this did not fully resolve the problem, so a more targeted variant would need to act on whatever field the model input is actually built from. Treat it as mitigation only, until a permanent fix is implemented.
