openclaw - How to fix [Bug]: Telegram voice messages not transcribed via Groq Whisper after multi-agent config (Docker) [1 participant]

danielsgar · 2026-04-07T00:41:18Z

### Bug type

Regression (worked before, now fails)

### Beta release blocker

No

### Summary

Telegram voice messages (.ogg) are received and saved to media/inbound/ correctly, but Groq Whisper transcription is never triggered. The agent responds by asking the user to send text instead. No audio-related logs appear even with OPENCLAW_DIAGNOSTICS=telegram.*,media.* enabled.

### Steps to reproduce

1. Deploy OpenClaw via Docker (ghcr.io/openclaw/openclaw:latest)
2. Configure multiple agents via agents.list (main, iris, sigma)
3. Enable Groq Whisper: tools.media.audio.enabled=true, models=[{provider:groq, model:whisper-large-v3}]
4. Set echoTranscript=true and scope.default=allow
5. Connect Telegram bot to main agent
6. Send a voice message via Telegram
7. Agent responds with "I can't process audio"; no transcription attempt is logged

### Expected behavior

The voice message should be transcribed by Groq Whisper and the transcript should be passed to the agent as text. With echoTranscript=true, a confirmation of the transcript should appear in Telegram before the agent response.

### Actual behavior

The agent receives the voice message (.ogg file saved to media/inbound/) but transcription is never triggered. No audio or media-related logs appear. The agent responds by asking the user to send text instead. echoTranscript produces no output.

### OpenClaw version

2026.4.2 (unknown)

### Operating system

Ubuntu 24.04 (Hetzner CX23)

### Install method

Docker

### Model

ghcr.io/openclaw/openclaw:latest (version 2026.4.2)

### Provider / routing chain

Anthropic / claude-haiku-4-5 → Telegram → main agent (default)
Audio provider: Groq Whisper (whisper-large-v3) via GROQ_API_KEY env var

### Additional provider/model setup details

- 3 agents configured in agents.list: main (default), iris, sigma
- Groq API accessible from container (curl to api.groq.com returns 200)
- GROQ_API_KEY loaded correctly (openclaw models status shows groq as available: env=gsk_...)
- tools.media.audio config confirmed correct via openclaw config get tools.media.audio
- .ogg files arrive in media/inbound/, confirming Telegram delivery works
- OPENCLAW_DIAGNOSTICS=telegram.*,media.* produces zero audio-related logs when a voice message is sent
- applyMediaUnderstanding never appears in logs
- Issue appeared after Docker migration and multi-agent configuration
- Single agent setup (before adding iris and sigma) may have worked

### Logs, screenshots, and evidence

```shell
Evidence collected during diagnosis:

1. .ogg files confirmed arriving in media/inbound/:
file_27---931e81c1.ogg (Apr 6 01:51)
file_26---e7982d2f.ogg (Apr 6 01:48)

2. Groq API accessible from container:
$ curl -H "Authorization: Bearer $GROQ_API_KEY" https://api.groq.com/openai/v1/models → 200 OK
$ curl .ogg file to Groq transcriptions endpoint → 200 OK

3. openclaw models status shows Groq available:
- groq effective=env:gsk_KFst...MZRDeGxZ | source=env: GROQ_API_KEY

4. tools.media.audio config:
{
  "enabled": true,
  "echoTranscript": true,
  "scope": { "default": "allow" },
  "models": [{ "provider": "groq", "model": "whisper-large-v3" }]
}

5. OPENCLAW_DIAGNOSTICS=telegram.*,media.*: zero audio-related logs when voice sent

6. docker compose logs after voice message sent: no entries generated

7. Agent response to voice message:
"Não consigo processar áudio. Se quiser que eu responda algo, manda em texto."
(English: "I can't process audio. If you want me to reply to something, send it as text.")

Related issues: #7899, #7460
```

### Impact and severity

High: the core feature (voice message transcription) is completely non-functional in the Docker multi-agent deployment. The only workaround is manual text input.

### Additional information

- Deployment: Docker (ghcr.io/openclaw/openclaw:latest), Ubuntu 24.04, Hetzner CX23
- Transcription worked before Docker migration (npm install -g openclaw, version 2026.3.24)
- Issue persists after: docker compose down/up, gateway restart, config reset, scope.default=allow addition
- OPENCLAW_DIAGNOSTICS confirms zero media pipeline activity: applyMediaUnderstanding is not being called
- Single agent config (before adding iris/sigma via agents.list) may not reproduce the issue
- Note: audio was never tested in Docker with a single agent, so it is unclear whether the issue is Docker-specific or multi-agent-specific
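
The tools.media.audio settings quoted in the evidence can be sanity-checked outside OpenClaw. The following is a minimal sketch, not OpenClaw's own validation: the field names come from the report, while `check_audio_config` and the specific checks are hypothetical and only reflect the settings the reporter relied on.

```python
from __future__ import annotations

import json

# Config exactly as quoted in the report's evidence (item 4).
REPORTED_CONFIG = """
{
  "enabled": true,
  "echoTranscript": true,
  "scope": { "default": "allow" },
  "models": [{ "provider": "groq", "model": "whisper-large-v3" }]
}
"""


def check_audio_config(raw: str) -> list[str]:
    """Return human-readable problems; an empty list means nothing obvious.

    Hypothetical helper: these checks are assumptions about which fields
    matter, not a copy of any validation OpenClaw performs.
    """
    cfg = json.loads(raw)
    problems = []
    if cfg.get("enabled") is not True:
        problems.append("enabled is not true")
    if cfg.get("scope", {}).get("default") != "allow":
        problems.append("scope.default is not 'allow'")
    models = cfg.get("models") or []
    if not models:
        problems.append("models list is empty")
    for i, entry in enumerate(models):
        for key in ("provider", "model"):
            if not entry.get(key):
                problems.append(f"models[{i}] is missing '{key}'")
    return problems


if __name__ == "__main__":
    found = check_audio_config(REPORTED_CONFIG)
    print("no obvious problems" if not found else "\n".join(found))
```

The reported config passes these checks, which is consistent with the reporter's view that the settings themselves are not the cause.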


openclaw/openclaw#62209 • Fetched 2026-04-08 03:07:38



extent analysis

TL;DR

No confirmed fix exists yet. The key symptom is that applyMediaUnderstanding is never called, so the next step is to find out why the media pipeline is skipped, possibly because of the multi-agent configuration.

Guidance

- Run a single-agent setup in Docker and check whether applyMediaUnderstanding is called. The report never tested this combination, so it is the quickest way to tell whether the failure is specific to the multi-agent configuration.
- Check the agents.list configuration to confirm that the main agent is set up to handle voice messages and trigger Groq Whisper transcription.
- Re-read the logs with OPENCLAW_DIAGNOSTICS=telegram.*,media.* enabled. The reporter saw no audio-related entries at all, which suggests the media pipeline is skipped rather than failing with an error.
- Groq API access and GROQ_API_KEY loading were already verified from inside the container, so the transcription service itself is an unlikely cause. Re-test it only if the environment changes.
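
The log check in the guidance above can be sketched as a small filter over captured log text, for example the output of `docker compose logs`. This is only an illustration: `applyMediaUnderstanding` is the marker named in the report, while the other markers and the helper `find_media_activity` are assumptions, since the report does not show what a successful transcription log looks like.

```python
from __future__ import annotations

# "applyMediaUnderstanding" comes from the report; the remaining markers
# are guesses at what transcription-related log lines might contain.
MARKERS = ("applyMediaUnderstanding", "whisper", "transcri")


def find_media_activity(log_text: str, markers: tuple[str, ...] = MARKERS) -> list[str]:
    """Return the log lines that mention any marker (case-insensitive)."""
    lowered = [m.lower() for m in markers]
    return [
        line
        for line in log_text.splitlines()
        if any(m in line.lower() for m in lowered)
    ]


if __name__ == "__main__":
    # Made-up sample text, used only to show the call shape.
    sample = "telegram: update received\ntelegram: file saved to media/inbound/\n"
    hits = find_media_activity(sample)
    print(f"{len(hits)} media-related line(s) found")
```

An empty result after sending a voice message would match the report: the pipeline is never entered, so the place to look is the step that decides whether to call it.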

Example

The issue points to configuration and deployment rather than a specific code error, so there is no code fix to show.

Notes

The issue appeared after two changes were made together: the migration to Docker and the introduction of the multi-agent configuration. Because audio was never tested in Docker with a single agent, it is unclear which change is responsible, and the root cause remains undetermined.
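
One way to make that ambiguity concrete is to list which combinations of install method and agent count the report actually covers. The sketch below is an assumption-laden summary, not data from OpenClaw: it treats the pre-migration npm install as the single-agent setup, which the report implies but does not state outright, and the names `KNOWN` and `untested_combinations` are hypothetical.

```python
from __future__ import annotations

from itertools import product

# Outcomes taken from the report. True = transcription worked.
KNOWN = {
    ("npm", "single"): True,     # 2026.3.24, before the migration (assumed single-agent)
    ("docker", "multi"): False,  # 2026.4.2, the failing deployment
}


def untested_combinations(known: dict[tuple[str, str], bool]) -> list[tuple[str, str]]:
    """Return every (install, agents) pair the report has no result for."""
    installs = ("npm", "docker")
    agents = ("single", "multi")
    return [combo for combo in product(installs, agents) if combo not in known]


if __name__ == "__main__":
    for install, agents in untested_combinations(KNOWN):
        print(f"untested: {install} + {agents}-agent")
```

Note that the version also changed between the two known data points (2026.3.24 to 2026.4.2), so a regression in the release itself is a third variable this table does not separate.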

Recommendation

Isolate the failure rather than looking for a workaround: test a single-agent setup first to see whether transcription works, then add agents one at a time to find the point of failure. This narrows down the cause and shows where a fix is needed.
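
The incremental test plan above can be sketched as follows. The agent names come from the report; everything else is hypothetical, including the helper names and the `works` callback, which stands in for a manual check (deploy with that agent list, send a voice message, see whether a transcript arrives). The exact shape of agents.list entries is not shown in the report, so the sketch works with names only.

```python
from __future__ import annotations

from typing import Callable, Optional


def incremental_agent_sets(agents: list[str]) -> list[list[str]]:
    """Return growing prefixes of the agent list, starting with one agent."""
    return [agents[: i + 1] for i in range(len(agents))]


def first_failing_set(
    agents: list[str], works: Callable[[list[str]], bool]
) -> Optional[list[str]]:
    """Return the first agent set for which `works` is False, or None."""
    for subset in incremental_agent_sets(agents):
        if not works(subset):
            return subset
    return None


if __name__ == "__main__":
    # Agent names as listed in the report.
    for step, subset in enumerate(incremental_agent_sets(["main", "iris", "sigma"]), 1):
        print(f"step {step}: agents.list = {subset}")
```

If the very first set (main only) already fails in Docker, the multi-agent configuration is ruled out and attention shifts to the Docker deployment or the version change.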
