openclaw - [Bug]: Telegram voice messages not transcribed via Groq Whisper after multi-agent config (Docker)

openclaw/openclaw#62209 (fetched 2026-04-08 03:07:38)

Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

Telegram voice messages (.ogg) are received and saved to media/inbound/ correctly, but Groq Whisper transcription is never triggered. The agent replies asking the user to send text instead. No audio-related logs appear, even with OPENCLAW_DIAGNOSTICS=telegram.*,media.* enabled.

Steps to reproduce

  1. Deploy OpenClaw via Docker (ghcr.io/openclaw/openclaw:latest)
  2. Configure multiple agents via agents.list (main, iris, sigma)
  3. Enable Groq Whisper: tools.media.audio.enabled=true, models=[{provider:groq, model:whisper-large-v3}]
  4. Set echoTranscript=true and scope.default=allow
  5. Connect Telegram bot to main agent
  6. Send a voice message via Telegram
  7. Agent responds with "I can't process audio" — no transcription attempt logged
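
The audio settings from steps 3 and 4 can be sanity-checked offline before suspecting anything else. The following is a minimal sketch, assuming the settings are available as JSON in the shape quoted in this report; `check_audio_config` is a hypothetical helper written for illustration and is not part of OpenClaw.

```python
import json


def check_audio_config(cfg: dict) -> list[str]:
    """Return a list of problems; an empty list means the settings match the report."""
    problems = []
    if cfg.get("enabled") is not True:
        problems.append("enabled is not true")
    if cfg.get("scope", {}).get("default") != "allow":
        problems.append("scope.default is not 'allow'")
    models = cfg.get("models") or []
    if not any(m.get("provider") == "groq" for m in models):
        problems.append("no groq model listed")
    return problems


# tools.media.audio exactly as quoted in the report.
reported = json.loads("""
{
  "enabled": true,
  "echoTranscript": true,
  "scope": { "default": "allow" },
  "models": [{ "provider": "groq", "model": "whisper-large-v3" }]
}
""")

print(check_audio_config(reported))  # prints []
```

An empty result for the reported settings is consistent with the reporter's claim that the configuration itself is correct, which shifts attention to why the pipeline is never entered.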

Expected behavior

Voice message should be transcribed by Groq Whisper and the transcript should be passed to the agent as text. With echoTranscript=true, a confirmation of the transcript should appear in Telegram before the agent response.

Actual behavior

Agent receives the voice message (.ogg file saved to media/inbound/) but transcription is never triggered. No audio or media-related logs appear. Agent responds asking user to send text instead. echoTranscript produces no output.

OpenClaw version

2026.4.2 (unknown)

Operating system

Ubuntu 24.04 (Hetzner CX23)

Install method

Docker

Model

ghcr.io/openclaw/openclaw:latest (version 2026.4.2)

Provider / routing chain

Anthropic / claude-haiku-4-5 → Telegram → main agent (default)
Audio provider: Groq Whisper (whisper-large-v3) via GROQ_API_KEY env var

Additional provider/model setup details

  • 3 agents configured in agents.list: main (default), iris, sigma
  • Groq API accessible from container (curl to api.groq.com returns 200)
  • GROQ_API_KEY loaded correctly (openclaw models status shows groq as available: env=gsk_...)
  • tools.media.audio config confirmed correct via openclaw config get tools.media.audio
  • .ogg files arrive in media/inbound/ confirming Telegram delivery works
  • OPENCLAW_DIAGNOSTICS=telegram.*,media.* produces zero audio-related logs when voice message is sent
  • applyMediaUnderstanding never appears in logs
  • Issue appeared after Docker migration and multi-agent configuration
  • Single agent setup (before adding iris and sigma) may have worked
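
The mismatch described in this list (voice files arrive, but the pipeline marker never shows up) can be checked mechanically. This is a rough sketch, assuming the file names come from a listing of media/inbound/ and the log lines from captured `docker compose logs` output; the `summarize` helper and the sample log line are hypothetical, and only the marker string `applyMediaUnderstanding` is taken from the report.

```python
MARKER = "applyMediaUnderstanding"  # function name quoted in the report


def summarize(inbound_names, log_lines):
    """Count received voice files and log lines that mention the media pipeline."""
    voice_files = [n for n in inbound_names if n.lower().endswith(".ogg")]
    marker_hits = [line for line in log_lines if MARKER in line]
    return {"voice_files": len(voice_files), "marker_hits": len(marker_hits)}


# File names from the report; the log line is a made-up placeholder.
inbound = ["file_27---931e81c1.ogg", "file_26---e7982d2f.ogg"]
logs = ["hypothetical unrelated log line"]

print(summarize(inbound, logs))  # prints {'voice_files': 2, 'marker_hits': 0}
```

A non-zero file count with zero marker hits matches the reported symptom: delivery works, but transcription is never attempted.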

Logs, screenshots, and evidence

Evidence collected during diagnosis:

1. .ogg files confirmed arriving in media/inbound/:
file_27---931e81c1.ogg (Apr 6 01:51)
file_26---e7982d2f.ogg (Apr 6 01:48)

2. Groq API accessible from container:
$ curl -H "Authorization: Bearer $GROQ_API_KEY" https://api.groq.com/openai/v1/models → 200 OK
$ curl .ogg file to Groq transcriptions endpoint → 200 OK

3. openclaw models status shows Groq available:
- groq effective=env:gsk_KFst...MZRDeGxZ | source=env: GROQ_API_KEY

4. tools.media.audio config:
{
  "enabled": true,
  "echoTranscript": true,
  "scope": { "default": "allow" },
  "models": [{ "provider": "groq", "model": "whisper-large-v3" }]
}

5. OPENCLAW_DIAGNOSTICS=telegram.*,media.* — zero audio-related logs when voice sent

6. docker compose logs after voice message sent — no entries generated

7. Agent response to voice message:
"Não consigo processar áudio. Se quiser que eu responda algo, manda em texto."

Related issues: #7899, #7460
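
The manual curl check in item 2 can be reproduced from a script. Below is a sketch that only builds the multipart request for Groq's OpenAI-compatible transcription endpoint and does not send it; the helper name, the placeholder audio bytes, the test key, and the User-Agent value are assumptions for illustration, and this is not how OpenClaw itself calls Groq.

```python
import urllib.request
import uuid

GROQ_TRANSCRIBE_URL = "https://api.groq.com/openai/v1/audio/transcriptions"


def build_transcription_request(audio, filename, api_key, model="whisper-large-v3"):
    """Build (but do not send) a multipart/form-data transcription request."""
    boundary = uuid.uuid4().hex
    crlf = b"\r\n"
    body = b"".join([
        b"--" + boundary.encode() + crlf,
        b'Content-Disposition: form-data; name="model"' + crlf + crlf,
        model.encode() + crlf,
        b"--" + boundary.encode() + crlf,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode() + crlf,
        b"Content-Type: audio/ogg" + crlf + crlf,
        audio + crlf,
        b"--" + boundary.encode() + b"--" + crlf,
    ])
    return urllib.request.Request(
        GROQ_TRANSCRIBE_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            # Explicit user agent (arbitrary value); some CDNs reject urllib's default.
            "User-Agent": "voice-check/0.1",
        },
    )


# Placeholder bytes and key; a real check would read an .ogg file and GROQ_API_KEY.
req = build_transcription_request(b"OggS", "sample.ogg", "test-key")
print(req.get_method(), req.full_url)
```

To run the check for real, read one of the saved .ogg files, pass the key from the GROQ_API_KEY environment variable, and send the request with `urllib.request.urlopen(req)` from inside the container. A 200 response there, as the reporter saw with curl, indicates the provider is reachable and the failure lies earlier in the pipeline.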

Impact and severity

High — core feature (voice message transcription) completely non-functional in Docker multi-agent deployment. Workaround is manual text input only.

Additional information

  • Deployment: Docker (ghcr.io/openclaw/openclaw:latest), Ubuntu 24.04, Hetzner CX23
  • Transcription worked before Docker migration (npm install -g openclaw, version 2026.3.24)
  • Issue persists after: docker compose down/up, gateway restart, config reset, scope.default=allow addition
  • OPENCLAW_DIAGNOSTICS confirms zero media pipeline activity — applyMediaUnderstanding not being called
  • Single agent config (before adding iris/sigma via agents.list) may not reproduce the issue
  • Note: audio was never tested in Docker with single agent — unclear if issue is Docker-specific or multi-agent-specific

Extent analysis

TL;DR

No confirmed fix is available. The evidence points to the media pipeline never starting (applyMediaUnderstanding is not called) in the Docker multi-agent deployment, so the next step is to find out why, with the multi-agent configuration as the leading suspect.

Guidance

  • Test voice transcription in a single-agent Docker setup and check whether applyMediaUnderstanding appears in the logs; this separates a Docker-specific cause from a multi-agent one.
  • Review the agents.list configuration to confirm that the main agent, which the Telegram bot is connected to, is set up to handle voice messages and trigger Groq Whisper transcription.
  • Keep OPENCLAW_DIAGNOSTICS=telegram.*,media.* enabled while testing. The reporter saw no audio-related output at all, so any log line that appears after a configuration change is a useful clue.
  • Groq reachability and GROQ_API_KEY loading were already confirmed by the reporter (curl returned 200 and openclaw models status lists groq), so the provider itself is an unlikely cause; re-check it only if the environment changes.

Example

The report contains no code-level diagnosis; the failure appears to lie in configuration or deployment rather than in a specific code error.

Notes

The issue appears to be related to the migration to Docker and the introduction of multi-agent configuration, but it is unclear if the issue is specific to the Docker environment or the multi-agent setup. Further investigation is needed to determine the root cause of the issue.

Recommendation

Test a single-agent setup first to see whether transcription works, then add agents back one at a time to find the point of failure. This is an isolation step rather than a fix; until the cause is found, the only workaround is manual text input.
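
The incremental approach can be scripted by generating one configuration per step. This is a minimal sketch under loud assumptions: the shape of each agents.list entry (`{"id": ...}`) and the surrounding config layout are guesses made for illustration, since the report does not show the actual schema; only the agent names main, iris, and sigma come from the report.

```python
import copy


def bisect_variants(base, agent_ids):
    """Return one config per step, each adding the next agent to agents.list."""
    variants = []
    for n in range(1, len(agent_ids) + 1):
        cfg = copy.deepcopy(base)
        # ASSUMPTION: entry shape {"id": name} is hypothetical, not the real schema.
        cfg["agents"] = {"list": [{"id": a} for a in agent_ids[:n]]}
        variants.append(cfg)
    return variants


# Hypothetical base config; only the audio toggle is taken from the report.
base = {"tools": {"media": {"audio": {"enabled": True}}}}

for cfg in bisect_variants(base, ["main", "iris", "sigma"]):
    print([a["id"] for a in cfg["agents"]["list"]])
```

Sending one voice message per variant and noting the first variant where transcription stops would show whether the regression is tied to a particular agent or to having more than one agent at all.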
