openclaw: [Bug] voice-call waits on post-turn compaction before speaking response, causing 20s+ latency

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

After fixing the voice-call routed agent tool policy issue (#79506 / PR #79508), the voice-call embedded responder can now get a valid Ollama/Qwen response quickly, but OpenClaw waits for post-turn compaction before returning the response to voice-call/TTS. This adds 20s+ latency to phone calls and can inject unrelated/hallucinated compaction summaries into the voice session.

Steps to reproduce

  1. Run OpenClaw 2026.5.7 with @openclaw/voice-call 2026.5.7 configured for Twilio inbound calls, streaming transcription enabled, and conversation mode.
  2. Configure a routed voice agent using a local Ollama model and no tools, for example:
    {
      "id": "voice",
      "model": { "primary": "ollama/qwen2.5:1.5b", "fallbacks": [] },
      "thinkingDefault": "off",
      "reasoningDefault": "off",
      "fastModeDefault": true,
      "params": { "temperature": 0.2, "maxTokens": 80, "cacheRetention": "none" },
      "tools": { "allow": [] },
      "systemPromptOverride": "You are a fast phone voice assistant. Reply only as valid JSON: {\"spoken\":\"...\"}. Keep spoken under 18 words. No markdown. No tool use. Be direct, warm, and conversational."
    }
  3. Apply the tool-policy fix from PR #79508 locally or otherwise run a build where the voice-call embedded responder forwards tools.allow to runEmbeddedPiAgent().
  4. Make a real inbound Twilio call to the routed voice number.
  5. Say a short utterance, e.g. "Hey, how you doing?"
  6. Observe the voice trajectory/session JSONL and gateway logs.

Expected behavior

The voice-call path should speak as soon as the embedded model response is available. Post-turn compaction should not block realtime voice playback, and compaction should not inject unrelated or hallucinated summaries into the voice-call session.

For realtime voice, if compaction is needed, it should either:

  • run asynchronously after the response is handed to TTS,
  • be disabled/bypassed for short-lived voice-call embedded runs,
  • or use a voice-call-specific policy that does not add phone-call latency.
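
The first option above can be sketched as follows: hand the text to TTS first, then start compaction without awaiting it. This is a minimal TypeScript sketch under stated assumptions. `respondThenCompact`, `TurnDeps`, and the callbacks are hypothetical names for illustration, not OpenClaw APIs, and the real embedded runner may be structured differently.

```typescript
// Hypothetical dependencies for one voice turn; none of these names come from OpenClaw.
interface TurnDeps {
  generate: (prompt: string) => Promise<string>; // model call
  speak: (text: string) => Promise<void>;        // hand-off to TTS
  compact: () => Promise<void>;                  // post-turn compaction
  onCompactionError?: (err: unknown) => void;    // optional error sink
}

interface TurnResult {
  text: string;
  // Settles when background compaction finishes; callers may ignore it.
  compaction: Promise<void>;
}

async function respondThenCompact(prompt: string, deps: TurnDeps): Promise<TurnResult> {
  const text = await deps.generate(prompt);

  // Speak before any post-turn work so the caller is not left in silence.
  await deps.speak(text);

  // Start compaction without awaiting it. Rejections are caught here so a
  // failed compaction cannot surface as an unhandled promise rejection.
  const compaction = deps.compact().catch((err: unknown) => {
    deps.onCompactionError?.(err);
  });

  return { text, compaction };
}
```

Returning the `compaction` promise lets a caller that does care (for example, a test or a shutdown hook) wait for it, while the voice path can simply drop it.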

Actual behavior

The Ollama/Qwen model returns a valid spoken JSON response quickly, but the embedded run waits on compaction before voice-call logs/speaks the AI response. The caller experiences roughly 20s+ latency even though the model response itself was available in about 2.6s.

The compaction result also appears unrelated to the actual voice conversation. It inserted summaries about a web-scraping/Python project into a phone-call session that only contained short voice turns. Subsequent voice prompts replayed that bogus compaction summary as context, and the model produced odd identity answers such as claiming to be created by Anthropic despite the configured model being Ollama/Qwen.

OpenClaw version

2026.5.7 (eeef486)

Operating system

Linux 6.8.0-111-generic (x64)

Install method

npm global / OpenClaw Gateway running as systemd user service

Model

ollama/qwen2.5:1.5b for the voice response model

Provider / routing chain

Twilio Programmable Voice -> Tailscale Funnel -> OpenClaw voice-call webhook -> OpenAI streaming transcription -> OpenClaw embedded voice agent response -> local Ollama qwen2.5:1.5b -> OpenAI TTS

Additional provider/model setup details

  • Voice-call plugin: @openclaw/voice-call 2026.5.7
  • Voice-call mode: Twilio inbound, streaming transcription enabled, realtime disabled, conversation mode
  • STT provider: OpenAI gpt-4o-transcribe
  • TTS provider: OpenAI gpt-4o-mini-tts
  • Dedicated routed voice agent: agentId: voice
  • Voice agent model: ollama/qwen2.5:1.5b
  • Voice agent config includes tools.allow: []
  • PR #79508 tool-policy fix was applied locally before this reproduction, so this is the post-fix behavior.
  • After #79508 patch, trajectory confirms toolCount: 0 and tools: []; this latency is not caused by tool schemas anymore.

Logs, screenshots, and evidence

# Gateway log excerpt from live Twilio call after locally applying PR #79508
[voice-call] Transcript for <twilio-call-sid>: Hey, how you doing? (chars=19)
[voice-call] Auto-responding to inbound call <call-id>: "Hey, how you doing?"
2026-05-08T13:19:18.718-07:00 [agent/embedded] embedded run timeout reached during compaction; extending deadline: runId=voice:<call-id>:1778271537529 sessionId=4f0244e7-8364-4acb-b08e-04544ab37284 extraMs=900000
[voice-call] AI response: "Hello! I'm good, thanks for asking."

The persisted voice session JSONL shows the model response itself was written quickly:

2026-05-08T20:18:58.718Z user:
  Hey, how you doing?

2026-05-08T20:19:01.355Z assistant:
  content: {"spoken":"Hello! I'm good, thanks for asking."}
  api: ollama
  provider: ollama
  model: qwen2.5:1.5b
  usage: input=62 output=15 totalTokens=77

2026-05-08T20:19:18.773Z compaction:
  summary: unrelated/hallucinated web-scraping/Python project summary
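
The latency figures quoted in this report can be checked directly from the timestamps above. The short TypeScript sketch below only does the arithmetic; the variable and function names are illustrative, and the timestamps are copied from the session JSONL and the gateway log line.

```typescript
// Timestamps copied from the session JSONL in this report.
const userAt = "2026-05-08T20:18:58.718Z";
const assistantAt = "2026-05-08T20:19:01.355Z";
const compactionAt = "2026-05-08T20:19:18.773Z";
// Gateway "embedded run timeout reached during compaction" line (local time, UTC-7).
const timeoutLogAt = "2026-05-08T13:19:18.718-07:00";

// Milliseconds between two ISO 8601 timestamps.
function elapsedMs(fromIso: string, toIso: string): number {
  return Date.parse(toIso) - Date.parse(fromIso);
}

const modelLatencyMs = elapsedMs(userAt, assistantAt);        // 2637
const compactionWaitMs = elapsedMs(assistantAt, compactionAt); // 17418
const totalMs = elapsedMs(userAt, compactionAt);               // 20055
const timeoutLogMs = elapsedMs(userAt, timeoutLogAt);          // 20000

console.log({ modelLatencyMs, compactionWaitMs, totalMs, timeoutLogMs });
```

So the model answered in about 2.6 s and roughly 17.4 s more passed before the compaction entry was written. The timeout log line falls exactly 20 000 ms after the user turn, which would be consistent with a 20 s run deadline, although the report does not state the configured timeout.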

Trajectory evidence from the same after-fix run:

session.started @ 2026-05-08T20:18:58.711Z:
  provider: ollama
  modelId: qwen2.5:1.5b
  agentId: voice
  toolCount: 0

context.compiled @ 2026-05-08T20:18:58.717Z:
  prompt: Hey, how you doing?
  tools: []

model.completed @ 2026-05-08T20:19:18.778Z:
  aborted: false
  timedOut: false
  assistantTexts:
    - {"spoken":"Hello! I'm good, thanks for asking."}

The call transcript confirms the voice call eventually spoke the answer, but only after the compaction delay:

callId: <redacted>
provider: twilio
sessionKey: voice:<redacted>
transcript:
  bot:  Hello! How can I help you today?
  user: Hey, how you doing?
  bot:  Hello! I'm good, thanks for asking.

Additional problematic context from the same session:

compaction summary inserted into the voice session:
  "The user is trying to complete a web scraping project by extracting specific information from an HTML page using Python..."

This summary is unrelated to the phone conversation and was then included in later voice prompts as replay context.

Impact and severity

Affected: Users running voice-call streaming/conversation mode with embedded agent responses, especially local/Ollama voice models and short response timeouts.

Severity: High for voice-call usability. The model can answer quickly, but the phone caller hears long silence because response delivery is blocked by compaction.

Frequency: Observed repeatedly in a live Twilio inbound call after applying PR #79508 locally. Each voice turn logged "embedded run timeout reached during compaction; extending deadline" before the AI response was spoken/logged.

Consequence: Realtime voice feels broken or unresponsive. The caller waits 20s+ for a response that was generated in ~2.6s. The bogus compaction summary also contaminates subsequent turns.

Additional information

This was discovered immediately after validating the fix for #79506 / PR #79508. That fix appears to work: after applying it locally, voice-call used ollama/qwen2.5:1.5b, respected agentId: voice, and compiled toolCount: 0 / tools: [].

This issue is separate: once the model response is available, the embedded runner appears to wait for compaction before returning control to the voice-call response generator. For voice-call, response playback should probably happen before post-turn compaction completes, or compaction should be disabled/bypassed for this low-latency lane.
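
A bypass for the low-latency lane could take the form of a small policy check made before any compaction is scheduled. The TypeScript sketch below is purely illustrative: `decideCompaction`, `RunContext`, the `lane` field, and the 0.8 threshold are assumptions, not OpenClaw configuration, and the report does not say why compaction was triggered on a 77-token turn.

```typescript
// Hypothetical compaction policy; names and threshold are illustrative only.
type CompactionDecision = "inline" | "deferred" | "skip";

interface RunContext {
  lane: "voice" | "chat";       // assumed way to tell latency-sensitive runs apart
  totalTokens: number;          // e.g. usage.totalTokens from the last turn
  contextWindowTokens: number;  // context size configured for the model
}

function decideCompaction(ctx: RunContext, thresholdRatio = 0.8): CompactionDecision {
  const overThreshold = ctx.totalTokens >= ctx.contextWindowTokens * thresholdRatio;

  // Far below the context window: nothing worth summarizing yet.
  if (!overThreshold) return "skip";

  // Voice runs never block on compaction; other lanes may compact inline.
  return ctx.lane === "voice" ? "deferred" : "inline";
}

// The turn from this report used 77 tokens; 32768 is an assumed context size.
const decision = decideCompaction({ lane: "voice", totalTokens: 77, contextWindowTokens: 32768 });
console.log(decision);
```

Under these assumptions the reported turn would skip compaction entirely, and a long voice session would compact only after the response has been spoken.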
