openclaw - 💡(How to fix) Fix [Bug]: voice-call waits on post-turn compaction before speaking response, causing 20s+ latency

StepCodex · 2026-05-08T20:43:00Z


After fixing the voice-call routed agent tool policy issue (#79506 / PR #79508), the voice-call embedded responder can now get a valid Ollama/Qwen response quickly, but OpenClaw waits for post-turn compaction before returning the response to voice-call/TTS. This adds 20s+ latency to phone calls and can inject unrelated/hallucinated compaction summaries into the voice session.

Root Cause

Not yet confirmed. The evidence points to the embedded runner waiting for post-turn compaction to finish before it returns control to the voice-call response generator: the assistant reply was persisted at 2026-05-08T20:19:01.355Z, but model.completed was only recorded at 20:19:18.778Z, just after the compaction entry at 20:19:18.773Z.

Tool schemas are ruled out as the cause. The PR #79508 tool-policy fix was applied locally before this reproduction, and the trajectory confirms toolCount: 0 and tools: [].

Fix Action

Fix / Workaround

No fix or workaround has been confirmed for this issue. The report proposes three options for the voice-call path: run compaction asynchronously after the response is handed to TTS, disable or bypass compaction for short-lived voice-call embedded runs, or use a voice-call-specific policy that does not add phone-call latency.


Bug type

Behavior bug (incorrect output/state without crash)

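Beta release blocker

No

Summary

After fixing the voice-call routed agent tool policy issue (#79506 / PR #79508), the voice-call embedded responder can now get a valid Ollama/Qwen response quickly, but OpenClaw waits for post-turn compaction before returning the response to voice-call/TTS. This adds 20s+ latency to phone calls and can inject unrelated/hallucinated compaction summaries into the voice session.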

Steps to reproduce

1. Run OpenClaw 2026.5.7 with @openclaw/voice-call 2026.5.7 configured for Twilio inbound calls, streaming transcription enabled, and conversation mode.

2. Configure a routed voice agent using a local Ollama model and no tools, for example:

   {
     "id": "voice",
     "model": { "primary": "ollama/qwen2.5:1.5b", "fallbacks": [] },
     "thinkingDefault": "off",
     "reasoningDefault": "off",
     "fastModeDefault": true,
     "params": { "temperature": 0.2, "maxTokens": 80, "cacheRetention": "none" },
     "tools": { "allow": [] },
     "systemPromptOverride": "You are a fast phone voice assistant. Reply only as valid JSON: {\"spoken\":\"...\"}. Keep spoken under 18 words. No markdown. No tool use. Be direct, warm, and conversational."
   }

3. Apply the tool-policy fix from PR #79508 locally or otherwise run a build where the voice-call embedded responder forwards tools.allow to runEmbeddedPiAgent().
4. Make a real inbound Twilio call to the routed voice number.
5. Say a short utterance, e.g. "Hey, how you doing?".
6. Observe the voice trajectory/session JSONL and gateway logs.
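
The system prompt in step 2 asks the model to reply as {"spoken":"..."}. As a minimal sketch of how a consumer might pull the spoken text out of such a reply before handing it to TTS, the Python below uses a hypothetical helper, extract_spoken, which is not an OpenClaw API. The plain-text fallback and the 17-word cap (the prompt says "under 18 words") are assumptions made for illustration.

```python
import json


def extract_spoken(raw_reply: str, max_words: int = 17) -> str:
    """Return the text to speak from a model reply (hypothetical helper)."""
    text = raw_reply.strip()
    try:
        parsed = json.loads(text)
        # Only accept the shape the system prompt asks for.
        if isinstance(parsed, dict) and isinstance(parsed.get("spoken"), str):
            text = parsed["spoken"].strip()
    except json.JSONDecodeError:
        # Small models sometimes ignore the JSON instruction; speak the raw text.
        pass
    # Enforce the word budget from the prompt so TTS output stays short.
    return " ".join(text.split()[:max_words])


reply = '{"spoken":"Hello! I\'m good, thanks for asking."}'
print(extract_spoken(reply))  # Hello! I'm good, thanks for asking.
```

The reply used here is the one recorded in the session JSONL for this report.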

Expected behavior

The voice-call path should speak as soon as the embedded model response is available. Post-turn compaction should not block realtime voice playback, and compaction should not inject unrelated or hallucinated summaries into the voice-call session.

For realtime voice, if compaction is needed, it should either:

- run asynchronously after the response is handed to TTS,
- be disabled/bypassed for short-lived voice-call embedded runs,
- or use a voice-call-specific policy that does not add phone-call latency.
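
The first option can be sketched with Python's asyncio: return the reply as soon as the model finishes and let compaction run as a background task. This is an illustration only. generate_reply, compact_session, and respond are hypothetical stand-ins, not OpenClaw's runEmbeddedPiAgent() or its compaction code, and the sleeps are scaled-down placeholders for the delays in this report.

```python
import asyncio
import time

# Keep references so background tasks are not garbage-collected mid-flight.
background_tasks: set = set()


async def generate_reply(prompt: str) -> str:
    await asyncio.sleep(0.02)  # stand-in for the model call
    return "Hello! I'm good, thanks for asking."


async def compact_session(session: list) -> None:
    await asyncio.sleep(0.5)  # stand-in for slow post-turn compaction
    session.append("<compaction summary>")


async def respond(prompt: str, session: list) -> str:
    reply = await generate_reply(prompt)
    session.extend([prompt, reply])
    # Schedule compaction without awaiting it, so the reply is not blocked.
    task = asyncio.create_task(compact_session(session))
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return reply


async def main():
    session: list = []
    start = time.monotonic()
    reply = await respond("Hey, how you doing?", session)
    elapsed = time.monotonic() - start  # time until the reply could go to TTS
    await asyncio.gather(*background_tasks)  # only so this demo exits cleanly
    return reply, elapsed, session


reply, elapsed, session = asyncio.run(main())
print(f"{reply} (ready after {elapsed:.2f}s)")
```

In this sketch the reply is ready well before the simulated compaction finishes, and the summary is appended to the session afterwards. A real implementation would also need to decide what happens if the next turn starts while compaction is still running.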

Actual behavior

The Ollama/Qwen model returns a valid spoken JSON response quickly, but the embedded run waits on compaction before voice-call logs/speaks the AI response. The caller experiences roughly 20s+ latency even though the model response itself was available in about 2.6s.

The compaction result also appears unrelated to the actual voice conversation. It inserted summaries about a web-scraping/Python project into a phone-call session that only contained short voice turns. Subsequent voice prompts replayed that bogus compaction summary as context, and the model produced odd identity answers such as claiming to be created by Anthropic despite the configured model being Ollama/Qwen.

OpenClaw version

2026.5.7 (eeef486)

Operating system

Linux 6.8.0-111-generic (x64)

Install method

npm global / OpenClaw Gateway running as systemd user service

Model

ollama/qwen2.5:1.5b for the voice response model

Provider / routing chain

Twilio Programmable Voice -> Tailscale Funnel -> OpenClaw voice-call webhook -> OpenAI streaming transcription -> OpenClaw embedded voice agent response -> local Ollama qwen2.5:1.5b -> OpenAI TTS

Additional provider/model setup details

Voice-call plugin: @openclaw/voice-call 2026.5.7
Voice-call mode: Twilio inbound, streaming transcription enabled, realtime disabled, conversation mode
STT provider: OpenAI gpt-4o-transcribe
TTS provider: OpenAI gpt-4o-mini-tts
Dedicated routed voice agent: agentId: voice
Voice agent model: ollama/qwen2.5:1.5b
Voice agent config includes tools.allow: []
PR #79508 tool-policy fix was applied locally before this reproduction, so this is the post-fix behavior.
After #79508 patch, trajectory confirms toolCount: 0 and tools: []; this latency is not caused by tool schemas anymore.

Logs, screenshots, and evidence

# Gateway log excerpt from live Twilio call after locally applying PR #79508
[voice-call] Transcript for <twilio-call-sid>: Hey, how you doing? (chars=19)
[voice-call] Auto-responding to inbound call <call-id>: "Hey, how you doing?"
2026-05-08T13:19:18.718-07:00 [agent/embedded] embedded run timeout reached during compaction; extending deadline: runId=voice:<call-id>:1778271537529 sessionId=4f0244e7-8364-4acb-b08e-04544ab37284 extraMs=900000
[voice-call] AI response: "Hello! I'm good, thanks for asking."
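
The only timestamped line in this excerpt uses a -07:00 offset, while the session JSONL and trajectory use UTC. A small sketch, using only the Python standard library, that converts the timestamp and pulls out the key=value fields so the line can be matched against the other evidence:

```python
import re
from datetime import datetime, timezone

line = (
    "2026-05-08T13:19:18.718-07:00 [agent/embedded] embedded run timeout reached "
    "during compaction; extending deadline: runId=voice:<call-id>:1778271537529 "
    "sessionId=4f0244e7-8364-4acb-b08e-04544ab37284 extraMs=900000"
)

# The timestamp is the first space-delimited token on the line.
stamp, _, rest = line.partition(" ")
utc = datetime.fromisoformat(stamp).astimezone(timezone.utc)

# Collect key=value pairs from the rest of the message.
fields = dict(re.findall(r"(\w+)=(\S+)", rest))
extra_minutes = int(fields["extraMs"]) / 60_000

print(utc.isoformat(timespec="milliseconds"))  # 2026-05-08T20:19:18.718+00:00
print(f"deadline extended by {extra_minutes:g} minutes")  # deadline extended by 15 minutes
```

The converted timestamp falls exactly 20.000 s after the user turn recorded at 2026-05-08T20:18:58.718Z. That would be consistent with a 20 s run timeout firing while compaction was still in progress, but the report does not state the configured timeout, so treat this as an inference.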

The persisted voice session JSONL shows the model response itself was written quickly:

2026-05-08T20:18:58.718Z user:
  Hey, how you doing?

2026-05-08T20:19:01.355Z assistant:
  content: {"spoken":"Hello! I'm good, thanks for asking."}
  api: ollama
  provider: ollama
  model: qwen2.5:1.5b
  usage: input=62 output=15 totalTokens=77

2026-05-08T20:19:18.773Z compaction:
  summary: unrelated/hallucinated web-scraping/Python project summary

Trajectory evidence from the same after-fix run:

session.started @ 2026-05-08T20:18:58.711Z:
  provider: ollama
  modelId: qwen2.5:1.5b
  agentId: voice
  toolCount: 0

context.compiled @ 2026-05-08T20:18:58.717Z:
  prompt: Hey, how you doing?
  tools: []

model.completed @ 2026-05-08T20:19:18.778Z:
  aborted: false
  timedOut: false
  assistantTexts:
    - {"spoken":"Hello! I'm good, thanks for asking."}
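
The timestamps in the session JSONL and the trajectory make the split between model time and blocked time explicit. A short Python sketch using only the values quoted above:

```python
from datetime import datetime


def parse(ts: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" on Python 3.11+, so normalize it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


user_turn = parse("2026-05-08T20:18:58.718Z")        # session JSONL: user
assistant_saved = parse("2026-05-08T20:19:01.355Z")  # session JSONL: assistant
compaction = parse("2026-05-08T20:19:18.773Z")       # session JSONL: compaction
model_completed = parse("2026-05-08T20:19:18.778Z")  # trajectory: model.completed

model_latency = (assistant_saved - user_turn).total_seconds()
gap_after_reply = (model_completed - assistant_saved).total_seconds()
total = (model_completed - user_turn).total_seconds()

print(f"model reply persisted after      {model_latency:.3f}s")    # 2.637s
print(f"gap from reply to model.completed {gap_after_reply:.3f}s")  # 17.423s
print(f"model.completed after            {total:.3f}s")            # 20.060s
```

The 2.637 s figure matches the "about 2.6s" quoted in the report, and model.completed lands 5 ms after the compaction entry.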

The call transcript confirms the voice call eventually spoke the answer, but only after the compaction delay:

callId: <redacted>
provider: twilio
sessionKey: voice:<redacted>
transcript:
  bot:  Hello! How can I help you today?
  user: Hey, how you doing?
  bot:  Hello! I'm good, thanks for asking.

Additional problematic context from the same session:

compaction summary inserted into the voice session:
  "The user is trying to complete a web scraping project by extracting specific information from an HTML page using Python..."

This summary is unrelated to the phone conversation and was then included in later voice prompts as replay context.

Impact and severity

Affected: Users running voice-call streaming/conversation mode with embedded agent responses, especially local/Ollama voice models and short response timeouts.

Severity: High for voice-call usability. The model can answer quickly, but the phone caller hears long silence because response delivery is blocked by compaction.

Frequency: Observed repeatedly in a live Twilio inbound call after applying PR #79508 locally. Each voice turn logged "embedded run timeout reached during compaction; extending deadline" before the AI response was spoken/logged.

Consequence: Realtime voice feels broken or unresponsive. The caller waits 20s+ for a response that was generated in ~2.6s. The bogus compaction summary also contaminates subsequent turns.

Additional information

This was discovered immediately after validating the fix for #79506 / PR #79508. That fix appears to work: after applying it locally, voice-call used ollama/qwen2.5:1.5b, respected agentId: voice, and compiled toolCount: 0 / tools: [].

This issue is separate: once the model response is available, the embedded runner appears to wait for compaction before returning control to the voice-call response generator. For voice-call, response playback should probably happen before post-turn compaction completes, or compaction should be disabled/bypassed for this low-latency lane.
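
One way to picture the bypass option is a small policy check in front of compaction. The Python sketch below is hypothetical: should_compact, the lane labels, and the 4096-token threshold are assumptions made for illustration, not OpenClaw settings. It only shows that a turn like the one in this report (totalTokens=77) would skip compaction under such a policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnStats:
    lane: str          # hypothetical label, e.g. "voice-call" or "chat"
    total_tokens: int  # e.g. totalTokens from the usage line in the session JSONL


def should_compact(stats: TurnStats, token_threshold: int = 4096) -> bool:
    """Decide whether to run post-turn compaction (illustrative policy only)."""
    if stats.lane == "voice-call":
        # Low-latency lane: never compact inline.
        return False
    # Elsewhere, compact only once the context is actually large.
    return stats.total_tokens >= token_threshold


print(should_compact(TurnStats(lane="voice-call", total_tokens=77)))  # False
print(should_compact(TurnStats(lane="chat", total_tokens=77)))        # False
print(should_compact(TurnStats(lane="chat", total_tokens=8000)))      # True
```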
