openclaw - 💡(How to fix) Fix Stuck session not auto-killed — API call hung for 49 minutes blocking Telegram session [1 participants]

GitHub stats
openclaw/openclaw#68620 · Fetched 2026-04-19 15:09:27
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0

Summary

When an LLM API call hangs (no response, no error), OpenClaw's diagnostic heartbeat correctly detects the stuck session every 30 seconds but never kills it. In our case, a gpt-5.4 API call hung for ~49 minutes (2,928,499 ms), completely blocking the Telegram session for that entire duration. All subsequent messages queued behind the stuck run.

Timeline

Time        | Event
22:25:50    | Message received, session → processing
22:26:32    | Run started on openai-codex/gpt-5.4
22:28–23:15 | Diagnostic heartbeat logs stuck session every 30s with growing age (126s → 1716s+). No auto-kill.
23:15:21    | Run finally aborted (provider-side timeout?). durationMs=2,928,499 (~49 min). aborted=true.
23:15:56    | Next message retried on fallback glm-5.1, also hung for 6 min before abort.
23:22:46    | Third attempt processed normally (13s).
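
The reported duration can be cross-checked against the timeline timestamps. The following is a small arithmetic sketch; the helper name toSeconds is illustrative and not part of OpenClaw.

```typescript
// Cross-check the reported durationMs against the timeline timestamps.
// toSeconds is an illustrative helper, not an OpenClaw API.

/** Parse "HH:MM:SS" into seconds since midnight. */
function toSeconds(hms: string): number {
  const [h, m, s] = hms.split(":").map(Number);
  return h * 3600 + m * 60 + s;
}

const runStarted = toSeconds("22:26:32"); // run started on the primary model
const runAborted = toSeconds("23:15:21"); // run finally aborted
const wallClockSeconds = runAborted - runStarted;

const durationMs = 2_928_499; // value reported in the abort log entry
const durationMinutes = durationMs / 60_000;

// How many 30-second heartbeat intervals fit inside the stuck window.
const heartbeatTicks = Math.floor(durationMs / 30_000);

console.log({ wallClockSeconds, durationMinutes, heartbeatTicks });
```

The wall-clock difference (2929 s) agrees with durationMs to within a second, and roughly 97 heartbeat intervals elapsed without any corrective action.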

Expected behavior

OpenClaw should auto-kill stuck runs after a configurable threshold (e.g., 2–5 minutes) and proceed with either a retry or fallback, rather than letting a single API call block the session indefinitely.

Current behavior

  • The diagnostic heartbeat detects and logs the stuck session but takes no corrective action.
  • The only recovery path is waiting for the provider's HTTP timeout (which can be extremely long).
  • All messages for the affected session queue behind the stuck run.
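
The gap described above is the step from detection to action. The sketch below shows one way a heartbeat tick could escalate from logging to aborting. Every name in it (SessionState, heartbeatTick, stuckWarnMs, maxRunTimeMs) is hypothetical and does not reflect OpenClaw internals; passing undefined for maxRunTimeMs models the log-only behavior reported in this issue.

```typescript
// Sketch of a heartbeat tick that escalates from logging to aborting.
// All names are hypothetical; this is not OpenClaw's implementation.

interface SessionState {
  id: string;
  runStartedAt: number; // epoch ms when the current run started
  abort: () => void; // cancels the in-flight run
}

type HeartbeatAction = "ok" | "warn" | "abort";

function heartbeatTick(
  session: SessionState,
  now: number,
  stuckWarnMs: number, // age at which the session is logged as stuck
  maxRunTimeMs: number | undefined, // undefined = log only, never abort
): HeartbeatAction {
  const ageMs = now - session.runStartedAt;

  // Corrective action: abort once the hard limit is exceeded.
  if (maxRunTimeMs !== undefined && ageMs >= maxRunTimeMs) {
    session.abort();
    return "abort";
  }

  // Detection only: log the stuck session and keep waiting.
  if (ageMs >= stuckWarnMs) {
    console.warn(`stuck session ${session.id} age=${Math.round(ageMs / 1000)}s`);
    return "warn";
  }

  return "ok";
}
```

With a 300,000 ms limit, a session whose age has reached 1716 s would be aborted on the next tick instead of being logged again.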

Suggested fix

Add a configurable sessionMaxRunTimeMs (or similar) that:

  1. Tracks elapsed time since run_started.
  2. If exceeded, forcibly aborts the run.
  3. Triggers failover to the next model in the fallback chain (if available) or surfaces an error to the user.
  4. Defaults to 300s (5 minutes) and can be overridden per agent or per model.
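
Steps 1 to 3 above can be sketched with a standard AbortController and a timer. This is a minimal illustration under stated assumptions, not OpenClaw's code: ModelCall, runWithDeadline and runWithFallback are hypothetical names, and the model call is assumed to accept an AbortSignal.

```typescript
// Sketch: enforce a max run time per attempt and fail over along a chain.
// ModelCall, runWithDeadline and runWithFallback are hypothetical names.

type ModelCall = (prompt: string, signal: AbortSignal) => Promise<string>;

async function runWithDeadline(
  model: string,
  call: ModelCall,
  prompt: string,
  maxRunTimeMs: number,
): Promise<string> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;

  // Rejects once the limit elapses, measured from the start of this attempt.
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort(); // ask the in-flight call to stop
      reject(new Error(`run on ${model} exceeded ${maxRunTimeMs} ms`));
    }, maxRunTimeMs);
  });

  try {
    // Racing also covers calls that ignore the abort signal entirely.
    return await Promise.race([call(prompt, controller.signal), deadline]);
  } finally {
    clearTimeout(timer);
  }
}

async function runWithFallback(
  chain: Array<{ model: string; call: ModelCall }>,
  prompt: string,
  maxRunTimeMs: number,
): Promise<string> {
  const failures: string[] = [];
  for (const { model, call } of chain) {
    try {
      return await runWithDeadline(model, call, prompt, maxRunTimeMs);
    } catch (err) {
      // Remember the failure and move on to the next model in the chain.
      failures.push(err instanceof Error ? err.message : String(err));
    }
  }
  // Chain exhausted: surface an error instead of blocking the session.
  throw new Error(`all models failed: ${failures.join("; ")}`);
}
```

Because the limit applies per attempt, the worst-case blocking time in this sketch is the chain length multiplied by the limit. In the reported incident both the primary and the fallback model hung, so a per-attempt limit matters for the fallback as well.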

Environment

  • OpenClaw: 2026.4.15 (stable, pnpm)
  • Gateway: Linux x64, Ubuntu 24.04
  • Model: openai-codex/gpt-5.4 (default), fallback zai-coding/glm-5.1
  • Channel: Telegram (polling mode)

extent analysis

TL;DR

Implement a configurable sessionMaxRunTimeMs to auto-kill stuck API calls and trigger failover or error handling.

Guidance

  • Introduce a sessionMaxRunTimeMs setting to track elapsed time since run_started and abort the run if exceeded.
  • Set a default value (e.g., 300s) and allow overrides per agent or model for flexibility.
  • Upon exceeding the threshold, forcibly abort the run and trigger failover to the next model in the fallback chain or surface an error to the user.
  • Review the diagnostic heartbeat logs to ensure correct detection of stuck sessions and verify the new setting's effectiveness.
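
The default-plus-override guidance can be illustrated with a small resolver. The config shape, the precedence order (model over agent over global over default), and the agent name "telegram" are assumptions made for this sketch; the issue only names sessionMaxRunTimeMs and the 300s default.

```typescript
// Sketch: resolve the effective limit with model > agent > global > default.
// The config shape and precedence are assumptions, not OpenClaw's schema.

const DEFAULT_SESSION_MAX_RUN_TIME_MS = 300_000; // 300s default proposed in the issue

interface RunTimeConfig {
  sessionMaxRunTimeMs?: number;
  agents?: Record<string, { sessionMaxRunTimeMs?: number }>;
  models?: Record<string, { sessionMaxRunTimeMs?: number }>;
}

function resolveMaxRunTimeMs(
  config: RunTimeConfig,
  agent: string,
  model: string,
): number {
  const candidate =
    config.models?.[model]?.sessionMaxRunTimeMs ??
    config.agents?.[agent]?.sessionMaxRunTimeMs ??
    config.sessionMaxRunTimeMs ??
    DEFAULT_SESSION_MAX_RUN_TIME_MS;

  // Reject values that would disable the limit by accident.
  if (!Number.isFinite(candidate) || candidate <= 0) {
    throw new RangeError(`sessionMaxRunTimeMs must be positive, got ${candidate}`);
  }
  return candidate;
}

// Hypothetical configuration: a per-agent and a per-model override.
const exampleConfig: RunTimeConfig = {
  agents: { telegram: { sessionMaxRunTimeMs: 120_000 } },
  models: { "openai-codex/gpt-5.4": { sessionMaxRunTimeMs: 180_000 } },
};

console.log(resolveMaxRunTimeMs(exampleConfig, "telegram", "openai-codex/gpt-5.4"));
console.log(resolveMaxRunTimeMs(exampleConfig, "telegram", "zai-coding/glm-5.1"));
console.log(resolveMaxRunTimeMs({}, "any-agent", "any-model"));
```

Letting the model-level value win is a design choice for this sketch: providers differ in typical latency, so a per-model limit is likely the most specific signal available.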

Example

The issue contains no code references. The suggested fix implies adding a sessionMaxRunTimeMs setting to the OpenClaw configuration or implementation.

Notes

The suggested fix assumes that the OpenClaw implementation allows for such a configuration change. It is essential to review the OpenClaw documentation and codebase to ensure that this change is feasible and compatible with the existing architecture.

Recommendation

Implement the sessionMaxRunTimeMs setting so that a stuck API call cannot block a session indefinitely. This closes the gap between detection and action (the heartbeat already detects the stuck run but never aborts it) and keeps the threshold configurable.
