openclaw - 💡(How to fix) Fix Stuck session not auto-killed — API call hung for 49 minutes blocking Telegram session [1 participants]

kitfuchs · 2026-04-18T15:45:36Z

[openclaw] When an LLM API call hangs no response, no error , OpenClaw's diagnostic heartbeat correctly detects the stuck session every 30 seconds but never ki… When an LLM API call hangs (no response, no error), OpenClaw's diagnostic heartbeat correctly detects the stuck session every 30 seconds but never kills it. In our case, a `gpt-5.4` API call hung for **~49 minutes** (2,928,499 ms), completely blocking the Telegram session for that entire duration. All subsequent messages queued behind the stuck run. ## Summary When an LLM API call hangs (no response, no error), OpenClaw's diagnostic heartbeat correctly detects the stuck session every 30 seconds but never kills it. In our case, a `gpt-5.4` API call hung for **~49 minutes** (2,928,499 ms), completely blocking the Telegram session for that entire duration. All subsequent messages queued behind the stuck run. ## Timeline | Time | Event | |---|---| | 22:25:50 | Message received, session → `processing` | | 22:26:32 | Run started on `openai-codex/gpt-5.4` | | 22:28–23:15 | Diagnostic heartbeat logs `stuck session` every 30s with growing `age` (126s → 1716s+). **No auto-kill.** | | 23:15:21 | Run finally aborted (provider-side timeout?). `durationMs=2,928,499` (~49 min). `aborted=true`. | | 23:15:56 | Next message retried on fallback `glm-5.1`, also hung for 6 min before abort. | | 23:22:46 | Third attempt processed normally (13s). | ## Expected behavior OpenClaw should auto-kill stuck runs after a configurable threshold (e.g., 2–5 minutes) and proceed with either a retry or fallback, rather than letting a single API call block the session indefinitely. ## Current behavior - `diagnostic` heartbeat detects and logs `stuck session` but takes no corrective action. - The only recovery path is waiting for the provider's HTTP timeout (which can be extremely long). - All messages for the affected session queue behind the stuck run. ## Suggested fix Add a configurable `sessionMaxRunTimeMs` (or similar) that: 1. Tracks elapsed time since `run_started`. 2. If exceeded, forcibly aborts the run. 3. Triggers failover to the next model in the fallback chain (if available) or surfaces an error to the user. 4. Default: 300s (5 minutes). Allow override per agent or per model. ## Environment - OpenClaw: 2026.4.15 (stable, pnpm) - Gateway: Linux x64, Ubuntu 24.04 - Model: `openai-codex/gpt-5.4` (default), fallback `zai-coding/glm-5.1` - Channel: Telegram (polling mode)

openclaw2026-04-18 15:45:36

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

openclaw/openclaw#68620•Fetched 2026-04-19 15:09:27

View on GitHub

Comments

Participants

Timeline

Reactions

Author

kitfuchs

Participants

kitfuchs

When an LLM API call hangs (no response, no error), OpenClaw's diagnostic heartbeat correctly detects the stuck session every 30 seconds but never kills it. In our case, a gpt-5.4 API call hung for ~49 minutes (2,928,499 ms), completely blocking the Telegram session for that entire duration. All subsequent messages queued behind the stuck run.

Error Message

Root Cause

RAW_BUFFERClick to expand / collapse

Summary

Timeline

Time	Event
22:25:50	Message received, session → `processing`
22:26:32	Run started on `openai-codex/gpt-5.4`
22:28–23:15	Diagnostic heartbeat logs `stuck session` every 30s with growing `age` (126s → 1716s+). No auto-kill.
23:15:21	Run finally aborted (provider-side timeout?). `durationMs=2,928,499` (~49 min). `aborted=true`.
23:15:56	Next message retried on fallback `glm-5.1`, also hung for 6 min before abort.
23:22:46	Third attempt processed normally (13s).

Expected behavior

OpenClaw should auto-kill stuck runs after a configurable threshold (e.g., 2–5 minutes) and proceed with either a retry or fallback, rather than letting a single API call block the session indefinitely.

Current behavior

diagnostic heartbeat detects and logs stuck session but takes no corrective action.
The only recovery path is waiting for the provider's HTTP timeout (which can be extremely long).
All messages for the affected session queue behind the stuck run.

Suggested fix

Add a configurable sessionMaxRunTimeMs (or similar) that:

Tracks elapsed time since run_started.
If exceeded, forcibly aborts the run.
Triggers failover to the next model in the fallback chain (if available) or surfaces an error to the user.
Default: 300s (5 minutes). Allow override per agent or per model.

Environment

OpenClaw: 2026.4.15 (stable, pnpm)
Gateway: Linux x64, Ubuntu 24.04
Model: openai-codex/gpt-5.4 (default), fallback zai-coding/glm-5.1
Channel: Telegram (polling mode)

extent analysis

TL;DR

Implement a configurable sessionMaxRunTimeMs to auto-kill stuck API calls and trigger failover or error handling.

Guidance

Introduce a sessionMaxRunTimeMs setting to track elapsed time since run_started and abort the run if exceeded.
Set a default value (e.g., 300s) and allow overrides per agent or model for flexibility.
Upon exceeding the threshold, forcibly abort the run and trigger failover to the next model in the fallback chain or surface an error to the user.
Review the diagnostic heartbeat logs to ensure correct detection of stuck sessions and verify the new setting's effectiveness.

Example

No code snippet is provided as the issue does not contain specific code references, but the suggested fix implies modifying the OpenClaw configuration or implementation to include the sessionMaxRunTimeMs setting.

Notes

The suggested fix assumes that the OpenClaw implementation allows for such a configuration change. It is essential to review the OpenClaw documentation and codebase to ensure that this change is feasible and compatible with the existing architecture.

Recommendation

Apply the suggested workaround by implementing the sessionMaxRunTimeMs setting to prevent indefinite blocking of sessions due to stuck API calls, as this addresses the root cause of the issue and provides a configurable solution for handling such scenarios.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

FAQ

Expected behavior

#api #indexing error #inference speed #output truncation #response parsing

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

openclaw - 💡(How to fix) Fix Stuck session not auto-killed — API call hung for 49 minutes blocking Telegram session [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Summary

Timeline

Expected behavior

Current behavior

Suggested fix

Environment

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

FAQ

Expected behavior

Still need to ship something?

TRENDING

openclaw - 💡(How to fix) Fix Stuck session not auto-killed — API call hung for 49 minutes blocking Telegram session [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Summary

Timeline

Expected behavior

Current behavior

Suggested fix

Environment

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

FAQ

Expected behavior

Still need to ship something?

RELATED_DISCOVERY

TRENDING