openclaw - ✅(Solved) Fix LLM calls have no timeout control, slow model responses cause complete agent hang [1 pull requests, 1 participants]

liuy · 2026-03-26T09:22:52Z


openclaw · 2026-03-26 09:22:52


GitHub stats

openclaw/openclaw#55065 • Fetched 2026-04-08 01:32:57


Author: liuy
Participants: liuy
Timeline (top): closed ×1, cross-referenced ×1, locked ×1, subscribed ×1

Fix Action

Fixed

Fixed by PR: feat: add LLM idle timeout for streaming responses (https://github.com/openclaw/openclaw/pull/55072)

PR fix notes

PR #55072: feat: add LLM idle timeout for streaming responses

Repository: openclaw/openclaw
Author: liuy
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/55072

Description (problem / solution / changelog)

Fixes #55065

Summary

Adds an idle timeout mechanism for LLM streaming responses. If the model doesn't return any token within the specified timeout, the request is aborted with a user-friendly error message.

Changes

1. New configuration option: `agents.defaults.llm.idleTimeoutSeconds`
   - `0` = disabled (never times out)
   - `> 0` = timeout in seconds
   - Default: 60 seconds
2. Core implementation: `src/agents/pi-embedded-runner/run/llm-idle-timeout.ts`
   - `resolveLlmIdleTimeoutMs()`: resolves the timeout from config
   - `streamWithIdleTimeout()`: wraps the stream function with an idle timeout using `Promise.race`
3. Integration: modified `attempt.ts` to wrap `streamFn` with the idle timeout
4. Tests: `llm-idle-timeout.test.ts` with 13 test cases covering:
   - Config resolution (8 tests)
   - Stream wrapper behavior (5 tests, including the timeout scenario)
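The idle-timeout idea in the list above can be sketched as follows. This is a minimal illustration, not the code from the PR: it assumes the model stream is an `AsyncIterable` of chunks, and the names `withIdleTimeout` and `IdleTimeoutError` are hypothetical. Only the use of `Promise.race` and the wording of the message come from the PR description.

```typescript
// Hypothetical error type; the PR's actual error handling is not shown in the source.
class IdleTimeoutError extends Error {
  constructor(public readonly timeoutMs: number) {
    super(`LLM idle timeout (${timeoutMs / 1000}s): no response from model`);
    this.name = "IdleTimeoutError";
  }
}

// Wraps an async iterable so that each wait for the next chunk is raced
// against a timer. The timer restarts after every chunk, so only a silence
// longer than `timeoutMs` aborts the stream. `timeoutMs <= 0` disables it.
async function* withIdleTimeout<T>(
  source: AsyncIterable<T>,
  timeoutMs: number,
): AsyncGenerator<T> {
  if (timeoutMs <= 0) {
    yield* source;
    return;
  }
  const iterator = source[Symbol.asyncIterator]();
  try {
    while (true) {
      let timer: ReturnType<typeof setTimeout> | undefined;
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new IdleTimeoutError(timeoutMs)), timeoutMs);
      });
      try {
        const result = await Promise.race([iterator.next(), timeout]);
        if (result.done) return;
        yield result.value;
      } finally {
        clearTimeout(timer);
      }
    }
  } finally {
    // Best-effort cleanup of the underlying stream. It is not awaited,
    // because a hung source may never settle.
    void iterator.return?.().catch(() => undefined);
  }
}
```

Because the timer is reset per chunk, a long but steadily streaming reply is not cut off; only a stalled stream is.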

Configuration Example

```json
{
  "agents": {
    "defaults": {
      "llm": {
        "idleTimeoutSeconds": 60
      }
    }
  }
}
```
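How such a setting might be turned into a millisecond value can be sketched like this. The interface and the function name `resolveIdleTimeoutMs` are hypothetical stand-ins for the PR's `resolveLlmIdleTimeoutMs()`, whose real signature is not shown here. The fallback for invalid values is an assumption; only "0 disables", "seconds", and "default 60" come from the PR description.

```typescript
// Hypothetical shape of the relevant config slice, mirroring the JSON example.
interface LlmTimeoutConfig {
  agents?: { defaults?: { llm?: { idleTimeoutSeconds?: number } } };
}

// Default stated in the PR description.
const DEFAULT_IDLE_TIMEOUT_SECONDS = 60;

// Returns the idle timeout in milliseconds; 0 means "disabled".
function resolveIdleTimeoutMs(config?: LlmTimeoutConfig): number {
  const seconds = config?.agents?.defaults?.llm?.idleTimeoutSeconds;
  // Assumption: missing, non-finite, or negative values fall back to the default.
  if (typeof seconds !== "number" || !Number.isFinite(seconds) || seconds < 0) {
    return DEFAULT_IDLE_TIMEOUT_SECONDS * 1000;
  }
  return Math.round(seconds * 1000);
}
```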

User Experience

Before: The agent hangs indefinitely when the LLM is unresponsive; the user must use `/stop`.

After: After 60s (configurable) of no response, the user sees:

```
⏱️ LLM idle timeout (60s): no response from model
```

Changed files

- `src/agents/pi-embedded-runner/run/attempt.ts` (modified, +17/-0)
- `src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts` (added, +219/-0)
- `src/agents/pi-embedded-runner/run/llm-idle-timeout.ts` (added, +121/-0)
- `src/config/schema.base.generated.ts` (modified, +13/-0)
- `src/config/types.agent-defaults.ts` (modified, +15/-0)
- `src/config/zod-schema.agent-defaults.ts` (modified, +13/-0)

Problem

When an LLM model responds slowly or becomes completely unresponsive, OpenClaw has no timeout mechanism to abort the request. This causes the entire agent lane to be blocked, making the agent unresponsive to any user interaction.

Real-world Incident

Timeline from logs:

12:12:12 - Agent reply completed
12:12:18 - lane=session:agent:main:main waitedMs=25871 (main session waited 26s)
12:13:03 - User sent new message, but blocked (no response)
12:17:05 - User sent /stop (4 minutes later)
12:17:22 - lane=nested waitedMs=299249 (cumulative block: 299 seconds)

Root cause analysis:

- 12:13:03 ~ 12:17:05: 4 minutes of empty logs
- This wasn't a `sessions_send` 30s timeout issue
- The real cause: the LLM call got stuck; the model returned no tokens
- The entire lane was occupied, so the main session couldn't respond to new messages
- Only `/stop` could recover
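The durations quoted in the timeline can be checked with a little arithmetic. The helper `toSeconds` below is hypothetical; the timestamps and the `waitedMs` value are copied from the log excerpt above.

```typescript
// Converts an "HH:MM:SS" timestamp into seconds since midnight.
function toSeconds(hms: string): number {
  const [h, m, s] = hms.split(":").map(Number);
  return h * 3600 + m * 60 + s;
}

// Gap between the blocked user message and the /stop command.
const silentGapSeconds = toSeconds("12:17:05") - toSeconds("12:13:03");
// 242 seconds, i.e. about 4 minutes with no log output

// waitedMs reported for lane=nested, converted to whole seconds.
const nestedLaneWaitSeconds = Math.floor(299249 / 1000);
// 299 seconds, matching "cumulative block: 299 seconds"
```

Both figures are well above the 60-second default proposed in the PR, so an idle timeout of that size would have fired long before the user resorted to `/stop`.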

Expected Behavior

- LLM calls should have an idle timeout mechanism
- If the model doesn't return any token within a specified time, the request should be aborted with a user-friendly error message
- The agent should remain responsive to user messages
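One way to picture the expected behavior is a turn runner that always hands control back to the caller. This is a simplified sketch under stated assumptions: `ModelCall` and `runTurn` are hypothetical names, and the model call is treated as a single promise, so the timer bounds the wait for the whole reply rather than the gap between streamed tokens. The message text follows the one quoted in the PR description.

```typescript
// Hypothetical model call: resolves with the reply text, or never settles if the model hangs.
type ModelCall = (signal: AbortSignal) => Promise<string>;

// Runs one turn and returns either the reply or a user-friendly timeout message,
// so the caller is never left waiting indefinitely.
async function runTurn(call: ModelCall, timeoutSeconds: number): Promise<string> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<string>((resolve) => {
    timer = setTimeout(() => {
      controller.abort(); // ask the underlying request to stop
      resolve(`⏱️ LLM idle timeout (${timeoutSeconds}s): no response from model`);
    }, timeoutSeconds * 1000);
  });
  try {
    return await Promise.race([call(controller.signal), timedOut]);
  } finally {
    clearTimeout(timer);
  }
}
```

Passing the `AbortSignal` down lets the underlying HTTP request be cancelled as well, instead of merely being ignored.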

Actual Behavior

- LLM calls have no timeout limit
- When the model is slow or unresponsive, the entire agent is blocked
- The user can only recover via `/stop`
- No error message is shown to the user

Impact

- All LLM providers: any model can encounter network issues or server-side delays
- All users: it is unpredictable when it happens, and the experience is extremely poor when it does
- Multi-agent scenarios: one blocked agent may affect communication with other agents

extent analysis

Fix Plan

The steps below describe a generic total-request timeout, illustrated in Python. Note that this differs from PR #55072, which adds an idle timeout to the streaming response in TypeScript.

- Set a timeout limit for LLM requests (e.g., 30 seconds)
- Use `asyncio.wait_for` (or a library such as timeout-decorator) to enforce the timeout
- Catch the timeout exception and return a user-friendly error message
- Ensure the agent remains responsive to user messages during LLM requests

Example Code

import asyncio
from functools import wraps

# Timeout limit in seconds (e.g., 30 seconds)
TIMEOUT_LIMIT = 30

# Decorator factory: bounds the total run time of an async function
def with_timeout(timeout):
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            try:
                return await asyncio.wait_for(func(*args, **kwargs), timeout=timeout)
            except asyncio.TimeoutError:
                return "LLM request timed out. Please try again later."
        return wrapper
    return decorator

# Apply decorator to LLM call function
@with_timeout(TIMEOUT_LIMIT)
async def llm_call(model, input_text):
    # LLM call implementation here
    pass

# Example usage: model is passed in explicitly rather than read from a global
async def handle_user_message(model, message):
    try:
        return await llm_call(model, message)
    except Exception as e:
        return str(e)

Verification

To verify the fix, test the following scenarios:

- LLM call with a response within the timeout limit
- LLM call with no response within the timeout limit
- Multiple LLM calls with varying response times
- Agent responsiveness during LLM requests
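The first three scenarios can be exercised against a fake stream without calling a real model. Everything below is a hypothetical test harness: `fakeStream` and `collect` are illustrative names, and `collect` is a compact idle-timeout loop written for this sketch, not the wrapper from the PR.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Fake model stream: emits one token after each delay in `gapsMs`.
async function* fakeStream(gapsMs: number[]): AsyncGenerator<string> {
  for (let i = 0; i < gapsMs.length; i++) {
    await sleep(gapsMs[i]);
    yield `t${i}`;
  }
}

// Collects tokens, giving up if any single gap between tokens exceeds `idleMs`.
async function collect(stream: AsyncIterable<string>, idleMs: number) {
  const it = stream[Symbol.asyncIterator]();
  const tokens: string[] = [];
  while (true) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const idle = new Promise<"idle">((resolve) => {
      timer = setTimeout(() => resolve("idle"), idleMs);
    });
    const next = await Promise.race([it.next(), idle]);
    clearTimeout(timer);
    if (next === "idle") return { tokens, timedOut: true };
    if (next.done) return { tokens, timedOut: false };
    tokens.push(next.value);
  }
}
```

For example, `collect(fakeStream([10, 10]), 200)` should finish normally, `collect(fakeStream([300]), 50)` should report a timeout, and `collect(fakeStream([40, 40, 40, 40]), 100)` should finish normally even though the whole stream takes longer than 100 ms, because no single gap exceeds the limit.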

Extra Tips

- Adjust the timeout limit based on the typical response time of your LLM models
- Consider implementing a retry mechanism for LLM calls that time out
- Monitor agent performance and adjust the timeout limit as needed to ensure a good user experience
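A retry mechanism of the kind suggested above might look like the following. This is a generic sketch: `retryOnTimeout`, its parameters, and the backoff values are all assumptions for illustration, and nothing in the source indicates that the PR adds retries.

```typescript
// Retries `attempt` when it fails with a timeout, up to `maxAttempts` tries in total.
// `isTimeout` is a caller-supplied predicate; other errors are rethrown immediately.
async function retryOnTimeout<T>(
  attempt: () => Promise<T>,
  isTimeout: (err: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let tryNo = 1; ; tryNo++) {
    try {
      return await attempt();
    } catch (err) {
      if (!isTimeout(err) || tryNo >= maxAttempts) throw err;
      // Exponential backoff: baseDelayMs, then 2x, then 4x, ...
      const delayMs = baseDelayMs * Math.pow(2, tryNo - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Keeping `maxAttempts` small matters here: each retry can block the lane for another full timeout period, so the worst-case wait grows with every attempt.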
