openclaw - ✅ (Solved) Fix: LLM calls have no timeout control; slow model responses cause a complete agent hang [1 pull request, 1 participant]

GitHub stats
openclaw/openclaw#55065 (fetched 2026-04-08 01:32:57)
Comments: 0 · Participants: 1 · Timeline events: 4 · Reactions: 0
Timeline: closed ×1, cross-referenced ×1, locked ×1, subscribed ×1



Fix Action

Fixed

PR fix notes

PR #55072: feat: add LLM idle timeout for streaming responses

Description (problem / solution / changelog)

Fixes #55065

Summary

Adds an idle timeout mechanism for LLM streaming responses. If the model doesn't return any token within the specified timeout, the request is aborted with a user-friendly error message.

Changes

  1. New configuration option: `agents.defaults.llm.idleTimeoutSeconds`

    • `0` = disabled (never time out)
    • `> 0` = timeout in seconds
    • Default: 60 seconds
  2. Core implementation: `src/agents/pi-embedded-runner/run/llm-idle-timeout.ts`

    • `resolveLlmIdleTimeoutMs()`: resolves the timeout from config
    • `streamWithIdleTimeout()`: wraps the stream function with an idle timeout using `Promise.race`
  3. Integration: modified `attempt.ts` to wrap `streamFn` with the idle timeout

  4. Tests: `llm-idle-timeout.test.ts` with 13 test cases covering:

    • Config resolution (8 tests)
    • Stream wrapper behavior (5 tests, including a timeout scenario)
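The PR implements `streamWithIdleTimeout()` in TypeScript using `Promise.race`. The idea is easier to see in a small example, so here is a minimal sketch in Python. The names `stream_with_idle_timeout` and `LlmIdleTimeoutError` are hypothetical and are not part of openclaw. What makes this an *idle* timeout is that the deadline restarts on every chunk. A single timeout over the whole call would behave differently.

```python
import asyncio
from typing import AsyncIterator, TypeVar

T = TypeVar("T")


class LlmIdleTimeoutError(Exception):
    """Raised when the model sends nothing for longer than the idle window."""

    def __init__(self, timeout_s: float) -> None:
        # Message modeled on the one shown in the PR's "User Experience" section
        super().__init__(f"LLM idle timeout ({timeout_s:g}s): no response from model")
        self.timeout_s = timeout_s


async def stream_with_idle_timeout(
    stream: AsyncIterator[T], idle_timeout_s: float
) -> AsyncIterator[T]:
    """Re-yield chunks from `stream`, failing if any gap exceeds idle_timeout_s.

    The deadline restarts for every chunk, so a slow but steady model is never
    cut off; only a silent one is. idle_timeout_s <= 0 disables the check,
    mirroring the "0 = disabled" config semantics.
    """
    iterator = stream.__aiter__()
    while True:
        try:
            if idle_timeout_s > 0:
                # Wait for the next chunk, but no longer than one idle window
                chunk = await asyncio.wait_for(iterator.__anext__(), idle_timeout_s)
            else:
                chunk = await iterator.__anext__()
        except StopAsyncIteration:
            return
        except asyncio.TimeoutError:
            raise LlmIdleTimeoutError(idle_timeout_s) from None
        yield chunk
```

If a model streams slowly but steadily, the stream finishes normally even when its total duration is longer than the timeout. If a model goes silent mid-stream, the error is raised after one idle window.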

Configuration Example

```json
{
  "agents": {
    "defaults": {
      "llm": {
        "idleTimeoutSeconds": 60
      }
    }
  }
}
```
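The sketch below shows one way this configuration might be resolved into milliseconds. It follows the documented semantics: `0` disables the timeout, a positive value is a number of seconds, and the default is 60. `resolve_llm_idle_timeout_ms` is a hypothetical Python analogue of `resolveLlmIdleTimeoutMs()`. How it handles invalid values is an assumption, since the real schema validation may simply reject them.

```python
import json
from typing import Any, Mapping

DEFAULT_IDLE_TIMEOUT_SECONDS = 60  # default stated in PR #55072


def resolve_llm_idle_timeout_ms(config: Mapping[str, Any]) -> int:
    """Hypothetical analogue of resolveLlmIdleTimeoutMs().

    0 disables the timeout; a positive value is a timeout in seconds; a missing
    key falls back to the 60 s default.
    """
    llm = config.get("agents", {}).get("defaults", {}).get("llm", {})
    seconds = llm.get("idleTimeoutSeconds", DEFAULT_IDLE_TIMEOUT_SECONDS)
    # Assumption: non-numeric or negative values fall back to the default
    if isinstance(seconds, bool) or not isinstance(seconds, (int, float)) or seconds < 0:
        seconds = DEFAULT_IDLE_TIMEOUT_SECONDS
    return int(seconds * 1000)


# The configuration example, verbatim
CONFIG_JSON = '{ "agents": { "defaults": { "llm": { "idleTimeoutSeconds": 60 } } } }'
print(resolve_llm_idle_timeout_ms(json.loads(CONFIG_JSON)))  # → 60000
```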

User Experience

Before: The agent hangs indefinitely when the LLM is unresponsive; the user must use `/stop`

After: After 60s (configurable) with no response from the model, the user sees:

```
⏱️ LLM idle timeout (60s): no response from model
```

Changed files

  • src/agents/pi-embedded-runner/run/attempt.ts (modified, +17/-0)
  • src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts (added, +219/-0)
  • src/agents/pi-embedded-runner/run/llm-idle-timeout.ts (added, +121/-0)
  • src/config/schema.base.generated.ts (modified, +13/-0)
  • src/config/types.agent-defaults.ts (modified, +15/-0)
  • src/config/zod-schema.agent-defaults.ts (modified, +13/-0)


Problem

When an LLM responds slowly or becomes completely unresponsive, OpenClaw has no timeout mechanism to abort the request. The entire agent lane stays blocked, so the agent cannot respond to any user interaction.

Real-world Incident

Timeline from logs:

```
12:12:12 - Agent reply completed
12:12:18 - lane=session:agent:main:main waitedMs=25871 (main session waited 26s)
12:13:03 - User sent new message, but blocked (no response)
12:17:05 - User sent /stop (4 minutes later)
12:17:22 - lane=nested waitedMs=299249 (cumulative block: 299 seconds)
```

Root cause analysis:

  • 12:13:03 ~ 12:17:05: 4 minutes of empty logs
  • This wasn't a `sessions_send` 30s timeout issue
  • The real cause: the LLM call stalled and the model returned no tokens
  • The entire lane was occupied, main session couldn't respond to new messages
  • Only `/stop` could recover
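Incidents like this one can be caught earlier by scanning the logs for long lane waits. The sketch below assumes log lines formatted exactly like the excerpt above, and the real openclaw log format may differ. `find_blocked_lanes`, `gap_seconds`, and the 60 s threshold are illustrative choices.

```python
import re
from datetime import datetime

# Assumes lines shaped like the incident excerpt, e.g.
# "12:17:22 - lane=nested waitedMs=299249 (cumulative block: 299 seconds)"
LANE_WAIT = re.compile(r"^(\d{2}:\d{2}:\d{2}) - lane=(\S+) waitedMs=(\d+)")


def find_blocked_lanes(log_lines, threshold_ms=60_000):
    """Return (time, lane, waited_seconds) for each wait above threshold_ms."""
    blocked = []
    for line in log_lines:
        match = LANE_WAIT.match(line)
        if match and int(match.group(3)) > threshold_ms:
            time_str, lane, waited_ms = match.groups()
            blocked.append((time_str, lane, int(waited_ms) / 1000))
    return blocked


def gap_seconds(start, end):
    """Seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


TIMELINE = [
    "12:12:12 - Agent reply completed",
    "12:12:18 - lane=session:agent:main:main waitedMs=25871 (main session waited 26s)",
    "12:13:03 - User sent new message, but blocked (no response)",
    "12:17:05 - User sent /stop (4 minutes later)",
    "12:17:22 - lane=nested waitedMs=299249 (cumulative block: 299 seconds)",
]
print(find_blocked_lanes(TIMELINE))  # → [('12:17:22', 'nested', 299.249)]
print(gap_seconds("12:13:03", "12:17:05"))  # → 242.0 (the "4 minutes of empty logs")
```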

Expected Behavior

  • LLM calls should have an idle timeout mechanism
  • If the model doesn't return any token within a specified time, the request should be aborted with a user-friendly error message
  • Agent should remain responsive to user messages
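Here is a minimal sketch of what "remain responsive" could mean in practice, assuming an asyncio event loop. The LLM call runs as a background task. New messages get an immediate reply while that task is in flight, and `/stop` cancels it. `SessionLane` and `handle_message` are hypothetical names and do not describe openclaw's actual lane implementation.

```python
import asyncio


class SessionLane:
    """Hypothetical per-session lane: runs at most one LLM call at a time."""

    def __init__(self):
        self.current = None  # in-flight LLM task, if any

    def busy(self):
        return self.current is not None and not self.current.done()

    def start(self, llm_coro):
        # Run the LLM call as a background task so the message loop stays free
        if self.busy():
            llm_coro.close()  # don't leak the unused coroutine
            raise RuntimeError("lane busy")
        self.current = asyncio.create_task(llm_coro)
        return self.current

    async def stop(self):
        # Same effect as `/stop`, but also callable from code (e.g. after a timeout)
        if not self.busy():
            return False
        self.current.cancel()
        try:
            await self.current
        except asyncio.CancelledError:
            pass
        return True


async def handle_message(lane, text):
    """Reply immediately instead of going silent while a call is in flight."""
    if text == "/stop":
        return "Stopped." if await lane.stop() else "Nothing to stop."
    if lane.busy():
        return "Still waiting on the model; send /stop to cancel."
    lane.start(asyncio.sleep(0.01, result="reply"))  # stand-in for an LLM call
    return "Working on it..."
```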

Actual Behavior

  • LLM calls have no timeout limit
  • When model is slow/unresponsive, the entire agent is blocked
  • User can only recover via `/stop`
  • No error message is shown to the user

Impact

  • All LLM providers: Any model can encounter network issues or server-side delays
  • All users: Unpredictable when it happens, extremely poor experience when it does
  • Multi-agent scenarios: One blocked agent may affect communication with other agents

Extended analysis

Fix Plan

To introduce a timeout mechanism for LLM calls, we will implement the following steps:

  • Set a timeout limit for LLM requests (e.g., 30 seconds)
  • Enforce it with `asyncio.wait_for` from the standard library; since the LLM call is async, no extra package is needed
  • Catch timeout exceptions and return a user-friendly error message
  • Ensure the agent remains responsive to user messages during LLM requests

Example Code

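```python
import asyncio
from functools import wraps

# Set timeout limit (e.g., 30 seconds)
TIMEOUT_LIMIT = 30


# Decorator that aborts the wrapped coroutine after `timeout` seconds and
# returns a user-friendly message instead of hanging. Note: asyncio.wait_for
# bounds the total call duration, not the gap between streamed tokens.
def with_timeout(timeout):
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            try:
                return await asyncio.wait_for(func(*args, **kwargs), timeout=timeout)
            except asyncio.TimeoutError:
                return "LLM request timed out. Please try again later."
        return wrapper
    return decorator


# Apply decorator to LLM call function
@with_timeout(TIMEOUT_LIMIT)
async def llm_call(model, input_text):
    # Placeholder: replace with the real LLM call implementation
    raise NotImplementedError("LLM call not implemented")


# Example usage: the model is passed in explicitly rather than read from an
# undefined global
async def handle_user_message(model, message):
    try:
        return await llm_call(model, message)
    except Exception as e:
        # Surface non-timeout failures as text instead of crashing the agent
        return str(e)
```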

Verification

To verify the fix, test the following scenarios:

  • LLM call with a response within the timeout limit
  • LLM call with no response within the timeout limit
  • Multiple LLM calls with varying response times
  • Agent responsiveness during LLM requests
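The four scenarios above can be tested without a real model by substituting a fake call whose delay is controllable. This sketch assumes the total-timeout approach from the example code (`asyncio.wait_for`). `fake_llm`, `call_with_timeout`, and the 0.1 s limit are hypothetical test stand-ins.

```python
import asyncio
import time

TIMEOUT_MESSAGE = "LLM request timed out. Please try again later."


async def fake_llm(delay, reply="ok"):
    """Stand-in for a model call that answers after `delay` seconds."""
    await asyncio.sleep(delay)
    return reply


async def call_with_timeout(delay, timeout):
    try:
        return await asyncio.wait_for(fake_llm(delay), timeout)
    except asyncio.TimeoutError:
        return TIMEOUT_MESSAGE


async def run_scenarios(timeout=0.1):
    results = {}
    # 1. Response arrives within the limit
    results["fast"] = await call_with_timeout(0.01, timeout)
    # 2. No response within the limit
    results["stalled"] = await call_with_timeout(10, timeout)
    # 3. Several calls with varying response times, run concurrently
    results["mixed"] = await asyncio.gather(
        *(call_with_timeout(d, timeout) for d in (0.01, 10, 0.05))
    )
    # 4. The event loop keeps serving other work while a call is stalled
    stalled = asyncio.create_task(call_with_timeout(10, timeout))
    start = time.monotonic()
    await asyncio.sleep(0)  # stand-in for handling another user message
    results["responsive"] = time.monotonic() - start < timeout and not stalled.done()
    await stalled
    return results
```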

Extra Tips

  • Adjust the timeout limit based on the average response time of your LLM models
  • Consider implementing a retry mechanism for LLM calls that timeout
  • Monitor agent performance and adjust the timeout limit as needed to ensure a good user experience.
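The retry tip could be implemented along these lines: a minimal sketch with bounded retries and exponential backoff, assuming each attempt is wrapped in `asyncio.wait_for`. `call_with_retry` and its defaults are illustrative. A coroutine object can only be awaited once, so the helper takes a zero-argument factory and creates a fresh call for every attempt.

```python
import asyncio


async def call_with_retry(make_call, timeout_s, retries=2, backoff_s=0.5):
    """Retry a timed-out LLM call a bounded number of times.

    make_call: zero-argument function returning a new coroutine per attempt.
    Re-raises asyncio.TimeoutError once all attempts are exhausted.
    """
    for attempt in range(retries + 1):
        try:
            return await asyncio.wait_for(make_call(), timeout_s)
        except asyncio.TimeoutError:
            if attempt == retries:
                raise
            # Exponential backoff between attempts: backoff_s, 2*backoff_s, ...
            await asyncio.sleep(backoff_s * 2 ** attempt)
```

Keep `retries` small. Each retry adds another full timeout window before the user hears back.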
