openclaw - ✅(Solved) Fix [Bug]: OpenAI SDK default 10-minute HTTP timeout kills long-running local inference requests [2 pull requests, 1 comments, 2 participants]

GitHub stats
openclaw/openclaw#63663 (fetched 2026-04-10 03:42:18)
Comments: 1
Participants: 2
Timeline events: 6 (cross-referenced ×2, labeled ×2, commented ×1, referenced ×1)
Reactions: 0

The bundled openai npm package has a hardcoded DEFAULT_TIMEOUT = 600000 (10 minutes). OpenClaw does not override this when constructing the client for local/custom providers. This causes the HTTP connection to be aborted after exactly 10 minutes, killing any in-flight inference request — even when agents.defaults.timeoutSeconds and llm.idleTimeoutSeconds are configured to allow longer runs.

Root Cause

In the bundled OpenAI SDK at anthropic-vertex-stream-BySayhWO.js, three call sites fall back to 6e5 when no explicit timeout is provided:

// Line 3019
timeout: timeout ?? 6e5,

// Line 3282
timeout: this._client._options.timeout ?? 6e5,

// Line 3987
timeout: timeout ?? 6e5,

This originates from node_modules/openai/src/client.ts:

static DEFAULT_TIMEOUT = 600000; // 10 minutes

OpenClaw never passes a timeout option to the OpenAI client constructor for local providers, so the SDK default wins. The user-configured timeoutSeconds and llm.idleTimeoutSeconds operate at higher abstraction layers (run orchestration and SSE chunk monitoring respectively) and do not propagate down to the SDK's HTTP request timeout.
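The fallback pattern quoted above can be illustrated in isolation. The following is a minimal sketch, not the SDK's actual code: `effectiveTimeoutMs` is a hypothetical helper that only mirrors the `timeout ?? 6e5` expression from the issue, to show why a client built without a `timeout` option ends up at 10 minutes.

```typescript
// Default reported in the issue: the SDK falls back to 6e5 ms (10 minutes).
const SDK_DEFAULT_TIMEOUT_MS = 6e5;

// Hypothetical helper mirroring the `timeout ?? 6e5` pattern quoted above.
function effectiveTimeoutMs(explicitTimeoutMs?: number): number {
  // `??` falls back only on null or undefined, so any number that is passed wins.
  return explicitTimeoutMs ?? SDK_DEFAULT_TIMEOUT_MS;
}

// A client constructed without `timeout` gets the SDK default.
const withoutOverride = effectiveTimeoutMs();

// timeoutSeconds: 3600 propagated as milliseconds overrides the default.
const withOverride = effectiveTimeoutMs(3600 * 1000);

console.log(withoutOverride, withOverride); // 600000 3600000
```

Because the fallback is keyed on the option being absent, the higher-level settings have no effect unless something explicitly forwards a value to the constructor.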

Fix Action

PR fix notes

PR #63797: fix(agents): pass run timeoutMs to OpenAI SDK HTTP client to override 10-min default

Description (problem / solution / changelog)

Summary

  • Pass httpTimeoutMs derived from the agent run's timeoutMs into the OpenAI, AzureOpenAI, and completions client constructors
  • Prevents the SDK's hardcoded 600 s default from silently terminating local inference requests that exceed 10 minutes
  • No behavior change for requests completing under 10 minutes

Closes #63663


This PR was developed with AI assistance (Claude). Built with islo.dev

Changed files

  • src/agents/openai-transport-stream.ts (modified, +18/-1)
  • src/agents/pi-embedded-runner/run/attempt.ts (modified, +1/-0)
  • src/agents/pi-embedded-runner/stream-resolution.ts (modified, +17/-3)

PR #63841: fix(agents): pass run timeoutMs to OpenAI SDK HTTP client

Description (problem / solution / changelog)

Summary

The OpenAI SDK defaults to a 600s HTTP timeout that silently terminates long-running agent runs. This fix passes the configured run timeoutMs to the SDK HTTP client to override the default.

Closes #63663

Testing

  • Relevant tests pass

This PR was developed with AI assistance (Claude). All code has been reviewed and tested. Built with islo.dev

Changed files

  • src/agents/openai-transport-stream.ts (modified, +18/-1)
  • src/agents/pi-embedded-runner/run/attempt.ts (modified, +1/-0)
  • src/agents/pi-embedded-runner/stream-resolution.ts (modified, +17/-3)

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

The bundled openai npm package has a hardcoded DEFAULT_TIMEOUT = 600000 (10 minutes). OpenClaw does not override this when constructing the client for local/custom providers. This causes the HTTP connection to be aborted after exactly 10 minutes, killing any in-flight inference request — even when agents.defaults.timeoutSeconds and llm.idleTimeoutSeconds are configured to allow longer runs.

Steps to reproduce

  1. Configure a local provider with a large model where total request time can exceed 10 minutes
  2. Set generous timeouts in openclaw.json:
    {
      "agents": {
        "defaults": {
          "timeoutSeconds": 3600,
          "llm": {
            "idleTimeoutSeconds": 0
          }
        }
      }
    }
  3. Send a message. The model begins inference and streams tokens normally.
  4. At exactly 10 minutes, the connection is killed — even while tokens are actively streaming

Expected behavior

The OpenAI SDK client timeout should respect the user-configured agents.defaults.timeoutSeconds, or at minimum be configurable separately. When timeoutSeconds: 3600 is set, no layer should abort the request before 3600 seconds.

Suggested fix

When constructing the OpenAI client instance, pass timeout derived from the agent's configured timeoutSeconds:

const client = new OpenAI({
  baseURL: provider.baseURL,
  apiKey: provider.apiKey,
  timeout: resolvedTimeoutSeconds * 1000, // propagate to SDK
});

Alternatively, add a dedicated config key (e.g., llm.httpTimeoutSeconds) that is forwarded to the SDK constructor, with a default that matches timeoutSeconds or is sufficiently large for local inference workloads.
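One way the two options above could be combined is sketched below. This is an assumption-laden illustration: the `AgentDefaults` shape is inferred from the openclaw.json excerpt in the steps to reproduce, `httpTimeoutSeconds` is the hypothetical key proposed in this issue (it does not exist today), and `resolveHttpTimeoutMs` is an invented helper name.

```typescript
// Shape assumed from the openclaw.json excerpt; `httpTimeoutSeconds` is hypothetical.
interface AgentDefaults {
  timeoutSeconds?: number;
  llm?: { idleTimeoutSeconds?: number; httpTimeoutSeconds?: number };
}

// SDK default reported in the issue (10 minutes).
const SDK_DEFAULT_TIMEOUT_MS = 600000;

// Prefer the dedicated key, then the run timeout, then the SDK default.
function resolveHttpTimeoutMs(defaults: AgentDefaults): number {
  const seconds = defaults.llm?.httpTimeoutSeconds ?? defaults.timeoutSeconds;
  if (typeof seconds !== "number" || !Number.isFinite(seconds) || seconds <= 0) {
    return SDK_DEFAULT_TIMEOUT_MS;
  }
  // The SDK constructor option is expressed in milliseconds.
  return Math.round(seconds * 1000);
}

const fromRunTimeout = resolveHttpTimeoutMs({ timeoutSeconds: 3600, llm: { idleTimeoutSeconds: 0 } });
const fromDedicatedKey = resolveHttpTimeoutMs({ timeoutSeconds: 3600, llm: { httpTimeoutSeconds: 7200 } });
const unset = resolveHttpTimeoutMs({});

console.log(fromRunTimeout, fromDedicatedKey, unset); // 3600000 7200000 600000
```

The resolved value would then be passed as the `timeout` option when the client is constructed, as in the snippet above.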

Actual behavior

The inference backend receives a TCP disconnection at exactly 10 minutes after inference begins, regardless of configured timeouts. On exo, this surfaces as:

[ 2026-04-09 00:12:28.727 | INFO | exo.worker.main:plan_step:167 ] Worker plan: CancelTask

Timestamps confirm the 600-second boundary from the first inference step:

[ 23:44:53.694 | DEBUG ] step overhead: 0.10ms (next=80.86ms total=80.96ms)
[ 23:54:55.430 | INFO  ] Worker plan: CancelTask

Delta: 10 minutes 1.736 seconds (600,000 ms plus about 1.7 s of overhead). The timer starts when inference begins, which means the SDK imposes a hard 10-minute wall-clock limit on the entire request, regardless of whether tokens are actively streaming back.
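The delta can be checked directly from the two log timestamps. The sketch below assumes both lines fall on the same calendar day, as the full log lines in the evidence section indicate; `timeOfDayMs` is a helper introduced here for illustration.

```typescript
// Convert an HH:MM:SS.mmm timestamp into milliseconds since midnight.
function timeOfDayMs(stamp: string): number {
  const match = /^(\d{2}):(\d{2}):(\d{2})\.(\d{3})$/.exec(stamp);
  if (!match) throw new Error(`unexpected timestamp: ${stamp}`);
  const [h, m, s, ms] = match.slice(1).map(Number);
  return ((h * 60 + m) * 60 + s) * 1000 + ms;
}

const firstStepMs = timeOfDayMs("23:44:53.694"); // last logged inference step
const cancelMs = timeOfDayMs("23:54:55.430");    // CancelTask logged by the worker

const deltaMs = cancelMs - firstStepMs;
const overheadMs = deltaMs - 600000; // time beyond the 10-minute boundary

console.log(deltaMs, overheadMs); // 601736 1736
```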

Root cause

In the bundled OpenAI SDK at anthropic-vertex-stream-BySayhWO.js, three call sites fall back to 6e5 when no explicit timeout is provided:

// Line 3019
timeout: timeout ?? 6e5,

// Line 3282
timeout: this._client._options.timeout ?? 6e5,

// Line 3987
timeout: timeout ?? 6e5,

This originates from node_modules/openai/src/client.ts:

static DEFAULT_TIMEOUT = 600000; // 10 minutes

OpenClaw never passes a timeout option to the OpenAI client constructor for local providers, so the SDK default wins. The user-configured timeoutSeconds and llm.idleTimeoutSeconds operate at higher abstraction layers (run orchestration and SSE chunk monitoring respectively) and do not propagate down to the SDK's HTTP request timeout.
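The difference between an idle timeout and a total (wall-clock) timeout can be made concrete with a small simulation. This is a sketch under stated assumptions, not OpenClaw or SDK code: `simulate` is an invented function, an idle timeout of 0 is assumed to mean "disabled" (as the configuration in this report suggests), and the total timeout is assumed to behave as the issue describes, counting from the start of the request.

```typescript
type Outcome = "completed" | "idle-timeout" | "total-timeout";

// chunkTimesMs: arrival times of streamed chunks in ms since request start,
// in ascending order; the last chunk ends the response.
function simulate(chunkTimesMs: number[], idleTimeoutMs: number, totalTimeoutMs: number): Outcome {
  let lastChunkMs = 0;
  for (const t of chunkTimesMs) {
    // The idle deadline moves forward with every chunk; the total deadline never moves.
    const idleDeadline = idleTimeoutMs > 0 ? lastChunkMs + idleTimeoutMs : Infinity;
    const nextDeadline = Math.min(idleDeadline, totalTimeoutMs);
    if (t > nextDeadline) {
      return idleDeadline < totalTimeoutMs ? "idle-timeout" : "total-timeout";
    }
    lastChunkMs = t;
  }
  return "completed";
}

// One chunk per second for 700 seconds: the stream is never idle.
const steadyStream = Array.from({ length: 700 }, (_, i) => (i + 1) * 1000);

const withSdkDefault = simulate(steadyStream, 0, 600000);   // 10-minute total limit
const withRunTimeout = simulate(steadyStream, 0, 3600000);  // 1-hour total limit
const stalledStream = simulate([1000, 50000], 30000, 600000); // 49 s gap, 30 s idle limit

console.log(withSdkDefault, withRunTimeout, stalledStream);
// total-timeout completed idle-timeout
```

In this model a healthy stream still fails under the 10-minute total limit, which matches the behavior reported here: disabling or raising the idle timeout does nothing about the total deadline.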

OpenClaw version

v2026.4.5

Operating system

macOS 26.4

Install method

npm global

Model

Kimi-K2.5 (1T parameters, 200k context window)

Provider / routing chain

Local OpenAI-compatible endpoint (exo/MLX, also reproducible with llama-server)

Additional provider/model setup details

2× Mac Studio M3 Ultra (512GB unified memory each), tensor-parallel via exo

Logs, screenshots, and evidence

[ 2026-04-08 23:44:53.694 | DEBUG | exo.worker.engines.mlx.generator.batch_generate:step:365 ] step overhead: 0.10ms (next=80.86ms total=80.96ms)
[ 2026-04-08 23:54:55.430 | INFO  | exo.worker.main:plan_step:167 ] Worker plan: CancelTask

Impact and severity

This affects all users running local inference backends where total request time can exceed 10 minutes. Common scenarios include large models on consumer hardware (1T+ parameter models across multiple machines), large context windows (100k+ tokens), and multi-modal requests with vision models. The existing timeout configuration (timeoutSeconds, llm.idleTimeoutSeconds) gives users the impression they have control, but the SDK's hidden 10-minute ceiling overrides everything silently.

Additional information

Second related bug: tools.exec.timeoutSeconds config is silently ignored

The exec tool (bash/shell) has a separate 30-minute default timeout at line 3108 of pi-embedded-DWASRjxE.js:

const defaultTimeoutSec = typeof defaults?.timeoutSec === "number" && defaults.timeoutSec > 0 ? defaults.timeoutSec : 1800;

The code reads defaults.timeoutSec, but the zod schema validates timeoutSeconds:

timeoutSeconds: z.number().int().positive().optional(),

There is no rename mapping between the two. Setting tools.exec.timeoutSeconds in openclaw.json passes validation but never reaches createExecTool() — the 1800s default always wins. This kills long-running tool calls (coding tasks with multiple exec rounds) at exactly 30 minutes.
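The key mismatch can be reproduced without the real schema. In the sketch below, `ExecDefaults`, `currentExecTimeoutSec`, and `tolerantExecTimeoutSec` are hypothetical names; the first function mirrors the quoted `defaults.timeoutSec` expression, and the second shows one possible repair that reads the validated key first.

```typescript
// Both spellings are listed so the mismatch can be shown; the schema only validates timeoutSeconds.
interface ExecDefaults {
  timeoutSec?: number;
  timeoutSeconds?: number;
}

const EXEC_FALLBACK_SEC = 1800; // 30-minute default quoted in the issue

// Mirrors the quoted expression: only `timeoutSec` is ever consulted.
function currentExecTimeoutSec(defaults?: ExecDefaults): number {
  const value = defaults?.timeoutSec;
  return typeof value === "number" && value > 0 ? value : EXEC_FALLBACK_SEC;
}

// One possible repair: read the validated key, keep the old one as a fallback.
function tolerantExecTimeoutSec(defaults?: ExecDefaults): number {
  const value = defaults?.timeoutSeconds ?? defaults?.timeoutSec;
  return typeof value === "number" && value > 0 ? value : EXEC_FALLBACK_SEC;
}

const userConfig: ExecDefaults = { timeoutSeconds: 7200 }; // passes schema validation

const before = currentExecTimeoutSec(userConfig);
const after = tolerantExecTimeoutSec(userConfig);

console.log(before, after); // 1800 7200
```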

Workaround for the exec timeout:

sed -i 's/defaults.timeoutSec > 0 ? defaults.timeoutSec : 1800/defaults.timeoutSec > 0 ? defaults.timeoutSec : 7200/' \
  "$(npm root -g)/openclaw/dist/pi-embedded-DWASRjxE.js"

The fix is to either rename the schema field to timeoutSec or change the code to read timeoutSeconds.

Workaround for the SDK timeout

Patch the bundled SDK directly:

sed -i 's/timeout ?? 6e5/timeout ?? 36e5/g' \
  "$(npm root -g)/openclaw/dist/anthropic-vertex-stream-BySayhWO.js"

This changes the fallback from 10 minutes to 1 hour. Both patches must be reapplied after every update.

extent analysis

TL;DR

The OpenAI SDK client timeout can be fixed by passing a derived timeout from the agent's configured timeoutSeconds to the SDK constructor.

Guidance

  • Identify the timeoutSeconds configuration in openclaw.json and ensure it is set to a value that allows for longer runs (e.g., 3600 seconds).
  • When constructing the OpenAI client instance, pass timeout derived from the agent's configured timeoutSeconds using the suggested fix: const client = new OpenAI({ ..., timeout: resolvedTimeoutSeconds * 1000 }).
  • Consider adding a dedicated config key (e.g., llm.httpTimeoutSeconds) that is forwarded to the SDK constructor, with a default that matches timeoutSeconds or is sufficiently large for local inference workloads.
  • Verify the fix by checking if the inference request is no longer aborted after exactly 10 minutes.

Example

const resolvedTimeoutSeconds = 3600; // example value from openclaw.json
const client = new OpenAI({
  baseURL: provider.baseURL,
  apiKey: provider.apiKey,
  timeout: resolvedTimeoutSeconds * 1000, // propagate to SDK
});

Notes

  • The suggested fix assumes that the timeoutSeconds configuration is correctly set in openclaw.json.
  • The exec timeout issue is separate: the workaround patches pi-embedded-DWASRjxE.js, while the proper fix is to make the schema field and the code agree on one name (timeoutSec or timeoutSeconds).
  • The sed workarounds modify files inside the installed package, so they must be reapplied after every update.

Recommendation

As a stopgap, patch the bundled SDK directly using the provided sed command. The code-level fix is to pass the derived timeout to the SDK constructor, which is what PR #63797 and PR #63841 propose. Either approach allows inference runs longer than 10 minutes without the SDK timeout aborting them.
