openclaw - ✅(Solved) Fix [Bug]: OpenAI SDK default 10-minute HTTP timeout kills long-running local inference requests [2 pull requests, 1 comments, 2 participants]

openclaw2026-04-09 09:32:23

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

openclaw/openclaw#63663•Fetched 2026-04-10 03:42:18

View on GitHub

Comments

Participants

Timeline

Reactions

Author

aidiffuser

Participants

aidiffuser

Artyomkun

Timeline (top)

cross-referenced ×2labeled ×2commented ×1referenced ×1

The bundled openai npm package has a hardcoded DEFAULT_TIMEOUT = 600000 (10 minutes). OpenClaw does not override this when constructing the client for local/custom providers. This causes the HTTP connection to be aborted after exactly 10 minutes, killing any in-flight inference request — even when agents.defaults.timeoutSeconds and llm.idleTimeoutSeconds are configured to allow longer runs.

Root Cause

In the bundled OpenAI SDK at anthropic-vertex-stream-BySayhWO.js, three call sites fall back to 6e5 when no explicit timeout is provided:

// Line 3019
timeout: timeout ?? 6e5,

// Line 3282
timeout: this._client._options.timeout ?? 6e5,

// Line 3987
timeout: timeout ?? 6e5,

This originates from node_modules/openai/src/client.ts:

static DEFAULT_TIMEOUT = 600000; // 10 minutes

OpenClaw never passes a timeout option to the OpenAI client constructor for local providers, so the SDK default wins. The user-configured timeoutSeconds and llm.idleTimeoutSeconds operate at higher abstraction layers (run orchestration and SSE chunk monitoring respectively) and do not propagate down to the SDK's HTTP request timeout.

Fix Action

Fix / Workaround

Workaround for the exec timeout:

Workaround for the SDK timeout

Patch the bundled SDK directly:

PR fix notes

PR #63797: fix(agents): pass run timeoutMs to OpenAI SDK HTTP client to override 10-min default

Repository: openclaw/openclaw
Author: zozo123
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/63797

Description (problem / solution / changelog)

Summary

Pass httpTimeoutMs derived from the agent run's timeoutMs into the OpenAI, AzureOpenAI, and completions client constructors
Prevents the SDK's hardcoded 600 s default from silently terminating local inference requests that exceed 10 minutes
No behavior change for requests completing under 10 minutes

Closes #63663

This PR was developed with AI assistance (Claude). Built with islo.dev

Changed files

src/agents/openai-transport-stream.ts (modified, +18/-1)
src/agents/pi-embedded-runner/run/attempt.ts (modified, +1/-0)
src/agents/pi-embedded-runner/stream-resolution.ts (modified, +17/-3)

PR #63841: fix(agents): pass run timeoutMs to OpenAI SDK HTTP client

Repository: openclaw/openclaw
Author: zozo123
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/63841

Description (problem / solution / changelog)

Summary

The OpenAI SDK defaults to a 600s HTTP timeout that silently terminates long-running agent runs. This fix passes the configured run timeoutMs to the SDK HTTP client to override the default.

Closes #63663

Testing

Relevant tests pass

This PR was developed with AI assistance (Claude). All code has been reviewed and tested. Built with islo.dev

Changed files

src/agents/openai-transport-stream.ts (modified, +18/-1)
src/agents/pi-embedded-runner/run/attempt.ts (modified, +1/-0)
src/agents/pi-embedded-runner/stream-resolution.ts (modified, +17/-3)

Code Example

{
     "agents": {
       "defaults": {
         "timeoutSeconds": 3600,
         "llm": {
           "idleTimeoutSeconds": 0
         }
       }
     }
   }

---

const client = new OpenAI({
  baseURL: provider.baseURL,
  apiKey: provider.apiKey,
  timeout: resolvedTimeoutSeconds * 1000, // propagate to SDK
});

---

[ 2026-04-09 00:12:28.727 | INFO | exo.worker.main:plan_step:167 ] Worker plan: CancelTask

---

[ 23:44:53.694 | DEBUG ] step overhead: 0.10ms (next=80.86ms total=80.96ms)
[ 23:54:55.430 | INFO  ] Worker plan: CancelTask

---

// Line 3019
timeout: timeout ?? 6e5,

// Line 3282
timeout: this._client._options.timeout ?? 6e5,

// Line 3987
timeout: timeout ?? 6e5,

---

static DEFAULT_TIMEOUT = 600000; // 10 minutes

---

[ 2026-04-08 23:44:53.694 | DEBUG | exo.worker.engines.mlx.generator.batch_generate:step:365 ] step overhead: 0.10ms (next=80.86ms total=80.96ms)
[ 2026-04-08 23:54:55.430 | INFO  | exo.worker.main:plan_step:167 ] Worker plan: CancelTask

---

const defaultTimeoutSec = typeof defaults?.timeoutSec === "number" && defaults.timeoutSec > 0 ? defaults.timeoutSec : 1800;

---

timeoutSeconds: z.number().int().positive().optional(),

---

sed -i 's/defaults.timeoutSec > 0 ? defaults.timeoutSec : 1800/defaults.timeoutSec > 0 ? defaults.timeoutSec : 7200/' \
  "$(npm root -g)/openclaw/dist/pi-embedded-DWASRjxE.js"

---

sed -i 's/timeout ?? 6e5/timeout ?? 36e5/g' \
  "$(npm root -g)/openclaw/dist/anthropic-vertex-stream-BySayhWO.js"

RAW_BUFFERClick to expand / collapse

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

Summary

Steps to reproduce

Configure a local provider with a large model where total request time can exceed 10 minutes

Set generous timeouts in openclaw.json:

{
  "agents": {
    "defaults": {
      "timeoutSeconds": 3600,
      "llm": {
        "idleTimeoutSeconds": 0
      }
    }
  }
}

Send a message. The model begins inference and streams tokens normally.
At exactly 10 minutes, the connection is killed — even while tokens are actively streaming

Expected behavior

The OpenAI SDK client timeout should respect the user-configured agents.defaults.timeoutSeconds, or at minimum be configurable separately. When timeoutSeconds: 3600 is set, no layer should abort the request before 3600 seconds.

Suggested fix

When constructing the OpenAI client instance, pass timeout derived from the agent's configured timeoutSeconds:

const client = new OpenAI({
  baseURL: provider.baseURL,
  apiKey: provider.apiKey,
  timeout: resolvedTimeoutSeconds * 1000, // propagate to SDK
});

Alternatively, add a dedicated config key (e.g., llm.httpTimeoutSeconds) that is forwarded to the SDK constructor, with a default that matches timeoutSeconds or is sufficiently large for local inference workloads.

Actual behavior

The inference backend receives a TCP disconnection at exactly 10 minutes after inference begins, regardless of configured timeouts. On exo, this surfaces as:

[ 2026-04-09 00:12:28.727 | INFO | exo.worker.main:plan_step:167 ] Worker plan: CancelTask

Timestamps confirm the 600-second boundary from the first inference step:

[ 23:44:53.694 | DEBUG ] step overhead: 0.10ms (next=80.86ms total=80.96ms)
[ 23:54:55.430 | INFO  ] Worker plan: CancelTask

Delta: 10 minutes 2 seconds (600,000ms + minor overhead). The timer starts when inference begins — meaning the SDK imposes a hard 10-minute wall clock on the entire request, regardless of whether tokens are actively streaming back.

Root cause

In the bundled OpenAI SDK at anthropic-vertex-stream-BySayhWO.js, three call sites fall back to 6e5 when no explicit timeout is provided:

// Line 3019
timeout: timeout ?? 6e5,

// Line 3282
timeout: this._client._options.timeout ?? 6e5,

// Line 3987
timeout: timeout ?? 6e5,

This originates from node_modules/openai/src/client.ts:

static DEFAULT_TIMEOUT = 600000; // 10 minutes

OpenClaw version

v2026.4.5

Operating system

macOS 26.4

Install method

npm global

Model

Kimi-K2.5 (1T parameters, 200k context window)

Provider / routing chain

Local OpenAI-compatible endpoint (exo/MLX, also reproducible with llama-server)

Additional provider/model setup details

2× Mac Studio M3 Ultra (512GB unified memory each), tensor-parallel via exo

Logs, screenshots, and evidence

[ 2026-04-08 23:44:53.694 | DEBUG | exo.worker.engines.mlx.generator.batch_generate:step:365 ] step overhead: 0.10ms (next=80.86ms total=80.96ms)
[ 2026-04-08 23:54:55.430 | INFO  | exo.worker.main:plan_step:167 ] Worker plan: CancelTask

Impact and severity

This affects all users running local inference backends where total request time can exceed 10 minutes. Common scenarios include large models on consumer hardware (1T+ parameter models across multiple machines), large context windows (100k+ tokens), and multi-modal requests with vision models. The existing timeout configuration (timeoutSeconds, llm.idleTimeoutSeconds) gives users the impression they have control, but the SDK's hidden 10-minute ceiling overrides everything silently.

Additional information

Second related bug: `tools.exec.timeoutSeconds` config is silently ignored

The exec tool (bash/shell) has a separate 30-minute default timeout at line 3108 of pi-embedded-DWASRjxE.js:

const defaultTimeoutSec = typeof defaults?.timeoutSec === "number" && defaults.timeoutSec > 0 ? defaults.timeoutSec : 1800;

The code reads defaults.timeoutSec, but the zod schema validates timeoutSeconds:

timeoutSeconds: z.number().int().positive().optional(),

There is no rename mapping between the two. Setting tools.exec.timeoutSeconds in openclaw.json passes validation but never reaches createExecTool() — the 1800s default always wins. This kills long-running tool calls (coding tasks with multiple exec rounds) at exactly 30 minutes.

Workaround for the exec timeout:

sed -i 's/defaults.timeoutSec > 0 ? defaults.timeoutSec : 1800/defaults.timeoutSec > 0 ? defaults.timeoutSec : 7200/' \
  "$(npm root -g)/openclaw/dist/pi-embedded-DWASRjxE.js"

The fix is either rename the schema field to timeoutSec or rename the code to read timeoutSeconds.

Workaround for the SDK timeout

Patch the bundled SDK directly:

sed -i 's/timeout ?? 6e5/timeout ?? 36e5/g' \
  "$(npm root -g)/openclaw/dist/anthropic-vertex-stream-BySayhWO.js"

This changes the fallback from 10 minutes to 1 hour. Both patches must be reapplied after every update.

extent analysis

TL;DR

The OpenAI SDK client timeout can be fixed by passing a derived timeout from the agent's configured timeoutSeconds to the SDK constructor.

Guidance

Identify the timeoutSeconds configuration in openclaw.json and ensure it is set to a value that allows for longer runs (e.g., 3600 seconds).
When constructing the OpenAI client instance, pass timeout derived from the agent's configured timeoutSeconds using the suggested fix: const client = new OpenAI({ ..., timeout: resolvedTimeoutSeconds * 1000 }).
Consider adding a dedicated config key (e.g., llm.httpTimeoutSeconds) that is forwarded to the SDK constructor, with a default that matches timeoutSeconds or is sufficiently large for local inference workloads.
Verify the fix by checking if the inference request is no longer aborted after exactly 10 minutes.

Example

const resolvedTimeoutSeconds = 3600; // example value from openclaw.json
const client = new OpenAI({
  baseURL: provider.baseURL,
  apiKey: provider.apiKey,
  timeout: resolvedTimeoutSeconds * 1000, // propagate to SDK
});

Notes

The suggested fix assumes that the timeoutSeconds configuration is correctly set in openclaw.json.
The workaround for the exec timeout issue is separate and requires modifying the pi-embedded-DWASRjxE.js file or renaming the schema field to timeoutSec.
The fix may need to be reapplied after every update, as the bundled SDK may be overwritten.

Recommendation

Apply the workaround by patching the bundled SDK directly using the provided sed command, or implement the suggested fix by passing the derived timeout to the SDK constructor. This will allow for longer inference runs without being aborted by the 10-minute timeout.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

FAQ

Expected behavior

#api #logging issue #authentication issue #prompt issue #request timeout

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

openclaw - ✅(Solved) Fix [Bug]: OpenAI SDK default 10-minute HTTP timeout kills long-running local inference requests [2 pull requests, 1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

Workaround for the SDK timeout

PR fix notes

PR #63797: fix(agents): pass run timeoutMs to OpenAI SDK HTTP client to override 10-min default

Description (problem / solution / changelog)

Summary

Changed files

PR #63841: fix(agents): pass run timeoutMs to OpenAI SDK HTTP client

Description (problem / solution / changelog)

Summary

Testing

Changed files

Code Example

Bug type

Beta release blocker

Summary

Steps to reproduce

Expected behavior

Suggested fix

Actual behavior

Root cause

OpenClaw version

Operating system

Install method

Model

Provider / routing chain

Additional provider/model setup details

Logs, screenshots, and evidence

Impact and severity

Additional information

Second related bug: tools.exec.timeoutSeconds config is silently ignored

Workaround for the SDK timeout

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

FAQ

Expected behavior

Still need to ship something?

RELATED_DISCOVERY

TRENDING

Second related bug: `tools.exec.timeoutSeconds` config is silently ignored