openclaw: [Bug] Ollama cold-start timeout silently exfiltrates data via fallback chain

GitHub stats
openclaw/openclaw#52818 · Fetched 2026-04-08 01:18:52
Comments: 1 · Participants: 2 · Timeline: 1 (commented ×1) · Reactions: 0

When using local Ollama models, the first request to a model that is not yet loaded into memory triggers a cold start that takes 13–60+ seconds, depending on model size. OpenClaw's default LLM request timeout fires before Ollama can respond, and the resulting timeout error (HTTP 408) triggers a fallback to the next model in the fallback chain, typically a cloud provider. User data intended for local processing is silently routed to cloud providers without any user notification or consent. This is a privacy-critical issue.

Error Message

HTTP 408 / timeout error, followed by a silent timeout-based fallback to the next model in the chain (the expected fallback triggers are an auth failure or an explicit model error).

Root Cause

Where the problem lives:

The LLM request timeout is currently global (not per-provider) and is enforced in src/model/client.ts or an equivalent request-dispatch layer. When ollama-remote is the first candidate, the request proceeds as follows:

Request flow:
  user message → openclaw 
    → LLM request to ollama-remote (qwen3.5:122b)
    → Ollama starts cold-loading model (0 bytes in memory)
    → openclaw waits... timeout fires at T seconds (likely 30s default)
    → HTTP 408 / timeout error
    → fallback chain activated: openai/gpt-4.1-mini
    → user data sent to OpenAI servers  ← SILENT EXFILTRATION

The timeout is enforced before Ollama even finishes loading the model. No distinction is made between:

  1. A genuinely slow/hung model (should fail fast)
  2. A model that just needs more time to load from disk (should wait, or pre-warm)

Code location (estimated):

  • src/infra/model-loader.ts — likely where loadModel() is called, no pre-warm check
  • src/model/providers/ollama.ts — the Ollama provider adapter, likely missing requestTimeout override
  • src/model/client.ts — the fallback chain dispatcher, likely applies global timeout equally to all providers
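
The request flow described above can be reproduced with a small simulation. This is a minimal sketch, not OpenClaw's dispatcher: `ProviderSim` and `simulateDispatch` are hypothetical names, the 46 s cold start and the roughly 30 s default come from the issue, and the 2 s cloud latency is an assumed value.

```typescript
// Hypothetical model of a fallback chain; these are not OpenClaw's real types.
interface ProviderSim {
  id: string;
  firstResponseMs: number;   // time until the provider would answer
  requestTimeoutMs?: number; // optional per-provider override
}

// Returns the id of the first provider that answers within its timeout,
// or null if every provider in the chain times out.
function simulateDispatch(chain: ProviderSim[], globalTimeoutMs: number): string | null {
  for (const p of chain) {
    const timeoutMs = p.requestTimeoutMs ?? globalTimeoutMs;
    if (p.firstResponseMs <= timeoutMs) return p.id;
    // Timed out: fall through to the next provider in the chain.
  }
  return null;
}

const cloud: ProviderSim = { id: "openai/gpt-4.1-mini", firstResponseMs: 2_000 }; // assumed latency
const coldOllama: ProviderSim = { id: "ollama-remote", firstResponseMs: 46_000 };

// Global 30 s timeout only: the cold local model loses to the cloud fallback.
const withGlobalOnly = simulateDispatch([coldOllama, cloud], 30_000);

// With a 120 s per-provider timeout the request stays on the local provider.
const withOverride = simulateDispatch(
  [{ ...coldOllama, requestTimeoutMs: 120_000 }, cloud],
  30_000,
);
```

With a single global timeout, the provider that ends up handling the request depends only on how long the local model takes to load, which is why the rerouting goes unnoticed.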

Bug: Ollama cold-start causes silent data exfiltration via timeout-triggered fallback

Issue type

Performance + Privacy bug

Environment

  • Ollama version: 0.5+ (all versions)
  • Model sizes tested: 27B (~13s cold-start), 122B (~46s cold-start)
  • OpenClaw: any version with Ollama provider support

Steps to Reproduce

  1. Configure Ollama provider with a large model: ollama-remote + qwen3.5:122b
  2. Ensure model is NOT pre-loaded: ollama ps returns empty
  3. Send any message through OpenClaw targeting ollama-remote
  4. Observe: timeout fires within ~30s, fallback to cloud model, data sent externally
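
Step 2 can also be checked programmatically through Ollama's GET /api/ps endpoint, which lists the models currently loaded in memory. The following is a rough sketch: `isModelLoaded`, `checkLoaded`, and the injected `GetJson` helper are hypothetical names, and only the `models[].name` field of the response is assumed.

```typescript
// The part of the GET /api/ps response this check relies on.
interface PsResponse {
  models?: { name: string }[];
}

// True when `modelId` appears among the currently loaded models.
function isModelLoaded(ps: PsResponse, modelId: string): boolean {
  return (ps.models ?? []).some((m) => m.name === modelId);
}

// Hypothetical injected HTTP helper, so the sketch does not depend on a
// particular fetch implementation.
type GetJson = (url: string) => Promise<unknown>;

// Queries /api/ps and reports whether the model is resident in memory.
async function checkLoaded(getJson: GetJson, baseUrl: string, modelId: string): Promise<boolean> {
  const ps = (await getJson(`${baseUrl}/api/ps`)) as PsResponse;
  return isModelLoaded(ps, modelId);
}
```

For the reproduction, `isModelLoaded` should return false for qwen3.5:122b before the first message is sent.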

Expected vs Actual

                     | Expected                              | Actual
Cold-start duration  | Model loads (13-60s), then inference  | Timeout at ~30s before model loads
Fallback trigger     | Auth failure or explicit model error  | Silent timeout-based fallback
Data routing         | Stays on local provider               | Silently routes to cloud
User notification    | Warning + consent                     | None

Proposed Fix (Two-part)

Part 1 — Per-provider requestTimeout override (minimal)

Add requestTimeoutMs to provider config, applied only to that provider's HTTP calls:

{
  "models": {
    "providers": {
      "ollama-remote": {
        "baseUrl": "http://192.168.178.122:11434",
        "api": "ollama",
        "requestTimeoutMs": 120000,
        "models": [{ "id": "qwen3.5:122b" }]
      }
    }
  }
}
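
One way the dispatcher could consume this setting is sketched below. It assumes the config shape shown above and a 30 s global default (the issue's estimate). `resolveRequestTimeoutMs` and `GLOBAL_REQUEST_TIMEOUT_MS` are hypothetical names, not existing OpenClaw identifiers.

```typescript
interface ProviderConfig {
  baseUrl: string;
  api: string;
  requestTimeoutMs?: number; // proposed per-provider override
  models: { id: string }[];
}

interface ModelsConfig {
  models: { providers: Record<string, ProviderConfig> };
}

// Assumed global default; the issue estimates roughly 30 s.
const GLOBAL_REQUEST_TIMEOUT_MS = 30_000;

// Picks the provider's own timeout when it is a positive finite number,
// otherwise falls back to the global default.
function resolveRequestTimeoutMs(config: ModelsConfig, providerId: string): number {
  const override = config.models.providers[providerId]?.requestTimeoutMs;
  if (typeof override === "number" && Number.isFinite(override) && override > 0) {
    return override;
  }
  return GLOBAL_REQUEST_TIMEOUT_MS;
}

const config = JSON.parse(`{
  "models": {
    "providers": {
      "ollama-remote": {
        "baseUrl": "http://192.168.178.122:11434",
        "api": "ollama",
        "requestTimeoutMs": 120000,
        "models": [{ "id": "qwen3.5:122b" }]
      }
    }
  }
}`) as ModelsConfig;

const ollamaTimeoutMs = resolveRequestTimeoutMs(config, "ollama-remote");
const unknownTimeoutMs = resolveRequestTimeoutMs(config, "some-other-provider");
```

Rejecting zero, negative, or non-finite values keeps a misconfigured entry from silently disabling the timeout.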

Part 2 — Ollama pre-warm endpoint (robust solution)

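Add a GET /api/ps check before the first request to detect whether the model is already loaded, and trigger an async pre-warm if it is not. (GET /api/tags lists installed models rather than loaded ones, so it cannot detect a cold start on its own.)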

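// In the Ollama provider, before dispatching the first request:
async preWarm(modelId: string): Promise<void> {
  // GET /api/ps lists the models currently loaded in memory
  const resp = await fetch(`${this.baseUrl}/api/ps`);
  const data: { models?: { name: string }[] } = await resp.json();
  if (!data.models?.some((m) => m.name === modelId)) {
    // An empty prompt asks Ollama to load the model without generating text.
    // The call is not awaited, so the model warms in the background.
    fetch(`${this.baseUrl}/api/generate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model: modelId, prompt: "", stream: false }),
    }).catch(() => {}); // fire-and-forget: load errors are deliberately ignored
  }
}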
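
Because the pre-warm above is fire-and-forget, a request sent immediately afterwards still waits for the load to finish. One possible variation is to await the load with a generous timeout, for example at startup. The sketch below is an illustration under stated assumptions: `buildLoadRequest`, `warmUp`, and the injected `PostJson` helper are hypothetical names.

```typescript
// Hypothetical injected HTTP helper: POSTs JSON and rejects on a timeout
// or a non-2xx status.
type PostJson = (url: string, body: unknown, timeoutMs: number) => Promise<unknown>;

interface LoadRequest {
  url: string;
  body: { model: string; prompt: string; stream: boolean };
}

// Builds the load request: an empty prompt sent to /api/generate asks
// Ollama to load the model without generating text.
function buildLoadRequest(baseUrl: string, modelId: string): LoadRequest {
  return {
    url: `${baseUrl}/api/generate`,
    body: { model: modelId, prompt: "", stream: false },
  };
}

// Waits for the load to complete (bounded by loadTimeoutMs) so that the
// first user request reaches a warm model. Returns false on failure and
// leaves the retry or error-reporting decision to the caller.
async function warmUp(
  postJson: PostJson,
  baseUrl: string,
  modelId: string,
  loadTimeoutMs: number,
): Promise<boolean> {
  const { url, body } = buildLoadRequest(baseUrl, modelId);
  try {
    await postJson(url, body, loadTimeoutMs);
    return true;
  } catch {
    return false;
  }
}
```

Awaiting the load trades a slower startup for a first request that no longer races the cold start.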

Privacy Impact

HIGH — If a user configured local-only processing for data sovereignty reasons, a timeout-triggered fallback silently violates that intent. Combined with no user-visible fallback notification, this can expose sensitive data to cloud providers without the user's knowledge or consent.
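
The "Warning + consent" behaviour listed under Expected could be expressed as a small policy check in the fallback dispatcher. This is a sketch under assumptions, not existing OpenClaw code: the `isLocal` flag, the `FailureKind` values, and `decideFallback` are hypothetical.

```typescript
type FailureKind = "timeout" | "auth" | "model_error";

interface ProviderInfo {
  id: string;
  isLocal: boolean; // hypothetical flag: provider keeps data on the user's own network
}

type FallbackDecision =
  | { action: "fallback"; notify: boolean }
  | { action: "block"; reason: string };

// Decides whether a failed request may move on to the next provider.
// A hop from a local provider to a non-local one is blocked unless the
// user has consented, and is always reported when it does happen.
function decideFallback(
  from: ProviderInfo,
  to: ProviderInfo,
  failure: FailureKind,
  userConsented: boolean,
): FallbackDecision {
  const leavesLocal = from.isLocal && !to.isLocal;
  if (!leavesLocal) return { action: "fallback", notify: false };
  if (!userConsented) {
    return {
      action: "block",
      reason: `${failure} on ${from.id}: fallback to ${to.id} requires user consent`,
    };
  }
  return { action: "fallback", notify: true };
}

const local: ProviderInfo = { id: "ollama-remote", isLocal: true };
const remote: ProviderInfo = { id: "openai/gpt-4.1-mini", isLocal: false };

const blocked = decideFallback(local, remote, "timeout", false);
const allowed = decideFallback(local, remote, "timeout", true);
```

Under this policy a cold-start timeout on a local provider surfaces as an error instead of silently rerouting the request.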

Labels

bug, performance, privacy, provider: ollama

extent analysis

Fix Plan

To address the silent data exfiltration issue due to timeout-triggered fallback during Ollama's cold-start, we will implement a two-part solution:

  1. Per-provider requestTimeout override:

    • Update the provider configuration to include a requestTimeoutMs parameter specific to each provider.
    • Apply this timeout only to the respective provider's HTTP calls.
  2. Ollama pre-warm endpoint:

    • Introduce a pre-warm mechanism that checks if the model is loaded before dispatching the first request.
    • If the model is not loaded, trigger an asynchronous pre-warm process.

Code Changes

Part 1: Per-provider requestTimeout override

Update src/model/providers/ollama.ts to include the requestTimeoutMs configuration:

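interface OllamaProviderConfig {
  baseUrl: string;
  api: string;
  requestTimeoutMs?: number; // New: per-provider timeout; optional so existing configs stay valid
  models: { id: string }[];
}

// Assumed global default, matching the ~30s timeout estimated in the issue
const DEFAULT_REQUEST_TIMEOUT_MS = 30_000;

class OllamaProvider {
  private config: OllamaProviderConfig;

  constructor(config: OllamaProviderConfig) {
    this.config = config;
  }

  async makeRequest(request: any): Promise<any> {
    const timeoutMs = this.config.requestTimeoutMs ?? DEFAULT_REQUEST_TIMEOUT_MS;
    // The standard fetch API has no `timeout` option; abort via AbortSignal instead.
    // The /api/chat path is an assumption; use whichever Ollama endpoint the adapter targets.
    const response = await fetch(`${this.config.baseUrl}/api/chat`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(request),
      signal: AbortSignal.timeout(timeoutMs),
    });
    return response.json();
  }
}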

Part 2: Ollama pre-warm endpoint

Implement the preWarm method in src/model/providers/ollama.ts:

class OllamaProvider {
  // ...

  async preWarm(modelId: string): Promise<void> {
    // GET /api/ps lists the models currently loaded in memory
    const resp = await fetch(`${this.config.baseUrl}/api/ps`);
    const data: { models?: { name: string }[] } = await resp.json();
    if (!data.models?.some((m) => m.name === modelId)) {
      // Trigger the load without blocking; the model warms in the background
      fetch(`${this.config.baseUrl}/api/generate`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ model: modelId, prompt: "", stream: false }),
      }).catch(() => {}); // fire-and-forget
    }
  }

  async makeRequest(request: any): Promise<any> {
    // preWarm only starts the load; this request still waits for it to finish,
    // so the per-provider timeout from Part 1 must cover the cold-start time.
    await this.preWarm(request.modelId);
    // ...
  }
}

Verification

To verify the fix, follow these steps:

  1. Configure the Ollama provider with a large model and an extended requestTimeoutMs.
  2. Ensure the model is not pre-loaded.
  3. Send a message through OpenClaw targeting the Ollama provider.
  4. Observe that the model loads without triggering a timeout and that no data is sent to cloud providers.
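
Step 4 can be turned into an automated check if the test harness records every outbound request URL. The sketch below assumes such a recording exists. `hostOf` and `nonLocalRequests` are hypothetical helpers, and the URL parsing is deliberately simple (it does not handle credentials embedded in the URL).

```typescript
// Extracts "host[:port]" from an http(s) URL using a plain regular
// expression, so the sketch has no runtime-specific dependencies.
function hostOf(url: string): string | null {
  const match = /^https?:\/\/([^\/?#]+)/i.exec(url);
  return match?.[1]?.toLowerCase() ?? null;
}

// Returns every recorded URL whose host is not on the allowlist.
// Unparseable URLs are reported too, so they cannot slip through.
function nonLocalRequests(recordedUrls: string[], allowedHosts: string[]): string[] {
  const allowed = new Set(allowedHosts.map((h) => h.toLowerCase()));
  return recordedUrls.filter((u) => {
    const host = hostOf(u);
    return host === null || !allowed.has(host);
  });
}

// Example recording: one local Ollama call and one cloud call.
const recorded = [
  "http://192.168.178.122:11434/api/chat",
  "https://api.openai.com/v1/chat/completions",
];
const leaked = nonLocalRequests(recorded, ["192.168.178.122:11434"]);
```

A passing verification run is one where the returned list is empty.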
