openclaw: How to fix the vllm openai-completions streaming parser dropping tool_calls when reasoning_content streams first for gpt-oss-120b at large systemPrompt


OpenClaw Upstream Issue Draft: vllm openai-completions streaming parser drops tool_calls for gpt-oss-120b at large systemPrompt sizes

  • Target: https://github.com/openclaw/openclaw/issues
  • Type: Bug report
  • Severity: High (blocks local-inference adoption for multi-tool agents)
  • OpenClaw version: 2026.5.27
  • Provider: vllm (DGX Spark, local)
  • Model: openai/gpt-oss-120b MXFP4


Title

vllm openai-completions streaming parser drops tool_calls when reasoning_content streams first for gpt-oss-120b at sysprompt >~22K chars


Summary

After upgrading to OpenClaw 2026.5.27, multi-tool agents migrated to a local vllm provider serving openai/gpt-oss-120b fail silently: the model emits correctly structured tool_calls over the wire, but the OpenClaw streaming parser captures the tool-call arguments as a text block instead of a toolCall block. Sessions end with finalStatus: success, toolMetas: 0 entries, and itemLifecycle: 0 started, 0 completed, so no tools execute and the agent's task is skipped without any error being reported.

The bug correlates with systemPrompt size: tool calls are reliable below ~17K chars, intermittent between ~17K and ~22K, and fail reliably above ~22K.


Environment

  • OpenClaw: 2026.5.27 (commit 27ae826)
  • Node: v22.22.0 (managed gateway)
  • vLLM provider config:
    {
      "vllm": {
        "baseUrl": "http://spark-54ff:30001/v1",
        "api": "openai-completions",
        "models": [{
          "id": "openai/gpt-oss-120b",
          "reasoning": true,
          "contextWindow": 60000,
          "maxTokens": 4096
        }],
        "agentRuntime": { "id": "pi" }
      }
    }
  • Spark vllm launch:
    vllm serve openai/gpt-oss-120b \
      --quantization mxfp4 --mxfp4-backend CUTLASS \
      --tensor-parallel-size 1 \
      --attention-backend FLASHINFER \
      --kv-cache-dtype fp8 \
      --max-model-len 65536 \
      --reasoning-parser openai_gptoss \
      --enable-auto-tool-choice \
      --tool-call-parser openai
  • Agent runtime: pi (provider-resolved)
  • streamStrategy: boundary-aware:openai-completions
  • transport: auto

Direct evidence — Spark IS emitting structured tool_calls

Direct curl requests from the VPS to Spark confirm that both streaming and non-streaming responses contain properly structured tool_calls:

Non-streaming response (curl, no OpenClaw)

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "chatcmpl-tool-XYZ",
        "type": "function",
        "function": {
          "name": "read",
          "arguments": "{\"path\": \"/tmp/test.txt\"}"
        }
      }],
      "reasoning_content": "User wants to read file..."
    },
    "finish_reason": "tool_calls"
  }]
}

Streaming chunks (curl, no OpenClaw)

data: {"choices":[{"delta":{"reasoning_content":"User wants to..."}}]}
data: {"choices":[{"delta":{"reasoning_content":" read a file..."}}]}
...
data: {"choices":[{"delta":{"reasoning_content":null,"tool_calls":[{"id":"chatcmpl-tool-XYZ","type":"function","index":0,"function":{"name":"read","arguments":""}}]}}]}
data: {"choices":[{"delta":{"reasoning_content":null,"tool_calls":[{"index":0,"function":{"arguments":"{\n"}}]}}]}
...
data: {"choices":[{"finish_reason":"tool_calls"}]}
data: [DONE]

Conclusion: the Spark side is correct. The bug is in OpenClaw's streaming-chunk handling.
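
For reference, here is a minimal sketch of how a client could fold the chunk sequence above into one complete tool call. It assumes only the OpenAI-style streaming delta shape visible in the curl output. The function name accumulateToolCalls and the interface names are hypothetical and are not taken from OpenClaw's parser.

```typescript
// Hypothetical types mirroring the delta shape seen in the curl output.
interface ToolCallDelta {
  index: number;
  id?: string;
  type?: string;
  function?: { name?: string; arguments?: string };
}
interface ChunkDelta {
  reasoning_content?: string | null;
  content?: string | null;
  tool_calls?: ToolCallDelta[];
}
interface Chunk {
  choices: { delta?: ChunkDelta; finish_reason?: string | null }[];
}
interface AssembledToolCall {
  id: string;
  name: string;
  arguments: string;
}
interface StreamResult {
  reasoning: string;
  toolCalls: AssembledToolCall[];
  finishReason: string | null;
}

// Fold SSE "data:" lines into reasoning text and complete tool calls.
function accumulateToolCalls(sseLines: string[]): StreamResult {
  let reasoning = "";
  let finishReason: string | null = null;
  const calls: (AssembledToolCall | undefined)[] = [];

  for (const line of sseLines) {
    if (line.indexOf("data: ") !== 0) continue;
    const payload = line.slice("data: ".length).trim();
    if (payload === "[DONE]") break;

    const chunk = JSON.parse(payload) as Chunk;
    const choice = chunk.choices[0];
    if (!choice) continue;
    if (choice.finish_reason) finishReason = choice.finish_reason;

    const delta = choice.delta;
    if (!delta) continue;

    // reasoning_content: null on tool_calls chunks is not text; skip it.
    if (typeof delta.reasoning_content === "string") {
      reasoning += delta.reasoning_content;
    }

    // Argument fragments are concatenated per tool-call index.
    for (const tc of delta.tool_calls ?? []) {
      const entry = calls[tc.index] ?? { id: "", name: "", arguments: "" };
      if (tc.id) entry.id = tc.id;
      if (tc.function?.name) entry.name = tc.function.name;
      if (tc.function?.arguments) entry.arguments += tc.function.arguments;
      calls[tc.index] = entry;
    }
  }

  const toolCalls = calls.filter(
    (c): c is AssembledToolCall => c !== undefined
  );
  return { reasoning, toolCalls, finishReason };
}
```

With this folding, the arguments string only becomes valid JSON once the last fragment has arrived, so a consumer should parse it after finish_reason: tool_calls rather than per chunk.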


Trajectory evidence of failure

Failed run (Athena, systemPrompt 29,653 chars, model.completed event):

{
  "usage": {"input": 12152, "output": 81, "total": 12233},
  "assistantTexts": ["{\n  \"path\": \"/home/Brad/.openclaw/workspace-athena/IDENTITY.md\"\n}"],
  "messagesSnapshot": [{
    "role": "assistant",
    "content": [
      {"type": "thinking", "thinking": "...", "thinkingSignature": "..."},
      {"type": "text", "text": "{\n  \"path\": \"...\"\n}"}
    ]
  }]
}

Note: the tool-call arguments are captured as a text block (type: "text") rather than a toolCall block.

Successful run (Sunny, systemPrompt 21,015 chars, second retry):

{
  "toolMetas": [
    {"toolName": "read", "meta": "from ~/.openclaw/workspace-sunny/IDENTITY.md"},
    {"toolName": "write", "meta": "to ~/.openclaw/workspace/org/integrity-reports/sunny-spark-multistep.md"}
  ],
  "itemLifecycle": {"startedCount": 2, "completedCount": 2}
}

Empirical threshold (5 agent migration attempts on 2026-05-28)

Agent    sysprompt chars   Result           Attempts
Rosie    17,324            ✅ first try      1
Apollo   20,528            ❌ silent fail    1
Sunny    21,015            ✅ retry          2 (auto-retry)
Blue     21,207            ✅ first try      1
Athena   29,653            ❌ silent fail    1

  • Up to ~17K chars: first-try reliable (Rosie)
  • ~17K-22K chars: intermittent (Apollo failed, Sunny needed a retry, Blue passed on the first try)
  • Above ~22K chars: failure (Athena, the only run in this range)

Likely root cause

Suspected location: dist/openai-transport-stream-Pgx5hpN7.js, around line 3277, in the choiceDelta.tool_calls handler of the streaming parser.

gpt-oss-120b streams its reasoning_content deltas first and then its tool_calls deltas, each of which carries delta.reasoning_content: null. The currentBlock state machine may still be appending to a thinking block when the first tool_calls chunk arrives, so the tool-call arguments are either queued via queuePostToolCallDelta (line ~3263) or fall through to appendTextDelta.

The failure likely correlates with sysprompt size because larger system prompts give the model more reasoning runway before it emits the tool call, which increases the number of reasoning_content deltas that precede the first tool_calls delta.
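
To illustrate the behavior the parser would need, here is a minimal sketch of a block builder that always routes a tool_calls delta into a toolCall block, whatever block was open before it. This is not OpenClaw's code: BlockBuilder, its methods, and the block shapes are hypothetical stand-ins for the currentBlock state machine described above.

```typescript
interface ThinkingBlock { type: "thinking"; thinking: string }
interface TextBlock { type: "text"; text: string }
interface ToolCallBlock { type: "toolCall"; id: string; name: string; arguments: string }
type Block = ThinkingBlock | TextBlock | ToolCallBlock;

interface Delta {
  reasoning_content?: string | null;
  content?: string | null;
  tool_calls?: {
    index: number;
    id?: string;
    function?: { name?: string; arguments?: string };
  }[];
}

class BlockBuilder {
  readonly blocks: Block[] = [];
  private current: Block | null = null;
  private toolCalls: { [index: number]: ToolCallBlock | undefined } = {};

  push(delta: Delta): void {
    // Null or empty reasoning_content carries no thinking text.
    if (typeof delta.reasoning_content === "string" && delta.reasoning_content !== "") {
      this.appendThinking(delta.reasoning_content);
    }
    if (typeof delta.content === "string" && delta.content !== "") {
      this.appendText(delta.content);
    }
    for (const tc of delta.tool_calls ?? []) {
      // Key step: look up the toolCall block by index instead of
      // appending to whichever block happens to be open.
      let block = this.toolCalls[tc.index];
      if (!block) {
        block = { type: "toolCall", id: "", name: "", arguments: "" };
        this.toolCalls[tc.index] = block;
        this.blocks.push(block);
      }
      this.current = block; // implicitly closes an open thinking block
      if (tc.id) block.id = tc.id;
      if (tc.function?.name) block.name = tc.function.name;
      if (tc.function?.arguments) block.arguments += tc.function.arguments;
    }
  }

  private appendThinking(s: string): void {
    let block = this.current;
    if (block === null || block.type !== "thinking") {
      block = { type: "thinking", thinking: "" };
      this.blocks.push(block);
      this.current = block;
    }
    block.thinking += s;
  }

  private appendText(s: string): void {
    let block = this.current;
    if (block === null || block.type !== "text") {
      block = { type: "text", text: "" };
      this.blocks.push(block);
      this.current = block;
    }
    block.text += s;
  }
}
```

The design choice here is that the number of reasoning deltas preceding the tool call has no effect on classification, which is the property the failing runs appear to lack.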


Proposed fix surface

OpenClaw 2026.5.27 ships a new plugin hook, normalizeProviderToolSchemas (declared in dist/plugin-sdk/src/agents/pi-embedded-runner/tool-schema-runtime.d.ts), that lets a plugin rewrite tool schemas per provider at runtime. The same provider-scoped plugin surface looks like a natural extension point for either of the following:

  1. A provider-specific stream pre-handler, active when provider=vllm and modelId matches openai/gpt-oss-*, that resets currentBlock = null before processing a tool_calls chunk if the previous block was thinking-only.
  2. A post-processing recovery step, gated on finish_reason: tool_calls, that scans text blocks for JSON-shaped tool-call arguments and reclassifies them as toolCall blocks.

Either path keeps the fix isolated to the vllm + gpt-oss case without disturbing other providers.
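
A rough sketch of the post-processing recovery option follows. One complication is visible in the failed trajectory: the misclassified text block holds only the arguments, not the tool name, so the name has to be inferred. The sketch assumes this can be done by matching argument keys against the registered tools and only recovers when exactly one tool fits. ToolSpec, inferToolName, recoverToolCalls, and the recovered-N id format are all hypothetical.

```typescript
interface ThinkingBlock { type: "thinking"; thinking: string }
interface TextBlock { type: "text"; text: string }
interface ToolCallBlock { type: "toolCall"; id: string; name: string; arguments: string }
type Block = ThinkingBlock | TextBlock | ToolCallBlock;

// Hypothetical, simplified view of a tool's parameter schema.
interface ToolSpec {
  name: string;
  required: string[];
  optional: string[];
}

// Return the single tool whose parameters fit the argument keys, else null.
function inferToolName(args: Record<string, unknown>, tools: ToolSpec[]): string | null {
  const keys = Object.keys(args);
  const matches = tools.filter((tool) => {
    const allowed = tool.required.concat(tool.optional);
    const hasRequired = tool.required.every((k) => keys.indexOf(k) !== -1);
    const onlyAllowed = keys.every((k) => allowed.indexOf(k) !== -1);
    return hasRequired && onlyAllowed;
  });
  const only = matches[0];
  return matches.length === 1 && only ? only.name : null;
}

// Reclassify JSON-shaped text blocks as toolCall blocks, but only when the
// provider reported finish_reason "tool_calls".
function recoverToolCalls(
  content: Block[],
  finishReason: string | null,
  tools: ToolSpec[]
): Block[] {
  if (finishReason !== "tool_calls") return content;
  let counter = 0;
  return content.map((block): Block => {
    if (block.type !== "text") return block;
    let parsed: unknown;
    try {
      parsed = JSON.parse(block.text);
    } catch {
      return block; // ordinary prose, leave untouched
    }
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      return block;
    }
    const args = parsed as Record<string, unknown>;
    const name = inferToolName(args, tools);
    if (name === null) return block; // ambiguous or unknown: do not guess
    counter += 1;
    return {
      type: "toolCall",
      id: "recovered-" + counter,
      name,
      arguments: JSON.stringify(args),
    };
  });
}
```

Because the tool name is inferred rather than observed, this path is best treated as a fallback; the pre-handler option preserves the name and id that the provider actually sent.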


Reproduction steps

  1. Stand up vllm 0.10.1+ serving openai/gpt-oss-120b MXFP4 with --tool-call-parser openai --reasoning-parser openai_gptoss --max-model-len 65536.
  2. Configure OpenClaw 2026.5.27 with vllm provider pointing at the endpoint.
  3. Migrate an agent with a >25K-char systemPrompt to vllm/openai/gpt-oss-120b primary.
  4. Spawn a 2-step subagent task (e.g., read file X, then write a summary to file Y).
  5. Observe: the session ends with status: success but the file is never written. Check the trajectory:
    • data.assistantTexts[0] contains raw JSON tool-call args
    • data.toolMetas empty
    • data.itemLifecycle.startedCount = 0
    • data.messagesSnapshot[-1].content contains a text block where a toolCall block was expected
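
The trajectory checks in step 5 can be automated so the silent failure is caught rather than reported as success. Below is a minimal sketch that assumes the data object has the assistantTexts, toolMetas, and itemLifecycle fields shown in the trajectory excerpts. The function names are hypothetical, and the surrounding trajectory file layout is not assumed.

```typescript
// Subset of the trajectory "data" object, using the field names
// from the excerpts in this report.
interface TrajectoryData {
  assistantTexts?: string[];
  toolMetas?: { toolName: string; meta?: string }[];
  itemLifecycle?: { startedCount: number; completedCount: number };
}

// True when the text parses as a JSON object, as leaked tool-call args do.
function looksLikeJsonObject(text: string): boolean {
  const trimmed = text.trim();
  if (trimmed.charAt(0) !== "{") return false;
  try {
    const parsed: unknown = JSON.parse(trimmed);
    return typeof parsed === "object" && parsed !== null && !Array.isArray(parsed);
  } catch {
    return false;
  }
}

// Signature of the bug: no tools recorded, none started, and the assistant
// "text" is actually a JSON argument object.
function isSilentToolCallDrop(data: TrajectoryData): boolean {
  const noToolMetas = (data.toolMetas ?? []).length === 0;
  const noneStarted = (data.itemLifecycle?.startedCount ?? 0) === 0;
  const jsonAsText = (data.assistantTexts ?? []).some(looksLikeJsonObject);
  return noToolMetas && noneStarted && jsonAsText;
}
```

A check like this could run after each session and turn a false status: success into a visible failure. A model that legitimately answers with a bare JSON object and no tool calls would trigger a false positive.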

Workaround in place

  • Agents whose migration test passed have been moved to vllm (Atlas, Sunny, Blue, Rosie)
  • Agents above the threshold remain on the cloud Anthropic provider
  • ~$1,500-2,000/month inference cost savings deferred until fixed
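
The workaround amounts to a routing rule keyed on sysprompt length. Here is a minimal sketch, assuming the approximate 17K and 22K boundaries from the empirical threshold above. The constants, the band names, and the "vllm" / "cloud" return values are illustrative and are not OpenClaw configuration.

```typescript
type Band = "reliable" | "intermittent" | "failing";

// Approximate boundaries taken from the five migration attempts.
const RELIABLE_MAX_CHARS = 17000;
const FAILING_MIN_CHARS = 22000;

// Map a systemPrompt length (in chars) onto the observed behavior bands.
function classifySystemPromptSize(chars: number): Band {
  if (chars <= RELIABLE_MAX_CHARS) return "reliable";
  if (chars < FAILING_MIN_CHARS) return "intermittent";
  return "failing";
}

// Conservative routing: only prompts in the reliable band go to local vllm.
function chooseProvider(chars: number): "vllm" | "cloud" {
  return classifySystemPromptSize(chars) === "reliable" ? "vllm" : "cloud";
}
```

The boundaries are soft: Rosie passed on the first try at 17,324 chars, slightly over the cutoff used here, so the constants are best treated as tunable rather than exact.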

Acknowledgments

Bug discovered during DGX Spark cutover on 2026-05-28. 5 agent migration attempts provided the empirical threshold curve. Direct curl tests against Spark + source inspection of openai-transport-stream-Pgx5hpN7.js and tool-schema-runtime.d.ts confirmed Spark-side correctness and identified the parser surface as the failure point.

Filed by: Achilles (Brad Cullum's digital ops, srv1436200)
