openclaw - 💡(How to fix) Fix vllm openai-completions streaming parser drops tool_calls when reasoning_content streams first for gpt-oss-120b at large systemPrompt

openclaw · 2026-05-28 15:22:31


After upgrading to OpenClaw 2026.5.27, multi-tool agents migrated to a local vllm provider serving openai/gpt-oss-120b show a silent failure mode. The model correctly emits structured tool_calls over the wire, but the OpenClaw streaming parser captures the tool-call arguments as a text block instead of a toolCall block. Sessions end with finalStatus: success, an empty toolMetas list, and itemLifecycle reporting 0 started and 0 completed items, so no tool ever executes and the agent's task is silently skipped.

The bug correlates with systemPrompt size: tool calling is empirically reliable below ~17K chars, intermittent between 17K and 22K, and fails reliably above ~22K.


OpenClaw Upstream Issue Draft: vllm openai-completions streaming parser drops tool_calls for gpt-oss-120b at large systemPrompt sizes

Target: https://github.com/openclaw/openclaw/issues
Type: Bug report
Severity: High (blocks local-inference adoption for multi-tool agents)
OpenClaw version: 2026.5.27
Provider: vllm (DGX Spark, local)
Model: openai/gpt-oss-120b MXFP4

Title

vllm openai-completions streaming parser drops tool_calls when reasoning_content streams first for gpt-oss-120b at sysprompt >~22K chars

Summary

The bug correlates with systemPrompt size: tool calling is empirically reliable below ~17K chars, intermittent between 17K and 22K, and fails reliably above ~22K.

Environment

OpenClaw: 2026.5.27 (commit 27ae826)
Node: v22.22.0 (managed gateway)

vLLM provider config:

{
  "vllm": {
    "baseUrl": "http://spark-54ff:30001/v1",
    "api": "openai-completions",
    "models": [{
      "id": "openai/gpt-oss-120b",
      "reasoning": true,
      "contextWindow": 60000,
      "maxTokens": 4096
    }],
    "agentRuntime": { "id": "pi" }
  }
}

Spark vllm launch:

vllm serve openai/gpt-oss-120b \
  --quantization mxfp4 --mxfp4-backend CUTLASS \
  --tensor-parallel-size 1 \
  --attention-backend FLASHINFER \
  --kv-cache-dtype fp8 \
  --max-model-len 65536 \
  --reasoning-parser openai_gptoss \
  --enable-auto-tool-choice \
  --tool-call-parser openai

Agent runtime: pi (provider-resolved)
streamStrategy: boundary-aware:openai-completions
transport: auto

Direct evidence — Spark IS emitting structured tool_calls

Direct curl requests from the VPS to Spark confirm that both streaming and non-streaming responses contain properly structured tool_calls:

Non-streaming response (curl, no OpenClaw)

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "chatcmpl-tool-XYZ",
        "type": "function",
        "function": {
          "name": "read",
          "arguments": "{\"path\": \"/tmp/test.txt\"}"
        }
      }],
      "reasoning_content": "User wants to read file..."
    },
    "finish_reason": "tool_calls"
  }]
}

Streaming chunks (curl, no OpenClaw)

data: {"choices":[{"delta":{"reasoning_content":"User wants to..."}}]}
data: {"choices":[{"delta":{"reasoning_content":" read a file..."}}]}
...
data: {"choices":[{"delta":{"reasoning_content":null,"tool_calls":[{"id":"chatcmpl-tool-XYZ","type":"function","index":0,"function":{"name":"read","arguments":""}}]}}]}
data: {"choices":[{"delta":{"reasoning_content":null,"tool_calls":[{"index":0,"function":{"arguments":"{\n"}}]}}]}
...
data: {"choices":[{"finish_reason":"tool_calls"}]}
data: [DONE]

Conclusion: the Spark side is correct. The bug is in OpenClaw's streaming-chunk handling.
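The chunk sequence above can be checked mechanically. Below is a minimal Python sketch of how a client can fold these chunks into one assistant message: reasoning_content and tool_calls are accumulated independently, tool-call fragments are keyed by index, and reasoning_content: null is treated as "nothing in this chunk" rather than as a state change. This is not OpenClaw's parser; the ToolCall, Assembled, and assemble names are hypothetical, and elided "..." lines are simply skipped.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    id: str | None = None
    name: str = ""
    arguments: str = ""  # argument fragments concatenated into one JSON string


@dataclass
class Assembled:
    reasoning: str = ""
    content: str = ""
    tool_calls: dict[int, ToolCall] = field(default_factory=dict)
    finish_reason: str | None = None


def assemble(sse_lines: list[str]) -> Assembled:
    """Fold OpenAI-style streaming chunk lines ("data: {...}") into one message."""
    out = Assembled()
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank lines, SSE comments, elided "..." lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        for choice in json.loads(payload).get("choices", []):
            if choice.get("finish_reason"):
                out.finish_reason = choice["finish_reason"]
            delta = choice.get("delta") or {}
            # reasoning_content: null means "no reasoning in this chunk"; it must
            # not end or redirect anything else carried by the same delta.
            if delta.get("reasoning_content"):
                out.reasoning += delta["reasoning_content"]
            if delta.get("content"):
                out.content += delta["content"]
            for frag in delta.get("tool_calls") or []:
                # Fragments share an index; only the first one carries id and name.
                call = out.tool_calls.setdefault(frag["index"], ToolCall())
                if frag.get("id"):
                    call.id = frag["id"]
                fn = frag.get("function") or {}
                if fn.get("name"):
                    call.name = fn["name"]
                call.arguments += fn.get("arguments") or ""
    return out
```

Fed the streaming chunks shown above (with the elided argument fragments filled in), this yields one read call whose arguments parse as JSON and an empty content string, which is the shape a correct parser should hand to the agent runtime.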

Trajectory evidence of failure

Failed run (Athena, systemPrompt 29,653 chars, model.completed event):

{
  "usage": {"input": 12152, "output": 81, "total": 12233},
  "assistantTexts": ["{\n  \"path\": \"/home/Brad/.openclaw/workspace-athena/IDENTITY.md\"\n}"],
  "messagesSnapshot": [{
    "role": "assistant",
    "content": [
      {"type": "thinking", "thinking": "...", "thinkingSignature": "..."},
      {"type": "text", "text": "{\n  \"path\": \"...\"\n}"}
    ]
  }]
}

Note: the tool-call arguments were captured as a text block (type: "text") rather than a toolCall block.

Successful run (Sunny, systemPrompt 21,015 chars, second retry):

{
  "toolMetas": [
    {"toolName": "read", "meta": "from ~/.openclaw/workspace-sunny/IDENTITY.md"},
    {"toolName": "write", "meta": "to ~/.openclaw/workspace/org/integrity-reports/sunny-spark-multistep.md"}
  ],
  "itemLifecycle": {"startedCount": 2, "completedCount": 2}
}

Empirical threshold (5 agent migration attempts on 2026-05-28)

Agent	sysprompt chars	Result	Attempts
Rosie	17,324	✅ first try	1
Apollo	20,528	❌ silent fail	1
Sunny	21,015	✅ retry	2 (auto-retry)
Blue	21,207	✅ first try	1
Athena	29,653	❌ silent fail	1

≤17K chars: first-try reliable
17K-22K chars: probabilistic (sometimes needs retry)
≥22K chars: reliable failure
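The bands above can be expressed as a simple pre-migration check. A minimal Python sketch follows; the 17,000 and 22,000 cut-offs are the approximate values from these five attempts, not documented limits in OpenClaw or vLLM, and classify_sysprompt and safe_to_migrate are hypothetical helper names.

```python
RELIABLE_MAX = 17_000  # approximate, from the 2026-05-28 migration attempts
FAILING_MIN = 22_000   # approximate, from the 2026-05-28 migration attempts


def classify_sysprompt(chars: int) -> str:
    """Map a systemPrompt length (in characters) to the observed band."""
    if chars <= RELIABLE_MAX:
        return "reliable"
    if chars < FAILING_MIN:
        return "probabilistic"
    return "failing"


def safe_to_migrate(system_prompt: str) -> bool:
    # Conservative gate: only move agents that sit in the first-try-reliable band.
    return classify_sysprompt(len(system_prompt)) == "reliable"
```

Applied to the table, Rosie (17,324) lands just inside the probabilistic band despite succeeding first try, and Apollo (20,528) failed while Sunny and Blue succeeded in the same band, which illustrates how soft these boundaries are.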

Likely root cause

dist/openai-transport-stream-Pgx5hpN7.js ~line 3277 — the choiceDelta.tool_calls handler in the streaming parser.

gpt-oss-120b streams all of its reasoning_content deltas FIRST, followed by tool_calls deltas that carry delta.reasoning_content: null. The currentBlock state machine may still be appending to a thinking block when the first tool_calls chunk arrives, causing the tool-call arguments to be queued via queuePostToolCallDelta (line ~3263) or to fall through to appendTextDelta.

The failure correlates with sysprompt size because larger system prompts give the model more reasoning runway before emitting the tool call, increasing the number of reasoning_content deltas before the first tool_calls delta arrives.
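To make the currentBlock hypothesis concrete, here is a toy block parser in Python. It is not OpenClaw's code and does not model whatever timing makes the real failure intermittent; it only contrasts two behaviours. With reset_on_tool_call=False, argument fragments that arrive while a thinking block is still current fall through to a text block (mirroring the suspected appendTextDelta path). With reset_on_tool_call=True, it applies the proposed fix of clearing the current block before handling tool_calls. Block, ToyStreamParser, and feed are hypothetical names, and content deltas are ignored to keep the toy small.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Block:
    type: str                # "thinking", "text", or "toolCall"
    body: str = ""
    name: str | None = None  # tool name, toolCall blocks only


@dataclass
class ToyStreamParser:
    """Toy block parser, NOT OpenClaw's code. Content deltas are ignored."""
    reset_on_tool_call: bool  # True = apply the proposed currentBlock reset
    blocks: list[Block] = field(default_factory=list)
    current: Block | None = None

    def _open(self, block: Block) -> Block:
        self.blocks.append(block)
        self.current = block
        return block

    def feed(self, delta: dict) -> None:
        if delta.get("reasoning_content"):
            if self.current is None or self.current.type != "thinking":
                self._open(Block("thinking"))
            self.current.body += delta["reasoning_content"]

        fragments = delta.get("tool_calls") or []
        if (fragments and self.reset_on_tool_call
                and self.current is not None and self.current.type == "thinking"):
            self.current = None  # proposed fix: close the thinking block first

        for frag in fragments:
            fn = frag.get("function") or {}
            args = fn.get("arguments") or ""
            if self.current is not None and self.current.type in ("thinking", "text"):
                # Hypothesized bug: state still says a non-tool block is open, so
                # the fragment falls through to text and the tool name is lost.
                if self.current.type == "thinking":
                    self._open(Block("text"))
                self.current.body += args
            else:
                if fn.get("name"):
                    self._open(Block("toolCall", name=fn["name"]))
                if self.current is not None:
                    self.current.body += args
```

Feeding the gpt-oss delta pattern into the unfixed parser produces a thinking block followed by a text block holding the raw arguments, the same shape as the failed Athena snapshot; the fixed parser produces a thinking block followed by a read toolCall block.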

Proposed fix surface

OpenClaw 2026.5.27 ships a new plugin hook, normalizeProviderToolSchemas, at dist/plugin-sdk/src/agents/pi-embedded-runner/tool-schema-runtime.d.ts, which lets a plugin rewrite tool schemas per provider at runtime. This is a natural extension point for either of two fixes:

A provider-specific stream pre-handler, active only when provider=vllm and modelId matches openai/gpt-oss-*, that resets currentBlock = null before processing tool_calls chunks whenever the previous block was thinking-only.
A post-processing recovery step that scans text blocks for JSON-shaped tool-call arguments and re-classifies them as toolCall blocks, gated on finish_reason: tool_calls.

Either path keeps the fix isolated to the vllm + gpt-oss case without disturbing other providers.
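The post-processing option has one catch visible in the failed snapshot: the text block holds only the arguments, so the tool name has to be inferred. A minimal Python sketch of the recovery path follows, under loud assumptions: the toolCall block shape {"type", "name", "arguments"} is hypothetical, tool schemas are assumed to be plain JSON Schema dicts keyed by tool name, and a name is inferred only when exactly one tool's required and allowed keys fit the parsed arguments. gate, infer_tool, and recover_tool_calls are hypothetical names.

```python
from __future__ import annotations

import json
from fnmatch import fnmatchcase


def gate(provider: str, model_id: str, finish_reason: str | None) -> bool:
    # Only vllm + openai/gpt-oss-*, and only when the server reported tool_calls.
    return (provider == "vllm" and fnmatchcase(model_id, "openai/gpt-oss-*")
            and finish_reason == "tool_calls")


def infer_tool(args: dict, schemas: dict[str, dict]) -> str | None:
    """Accept a tool when required keys <= argument keys <= its properties;
    give up (return None) unless exactly one tool qualifies."""
    keys = set(args)
    matches = [
        name for name, schema in schemas.items()
        if set(schema.get("required", [])) <= keys <= set(schema.get("properties", {}))
    ]
    return matches[0] if len(matches) == 1 else None


def recover_tool_calls(provider, model_id, finish_reason, content, schemas):
    """Re-classify JSON-shaped text blocks as toolCall blocks when gated in."""
    if not gate(provider, model_id, finish_reason):
        return content
    if any(block.get("type") == "toolCall" for block in content):
        return content  # the parser already produced structured calls
    out = []
    for block in content:
        if block.get("type") == "text":
            try:
                args = json.loads(block.get("text", ""))
            except json.JSONDecodeError:
                args = None
            name = infer_tool(args, schemas) if isinstance(args, dict) else None
            if name is not None:
                # Hypothetical toolCall block shape; OpenClaw's real shape may differ.
                out.append({"type": "toolCall", "name": name, "arguments": args})
                continue
        out.append(block)
    return out
```

Because the name is guessed rather than recovered, this path is lossy: two tools that both take only a path argument make the arguments ambiguous, and the sketch then leaves the text block untouched. That is one reason the stream pre-handler is the more robust of the two options.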

Reproduction steps

1. Stand up vllm 0.10.1+ serving openai/gpt-oss-120b MXFP4 with --tool-call-parser openai --reasoning-parser openai_gptoss --max-model-len 65536.
2. Configure OpenClaw 2026.5.27 with a vllm provider pointing at the endpoint.
3. Migrate an agent with a >25K-char systemPrompt to vllm/openai/gpt-oss-120b as its primary model.
4. Spawn a 2-step subagent task (e.g., read file X, then write a summary to file Y).
5. Observe that the session ends with status: success, but the file is never written. The trajectory shows:
   - data.assistantTexts[0] contains raw JSON tool-call args
   - data.toolMetas is empty
   - data.itemLifecycle.startedCount = 0
   - data.messagesSnapshot[-1].content contains a text block where a toolCall block was expected
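The four trajectory symptoms listed above can be turned into a quick triage check. A minimal Python sketch, assuming assistantTexts, toolMetas, itemLifecycle, and messagesSnapshot are available on one dict, as the trajectory excerpts earlier suggest; missing fields are treated as empty, and is_silent_tool_drop is a hypothetical name.

```python
from __future__ import annotations

import json


def is_silent_tool_drop(data: dict) -> bool:
    """True if a trajectory payload matches all four symptoms of this bug."""
    texts = data.get("assistantTexts") or []
    try:
        first = json.loads(texts[0]) if texts else None
    except (json.JSONDecodeError, TypeError):
        first = None
    if not isinstance(first, dict):
        return False  # assistantTexts[0] is not raw JSON tool-call args
    if data.get("toolMetas"):
        return False  # at least one tool actually ran
    if (data.get("itemLifecycle") or {}).get("startedCount", 0) != 0:
        return False
    snapshot = data.get("messagesSnapshot") or []
    if not snapshot:
        return False
    types = [block.get("type") for block in snapshot[-1].get("content", [])]
    # A text block where a toolCall block was expected.
    return "text" in types and "toolCall" not in types
```

An agent that legitimately answers with a bare JSON object and no tools would also match, so treat a hit as a prompt to inspect the raw stream rather than as proof.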

Workaround in place

Agents whose cutover succeeded were migrated (Atlas, Sunny, Blue, Rosie)
Agents above threshold held on cloud anthropic
~$1,500-2,000/month inference cost savings deferred until fixed

Acknowledgments

Bug discovered during the DGX Spark cutover on 2026-05-28. Five agent migration attempts provided the empirical threshold curve. Direct curl tests against Spark, plus source inspection of openai-transport-stream-Pgx5hpN7.js and tool-schema-runtime.d.ts, confirmed Spark-side correctness and identified the parser as the failure point.

Filed by: Achilles (Brad Cullum's digital ops, srv1436200)
