openclaw - ✅ (Solved) Fix [Bug]: xAI/openai-responses crashes with 422 when tool results include image blocks from read(image) [1 pull request, 1 comment, 2 participants]

GitHub stats

openclaw/openclaw#57981 · Fetched 2026-04-08 01:55:18
Comments: 1 · Participants: 2 · Timeline events: 6 · Reactions: 0
Timeline (top): labeled ×2, closed ×1, commented ×1, cross-referenced ×1

In OpenClaw 2026.3.28, worker sessions using xAI via the openai-responses API path crash with:

422 Failed to deserialize the JSON body into the target type: input: data did not match any variant of untagged enum ModelInput

Error Message

  1. Next assistant turn fails with xAI 422 ModelInput deserialization error.
  2. Session logs show the worker succeeds through read tool calls, then immediately dies with the 422 xAI error.

Root Cause

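The xAI provider uses the OpenAI Responses transport, and the payload builder can emit image-bearing tool results as an array-valued function_call_output.output ([input_text, input_image]). xAI rejects that shape with the 422 ModelInput deserialization error, so the session dies on the turn that follows a successful read(image) tool call.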

Fix Action

Fixed

PR fix notes

PR #58017: fix(xai): normalize image tool results for responses

Description (problem / solution / changelog)

Summary

  • Problem: xAI openai-responses requests could replay image-bearing tool results as array-valued function_call_output.output, which xAI rejects with a 422 deserialization error.
  • Why it matters: read(image) and similar tool flows could succeed on the tool call itself, then crash on the next model turn instead of continuing the session.
  • What changed: the xAI stream payload compatibility wrapper now rewrites array-valued function_call_output items into string outputs and, when the model explicitly supports image input, emits the image blocks as a following user message instead.
  • What did NOT change (scope boundary): core transcript structures, non-xAI providers, and upstream pi-ai response conversion logic.
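
The "What changed" bullet above can be sketched roughly as follows. This is an illustrative sketch, not the code from extensions/xai/stream.ts: the type names, `normalizeToolOutputs`, the `modelSupportsImages` flag, and the choice to join text parts with a newline are all assumptions made for this example.

```typescript
// Hypothetical item shapes; only the fields needed for this sketch are modeled.
type InputText = { type: "input_text"; text: string };
type InputImage = { type: "input_image"; image_url: string };
type OutputPart = InputText | InputImage;

type FunctionCallOutput = {
  type: "function_call_output";
  call_id: string;
  output: string | OutputPart[];
};
type UserMessage = { type: "message"; role: "user"; content: OutputPart[] };
type InputItem = FunctionCallOutput | UserMessage;

// Rewrite array-valued tool outputs into strings. When the model accepts
// image input, replay the image parts as a user message right after the output.
function normalizeToolOutputs(
  items: InputItem[],
  modelSupportsImages: boolean,
): InputItem[] {
  const result: InputItem[] = [];
  for (const item of items) {
    if (item.type === "function_call_output" && Array.isArray(item.output)) {
      const parts: OutputPart[] = item.output;
      // Assumption: text parts are joined with a newline.
      const text = parts
        .filter((p): p is InputText => p.type === "input_text")
        .map((p) => p.text)
        .join("\n");
      const images = parts.filter(
        (p): p is InputImage => p.type === "input_image",
      );
      result.push({ type: "function_call_output", call_id: item.call_id, output: text });
      if (modelSupportsImages && images.length > 0) {
        result.push({ type: "message", role: "user", content: images });
      }
    } else {
      // String-valued outputs and ordinary messages pass through unchanged.
      result.push(item);
    }
  }
  return result;
}
```

With `modelSupportsImages` set to false, the same input yields only the string-valued tool output, which matches the string-only fallback described in the PR.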

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #57981
  • Related #57981
  • This PR fixes a bug or regression

Root Cause / Regression History (if applicable)

  • Root cause: the xAI provider uses the OpenAI Responses transport, and the current payload builder can emit image-bearing tool results as function_call_output.output = [input_text, input_image]; xAI rejects that shape.
  • Missing detection / guardrail: the xAI compatibility wrapper stripped unsupported reasoning/schema fields but did not normalize image-bearing tool-result payloads.
  • Prior context (git blame, prior PR, issue, or refactor if known): issue #57981 captured the failing payload shape and 422 symptom for read(image) flows.
  • Why this regressed now: the provider path preserved structured image tool results into the Responses payload, but xAI's endpoint is stricter than direct OpenAI here.
  • If unknown, what was ruled out: verified current code still reproduces the array-valued function_call_output.output payload; this is not already normalized in the current xAI wrapper layer.
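
To make the failing shape concrete, here is a small sketch that builds a payload fragment like the one described in the root cause and flags the offending items. The `LooseInputItem` type, the `findArrayValuedToolOutputs` helper, the call ID, and the placeholder text and data URL are hypothetical and only mirror the shape named above.

```typescript
// Hypothetical minimal view of a Responses input item.
type LooseInputItem = { type: string; call_id?: string; output?: unknown };

// Return the call_ids of tool outputs whose `output` is an array,
// i.e. the shape that xAI rejected in this issue.
function findArrayValuedToolOutputs(input: LooseInputItem[]): string[] {
  return input
    .filter(
      (item) =>
        item.type === "function_call_output" && Array.isArray(item.output),
    )
    .map((item) => item.call_id ?? "(missing call_id)");
}

// Example fragment mirroring function_call_output.output = [input_text, input_image].
const failingInput: LooseInputItem[] = [
  {
    type: "function_call_output",
    call_id: "call_1",
    output: [
      { type: "input_text", text: "placeholder text from the read tool" },
      { type: "input_image", image_url: "data:image/png;base64,AAAA" },
    ],
  },
];

// The result contains "call_1" because its output is an array.
console.log(findArrayValuedToolOutputs(failingInput));
```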

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: extensions/xai/stream.test.ts
  • Scenario the test should lock in: image-bearing tool results sent through the xAI openai-responses wrapper must become string function_call_output.output plus a following user image message.
  • Why this is the smallest reliable guardrail: the bug lives inside the xAI payload compatibility wrapper, so a focused wrapper test exercises the exact failing shape without depending on live provider calls.
  • Existing test that already covers this (if any): none for this payload shape.
  • If no new test is added, why not: N/A
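
The scenario in the test plan can be expressed as a small invariant check. This sketch does not reproduce extensions/xai/stream.test.ts; the `WrappedItem` shapes and the `expectStringOutputThenImageMessage` helper are hypothetical, and a real test would presumably run such a check against the wrapper's actual output.

```typescript
// Hypothetical minimal shapes for the items the wrapper emits.
type ContentPart = { type: string };
type WrappedItem =
  | { type: "function_call_output"; call_id: string; output: unknown }
  | { type: "message"; role: string; content: ContentPart[] };

// Throw unless the tool output at `index` is a string and is immediately
// followed by a user message that carries at least one input_image part.
function expectStringOutputThenImageMessage(
  items: WrappedItem[],
  index: number,
): void {
  const toolOutput = items[index];
  const next = items[index + 1];
  if (
    !toolOutput ||
    toolOutput.type !== "function_call_output" ||
    typeof toolOutput.output !== "string"
  ) {
    throw new Error("tool output must be a string-valued function_call_output");
  }
  if (!next || next.type !== "message" || next.role !== "user") {
    throw new Error("tool output must be followed by a user message");
  }
  if (!next.content.some((part) => part.type === "input_image")) {
    throw new Error("following user message must contain an input_image part");
  }
}
```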

User-visible / Behavior Changes

  • xAI-backed sessions can continue after tool results that include image blocks instead of failing on the follow-up turn with a 422.

Diagram (if applicable)

Before:
[tool returns text + image blocks] -> [function_call_output.output contains input_image] -> [xAI 422]

After:
[tool returns text + image blocks] -> [function_call_output.output becomes text] -> [image blocks replayed as user message] -> [session continues]

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) No
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: macOS (local verification) and issue repro from Ubuntu in #57981
  • Runtime/container: local repo checkout
  • Model/provider: xAI / openai-responses
  • Integration/channel (if any): agent session / tool replay
  • Relevant config (redacted): xAI provider using https://api.x.ai/v1

Steps

  1. Configure an xAI model on the openai-responses path.
  2. Produce a tool result with text plus image blocks, such as read on a PNG.
  3. Replay that result into the next model turn.

Expected

  • The follow-up request stays xAI-compatible and the session continues.

Actual

  • Before this change, the request can include array-valued function_call_output.output with input_image, which causes xAI to fail with a 422.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios: reproduced the current payload shape locally, added a regression test for the xAI wrapper, ran pnpm test -- extensions/xai/stream.test.ts, ran pnpm test:extension xai, and ran pnpm build.
  • Edge cases checked: tool results with text plus image blocks; models that do not explicitly advertise image input keep a string-only tool output.
  • What you did not verify: live xAI API execution in this environment.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: xAI models without explicit image input support could still receive invalid image blocks.
    • Mitigation: the wrapper only forwards image blocks when the resolved model explicitly declares image input support; otherwise it keeps a string-only tool output.
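
The mitigation hinges on an explicit capability check. A minimal sketch of such a gate, assuming a hypothetical `ModelInfo` descriptor with an optional `input` list (the real model catalog type may look different):

```typescript
// Hypothetical model descriptor; the real catalog type may differ.
type ModelInfo = { id: string; input?: ReadonlyArray<"text" | "image"> };

// Images are forwarded only when support is declared explicitly.
// A missing `input` list is treated as "no image support".
function declaresImageInput(model: ModelInfo): boolean {
  return model.input?.some((kind) => kind === "image") ?? false;
}
```

Defaulting to false when nothing is declared keeps unknown models on the string-only path, which is the conservative choice the mitigation describes.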

Notes

  • AI-assisted: yes.
  • pnpm check currently stops on unrelated existing tsgo errors in extensions/diffs/src/language-hints.test.ts on top of origin/main; the touched xAI lane and build both passed.


Changed files

  • extensions/xai/stream.test.ts (modified, +174/-0)
  • extensions/xai/stream.ts (modified, +81/-0)


Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

In OpenClaw 2026.3.28, worker sessions using xAI via the openai-responses API path crash with:

422 Failed to deserialize the JSON body into the target type: input: data did not match any variant of untagged enum ModelInput

Steps to reproduce

  1. Configure OpenClaw 2026.3.28 with xAI/Grok model on the openai-responses path.
  2. Start a session with a prompt like:
    • "Read this PNG file with the read tool, then summarize it."
  3. Call read on any local PNG path.
  4. Observe that the tool result returns an image block.
  5. Next assistant turn fails with xAI 422 ModelInput deserialization error.

Expected behavior

One of the following should happen:

  1. OpenClaw should convert image tool results into an xAI-compatible input shape before forwarding them, or
  2. OpenClaw should down-convert incompatible image tool results to text placeholders/metadata for xAI models that cannot accept the internal image block format, or
  3. OpenClaw should detect incompatibility earlier and avoid sending invalid model input to xAI.

Actual behavior

The session crashes with xAI 422 after image tool results are reintroduced into model input.

OpenClaw version

2026.3.28

Operating system

Ubuntu

Install method

npm

Model

xai grok 4.1 fast

Provider / routing chain

openclaw -> xAI API

Additional provider/model setup details

No response

Logs, screenshots, and evidence

1. `openclaw status` shows OpenClaw 2026.3.28 and hook sessions using `grok-4-1-fast`.
2. Installed runtime code indicates xAI provider is built with `openai-responses`:
   - file: `dist/provider-catalog-*.js`
   - function: `buildXaiProvider(api = "openai-responses")`
3. Read-image tool results are normalized/sanitized but still preserved as structured image blocks with base64 payloads:
   - file: `dist/auth-profiles-*.js`
   - `createOpenClawReadTool()` returns `sanitizeToolResultImages(await normalizeReadImageResult(...))`
   - file: `dist/tool-images-*.js`
   - sanitized image blocks remain shaped like:
     - `type: "image"`
     - `data: <base64>`
     - `mimeType: "image/png"` or similar
4. Session logs show the worker succeeds through `read` tool calls, then immediately dies with the 422 xAI error.
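
For illustration, the internal image block described in item 3 could be mapped to a Responses-style image part roughly like this. The `ImageBlock` type follows the fields listed above; `InputImagePart` and `toInputImage` are hypothetical names, and whether a given endpoint accepts the resulting part in a particular position is exactly what this issue is about, so treat this as a shape sketch only.

```typescript
// Internal image block shape as listed in the evidence above.
type ImageBlock = { type: "image"; data: string; mimeType: string };

// Responses-style image part; `image_url` carries a base64 data URL here.
type InputImagePart = { type: "input_image"; image_url: string };

// Build a data URL from the MIME type and the base64 payload.
function toInputImage(block: ImageBlock): InputImagePart {
  return {
    type: "input_image",
    image_url: `data:${block.mimeType};base64,${block.data}`,
  };
}
```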

Impact and severity

Breaks any xAI-backed workflow that reads local image files through tools and then continues agent execution.

In my case, it blocked several production workflows after upgrading to 2026.3.28.

Additional information

Last known good version: 2026.3.24.

Extended analysis

Fix Plan

To resolve the issue, OpenClaw has to handle image tool results before they are sent to the xAI API, either by converting image blocks to text placeholders or by reshaping them into an input the endpoint accepts.

Step-by-Step Solution

  1. Identify the code that builds the xAI request: the issue evidence points to the buildXaiProvider function in dist/provider-catalog-*.js, which selects the openai-responses API. Files under dist/ are installed build output, and the merged fix (PR #58017) changed extensions/xai/stream.ts instead.
  2. Add a preprocessing step for image tool results: before the request is sent to the xAI API, check whether a tool result contains image blocks. If it does, convert each one to a text placeholder or a compatible input shape.
  3. Implement the conversion logic: create a new function, e.g., convertImageBlockToText, that takes an image block as input and returns a text placeholder or a compatible input shape.

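Example code (illustrative sketch; the helper names and parameters are hypothetical, and the merged fix in PR #58017 changes extensions/xai/stream.ts rather than the dist bundles):

// Convert one image block into a text placeholder block, e.g. "[IMAGE: image/png]".
function convertImageBlockToText(imageBlock) {
  return { type: "text", text: `[IMAGE: ${imageBlock.mimeType}]` };
}

// Preprocess a tool result before it is serialized into the xAI request.
// `blocks` is assumed to be the array of content blocks returned by the tool.
function preprocessToolResultBlocks(blocks) {
  return blocks.map((block) =>
    // Only image blocks are rewritten; other blocks pass through unchanged.
    block.type === "image" ? convertImageBlockToText(block) : block
  );
}

// Send an already preprocessed request body to the xAI API.
// `endpointUrl` and `apiKey` are supplied by the caller (hypothetical parameters).
async function sendDataToXai(endpointUrl, apiKey, body) {
  const response = await fetch(endpointUrl, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    // Surface HTTP errors such as the 422 from this issue instead of ignoring them.
    throw new Error(`xAI request failed with status ${response.status}`);
  }
  return response.json();
}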

Verification

To verify the fix:

  1. Restart the OpenClaw service so the change takes effect.
  2. Re-run the previously failing workflow (read on a local PNG, then a follow-up assistant turn) and confirm that the session continues.
  3. Check the session logs to confirm that the follow-up request no longer returns the xAI 422 error.

Extra Tips

  • Make sure to test the fix thoroughly to ensure that it works for all possible image tool results and xAI API requests.
  • Consider adding additional logging or monitoring to detect any future issues with image tool results or xAI API requests.
