claude-code: Sonnet 4.6 substitutes `gpt-4.1-mini` for `gpt-5.4-mini` in emitted tool_use commands (~11.5% rate)

GitHub stats
anthropics/claude-code#51417 · Fetched 2026-04-22 08:02:52
Comments: 1 · Participants: 1 · Timeline events: 4 (labeled ×3, commented ×1) · Reactions: 0

Sonnet 4.6 substitutes gpt-4.1-mini for gpt-5.4-mini when emitting tool_use commands

Summary

Claude Sonnet 4.6 rewrites the model-name string gpt-5.4-mini as gpt-4.1-mini in emitted tool_use inputs, even when the source skill / user prompt clearly and repeatedly says gpt-5.4-mini. The substitution happens at generation time — the upstream skill text arrives in context correctly; the downstream tool_use block carries the wrong name.

In the repro, the tool the model is invoking is bramble code-review, a thin CLI that forwards --backend codex --model <name> to the OpenAI Codex endpoint and records findings. It does no string rewriting on the --model value — it just passes the argument through — so OpenAI is the one that returns HTTP 400: "The 'gpt-4.1-mini' model is not supported when using Codex with a ChatGPT account." The bramble layer is not contributing to the bug; it's simply the first place downstream where the wrong name becomes observable.

Reproduces on

  • Claude Code CLI: 2.1.116
  • Model: claude-sonnet-4-6 (every real case observed)
  • Tool involved: Monitor (background tool call); the same substitution could plausibly surface in any tool_use that takes a shell command string.

Not observed on Opus 4.7.

Minimal reproduction from one session

Session JSONL (sanitized excerpts below), session id 193fc056-…, agent model claude-sonnet-4-6.

L5 — system/skill load, correct name

...
    bramble code-review --backend codex --model gpt-5.4-mini \
    --goal "{PR_SUMMARY}" --skip-test-execution \
    --verbose --timeout 10m --envelope-file "$ENVELOPE_CODEX"
...

L102 — assistant-emitted tool_use, wrong name

{
  "type": "tool_use",
  "name": "Monitor",
  "input": {
    "description": "bramble codex r1 for PR …",
    "timeout_ms": 720000,
    "persistent": false,
    "command": "WORK_DIR=$(pwd) bramble code-review \\\n  --backend codex --model gpt-4.1-mini \\\n  --goal \"…\" --skip-test-execution \\\n  --verbose --timeout 10m --envelope-file \"$ENVELOPE_CODEX\""
  }
}

L106 — downstream tool result, the 400

{
  "type": "error",
  "status": 400,
  "error": {
    "type": "invalid_request_error",
    "message": "The 'gpt-4.1-mini' model is not supported when using Codex with a ChatGPT account."
  }
}

The substitution is entirely within Claude's generation: the input context has gpt-5.4-mini verbatim, the tool_use output has gpt-4.1-mini.
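
The comparison above (expected name in context, different name in the emitted command) can be automated when auditing many sessions. The following is a minimal sketch, not the reporter's tooling. It assumes only the tool_use shape shown in the excerpt (a dict with "type": "tool_use" and an "input.command" string); the layout of the surrounding session records is not documented here, so the sketch searches each JSONL record recursively. The names `model_args`, `iter_tool_uses`, and `find_substitutions` are hypothetical.

```python
import json
import shlex

EXPECTED_MODEL = "gpt-5.4-mini"  # the name the skill text specifies


def model_args(command):
    """Return every value passed via --model in a shell command string."""
    # Join backslash-newline continuations so the command is one logical line.
    flat = command.replace("\\\n", " ")
    try:
        tokens = shlex.split(flat)
    except ValueError:
        # Unbalanced quotes: fall back to plain whitespace splitting.
        tokens = flat.split()
    values = []
    for i, tok in enumerate(tokens):
        if tok == "--model" and i + 1 < len(tokens):
            values.append(tokens[i + 1])
        elif tok.startswith("--model="):
            values.append(tok.split("=", 1)[1])
    return values


def iter_tool_uses(node):
    """Yield every dict with type == 'tool_use' nested anywhere in node."""
    if isinstance(node, dict):
        if node.get("type") == "tool_use":
            yield node
        for child in node.values():
            yield from iter_tool_uses(child)
    elif isinstance(node, list):
        for child in node:
            yield from iter_tool_uses(child)


def find_substitutions(jsonl_text, expected=EXPECTED_MODEL):
    """Return (line_number, model_value) for each unexpected --model value."""
    hits = []
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue
        record = json.loads(line)
        for block in iter_tool_uses(record):
            tool_input = block.get("input")
            if not isinstance(tool_input, dict):
                continue
            command = tool_input.get("command")
            if not isinstance(command, str):
                continue
            for value in model_args(command):
                if value != expected:
                    hits.append((lineno, value))
    return hits
```

Calling `find_substitutions(open(path).read())` on a session file would list the JSONL line numbers where the emitted command names a model other than the expected one. Those numbers count lines in that file and need not match the L5/L102 labels used in this report.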

What I've ruled out

  • Not the skill text. The source markdown consistently says gpt-5.4-mini; verified present in L5 of each repro session.
  • Not an SDK / transport rewrite. Tool inputs are forwarded as-is; no component between the model and the tool dispatcher touches the command string.
  • Not a user typo. The correct name flows through cleanly in the 185 sessions where substitution did not occur.
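
As a back-of-envelope check, the 185 clean sessions can be combined with the approximate 11.5% rate from the issue title to estimate how many sessions were affected. The report does not state that count, so this is only an inference, and it assumes both figures describe the same sample of sessions. The helper name `implied_affected` is hypothetical.

```python
def implied_affected(clean_sessions, rate):
    """Solve affected / (affected + clean) = rate for affected, rounded."""
    return round(rate * clean_sessions / (1.0 - rate))


clean = 185   # sessions without substitution (from the report)
rate = 0.115  # approximate rate quoted in the issue title

affected = implied_affected(clean, rate)
total = affected + clean
print(affected, total, f"{affected / total:.1%}")  # prints: 24 209 11.5%
```

Under that assumption the figures are consistent with about 24 affected sessions out of 209; neighboring counts (23 or 25) would round to 11.1% and 11.9% respectively.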

Likely cause (speculation): gpt-4.1-mini is a real, well-known OpenAI model name in training data; gpt-5.4-mini looks similar but is a bramble-internal alias. The model appears to favor the training-distribution-common identifier when emitting what it treats as a "plausible OpenAI model name" inside a shell command.

Attachment available on request

I have a 5-message redacted JSONL extract from the reproducing session (session id 193fc056-…): original L5 (skill load with correct gpt-5.4-mini), L102 (assistant tool_use Monitor with wrong gpt-4.1-mini), L106 (the OpenAI 400 error event), L107 (terminal task-notification), L108 (next assistant turn).

All repo/org/ticket/PR identifiers replaced with placeholders. The --goal argument to bramble is elided since it's not relevant to the bug. Preserved unchanged: Claude model IDs, tool names, the two model-name strings at the heart of the bug, OpenAI error body, token counts, timestamps.

Happy to share it as a gist or inline excerpt — let me know which works best.

Related repos / context

  • My Claude-SDK wrapper + orchestrator work lives at https://github.com/bazelment/yoloswe (public) — that repo calls the CLI via claude-agent-sdk-shaped interfaces. No repo-side change would prevent this: the substitution is upstream of anything that repo sees.

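Extent analysis

TL;DR

According to the report, the substitution happens during claude-sonnet-4-6's generation, so a durable fix would have to come from the model side rather than from the skill text, the SDK, or the bramble CLI. The report's own explanation, explicitly marked as speculation, is that the model favors gpt-4.1-mini because it is the more common identifier in training data.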

Guidance

  • Verify that the issue is specific to claude-sonnet-4-6 by running the same skill on other models; the report notes that it was not observed on Opus 4.7.
  • Treat the training-data explanation as a hypothesis: the report speculates that gpt-4.1-mini is favored because it is a well-known OpenAI model name, while gpt-5.4-mini is described as a bramble-internal alias.
  • Any change to training data or generation behavior can only be made on the model side; the redacted JSONL extract offered in the issue is the evidence most useful for that investigation.
  • Run bramble code-review with other model names to determine whether the substitution is specific to gpt-5.4-mini or affects similar-looking identifiers more generally.

Example

The issue concerns model behavior rather than a specific code implementation, so there is no code change to apply in the affected repository.

Notes

The issue appears to be specific to the claude-sonnet-4-6 model and may be related to the model's training data. Further investigation is needed to determine the root cause and develop a fix.

Recommendation

Apply a workaround by using a different model, such as Opus 4.7, until the issue with claude-sonnet-4-6 is resolved. This will allow for continued use of the bramble code-review tool while the model issue is being addressed.
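
If switching models is not an option, a pre-flight check on the command string can at least surface the wrong name before the request reaches OpenAI. The sketch below is an assumption-laden illustration, not part of bramble or Claude Code: `check_model` and `ModelMismatch` are hypothetical names, and where such a check would run (for example in a wrapper that dispatches the tool call) depends on the setup. It does not prevent the substitution; it only fails fast with a clearer message than the downstream HTTP 400.

```python
import re

EXPECTED_MODEL = "gpt-5.4-mini"  # the name the skill text specifies

# Matches "--model <value>" and "--model=<value>".
MODEL_FLAG = re.compile(r"--model(?:\s+|=)(\S+)")


class ModelMismatch(ValueError):
    """Raised when a command names a model other than the expected one."""


def check_model(command, expected=EXPECTED_MODEL):
    """Return command unchanged, or raise if a --model value differs."""
    for match in MODEL_FLAG.finditer(command):
        value = match.group(1).strip("'\"")  # tolerate a quoted value
        if value != expected:
            raise ModelMismatch(
                f"command uses --model {value!r}, expected {expected!r}"
            )
    return command
```

The check raises instead of silently rewriting the value, so each occurrence stays visible and can still be counted when measuring how often the substitution happens.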
