openclaw - ✅(Solved) Fix [GPT 5.4 v3 — PR 1/6] Add mandatory tool-use categories for GPT-5 models [1 pull requests, 1 participants]

openclaw | 2026-04-14 04:45:18


openclaw/openclaw#66346•Fetched 2026-04-15 06:26:35


Author

100yenadmin

Participants

100yenadmin

Timeline (top)

cross-referenced ×3, referenced ×1

Fix Action

Fixed

Fixed by PR: feat(openai): add mandatory tool-use categories for GPT-5 models [v3 1/6] (https://github.com/openclaw/openclaw/pull/66371)

PR fix notes

PR #66371: feat(openai): add mandatory tool-use categories for GPT-5 models [v3 1/6]

Repository: openclaw/openclaw
Author: 100yenadmin
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/66371

Description (problem / solution / changelog)

GPT 5.4 Enhancement v3 — PR 1/6

Tracking: #66345 | Issue: #66346 | Priority: P0 — CRITICAL | Gap closure: ~35%

Problem

GPT 5.4 on OpenClaw answers factual, computational, and system-state questions from training data instead of calling tools. This is the #1 root cause of the behavioral gap vs Hermes Agent.

               User: "What time is it?"
                        │
          ┌─────────────┴──────────────┐
          │                            │
    ┌─────▼──────┐              ┌──────▼──────┐
    │   Hermes    │              │  OpenClaw    │
    │             │              │  (before)    │
    │ Has:        │              │              │
    │ "NEVER      │              │ No mandatory │
    │  answer     │              │ tool-use     │
    │  from       │              │ categories   │
    │  memory"    │              │              │
    │ + 7 categ.  │              │              │
    └─────┬───────┘              └──────┬───────┘
          │                             │
    ┌─────▼──────┐              ┌───────▼──────┐
    │ Calls       │              │ Answers from  │
    │ `date` tool │              │ training data │
    │ ✅ CORRECT   │              │ ❌ STALE       │
    └─────────────┘              └──────────────┘

Changes

extensions/openai/prompt-overlay.ts:

- Added OPENAI_GPT5_TOOL_ENFORCEMENT constant with 8 mandatory tool-use categories: arithmetic/math, hashes/encodings, timestamps, system state, file contents, git, current facts, network checks
- Appended it to stablePrefix in resolveOpenAISystemPromptContribution

Hermes Reference

agent/prompt_builder.py lines 207-218 — <mandatory_tool_use> block within OPENAI_MODEL_EXECUTION_GUIDANCE.

Verification

Test GPT 5.4 — all should call tools, not answer from memory:

"What time is it?" → date
"What's the sha256 of 'hello'?" → echo -n hello | sha256sum
"How much free disk space?" → df -h
"What branch am I on?" → git branch --show-current
"What's 2^64?" → terminal/code execution

Note

Currently placed in stablePrefix. PR 3 (#66348) will promote tool_enforcement to a first-class sectionOverrides key, at which point this moves from stablePrefix to sectionOverrides.tool_enforcement.

Changed files

extensions/openai/index.test.ts (modified, +19/-7)
extensions/openai/prompt-overlay.ts (modified, +25/-1)



Parent: #66345 — GPT 5.4 Enhancement v3: Hermes Parity Sprint

Priority: P0 — CRITICAL | Estimated gap closure: ~35%

Problem

GPT 5.4 on OpenClaw answers factual, computational, and system-state questions from training data instead of calling tools. This produces hallucinated or stale answers and breaks the agentic flow — the model "thinks" it's done when it hasn't actually verified anything.

Hermes Agent solves this with an explicit mandatory tool-use list that GPT-5 receives in its system prompt. OpenClaw has no equivalent.

               User: "What time is it?"
                        │
          ┌─────────────┴──────────────┐
          │                            │
    ┌─────▼──────┐              ┌──────▼──────┐
    │   Hermes    │              │  OpenClaw    │
    │             │              │              │
    │ Has:        │              │ Missing:     │
    │ "NEVER      │              │ No mandatory │
    │  answer     │              │ tool-use     │
    │  from       │              │ categories   │
    │  memory"    │              │              │
    │ + 7 categ.  │              │              │
    └─────┬───────┘              └──────┬───────┘
          │                             │
    ┌─────▼──────┐              ┌───────▼──────┐
    │ Calls       │              │ Answers from  │
    │ `date` tool │              │ training data │
    │ → 2026-04   │              │ → maybe wrong │
    │ ✅           │              │ ❌             │
    └─────────────┘              └──────────────┘

Hermes Reference

agent/prompt_builder.py lines 207-218:

OPENAI_MODEL_EXECUTION_GUIDANCE = (
    "<mandatory_tool_use>\n"
    "NEVER answer these from memory or mental computation — ALWAYS use a tool:\n"
    "- Arithmetic, math, calculations → use terminal or execute_code\n"
    "- Hashes, encodings, checksums → use terminal (e.g. sha256sum, base64)\n"
    "- Current time, date, timezone → use terminal (e.g. date)\n"
    "- System state: OS, CPU, memory, disk, ports, processes → use terminal\n"
    "- File contents, sizes, line counts → use read_file, search_files, or terminal\n"
    "- Git history, branches, diffs → use terminal\n"
    "- Current facts (weather, news, versions) → use web_search\n"
    "</mandatory_tool_use>"
)

Proposed Implementation

File: `extensions/openai/prompt-overlay.ts`

Add new constant:

export const OPENAI_GPT5_TOOL_ENFORCEMENT = `## Mandatory Tool Use

NEVER answer these from memory or mental computation — ALWAYS use a tool:
- Arithmetic, math, calculations → use terminal or code execution
- Hashes, encodings, checksums → use terminal (e.g. sha256sum, base64)
- Current time, date, timezone → use terminal (e.g. date)
- System state: OS, CPU, memory, disk, ports, processes → use terminal
- File contents, sizes, line counts → use read or search tools, or terminal
- Git history, branches, diffs, status → use terminal
- Current facts (weather, news, package versions) → use web search
- Network checks (port open, DNS, connectivity) → use terminal

Your training data is stale. The execution environment may differ from what you expect. Always ground answers in live tool output.`;
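
Two of the categories above, arithmetic and hashes, show why the list insists on tool output rather than recall. The sketch below is illustrative only and is not part of the proposed change: it assumes a Node.js runtime, and `sha256Hex` is a hypothetical helper name. It computes the answers to two of the verification prompts and shows that even ordinary floating-point arithmetic gets 2^64 wrong.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: hex-encoded SHA-256 of a UTF-8 string.
function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// 2^64 exceeds Number.MAX_SAFE_INTEGER, so double-precision math rounds it.
const floatPower = String(2 ** 64);
// BigInt arithmetic is exact.
const exactPower = (BigInt(2) ** BigInt(64)).toString();

console.log(sha256Hex("hello"));
// 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
console.log(floatPower); // 18446744073709552000 (rounded)
console.log(exactPower); // 18446744073709551616 (exact)
```

A model answering from memory has no way to check either value; a single tool call does.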

Wire into resolveOpenAISystemPromptContribution:

return {
  stablePrefix: [OPENAI_GPT5_OUTPUT_CONTRACT, OPENAI_GPT5_TOOL_CALL_STYLE].join("\n\n"),
  sectionOverrides: {
    execution_bias: OPENAI_GPT5_EXECUTION_BIAS,
    tool_enforcement: OPENAI_GPT5_TOOL_ENFORCEMENT,  // ← NEW
    ...(params.mode === "friendly" ? { interaction_style: OPENAI_FRIENDLY_PROMPT_OVERLAY } : {}),
  },
};

Note: This uses sectionOverrides.tool_enforcement which requires PR 3 to land first. If shipping independently, place in stablePrefix instead.
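
The two placements described in the note can be sketched as a single helper. This is a minimal illustration under stated assumptions: `placeToolEnforcement`, the `Contribution` shape, and the `supportsToolEnforcementSection` flag are hypothetical names, not part of the OpenClaw codebase; only the `stablePrefix` and `sectionOverrides` field names come from the issue.

```typescript
// Assumed, simplified shape of the value returned by
// resolveOpenAISystemPromptContribution (only the fields shown in the issue).
type SectionKey = "execution_bias" | "tool_enforcement" | "interaction_style";

interface Contribution {
  stablePrefix: string;
  sectionOverrides: Partial<Record<SectionKey, string>>;
}

// Hypothetical helper: put the enforcement text in sectionOverrides when the
// tool_enforcement section key is supported (after PR 3), otherwise append it
// to stablePrefix.
function placeToolEnforcement(
  base: Contribution,
  enforcement: string,
  supportsToolEnforcementSection: boolean,
): Contribution {
  if (supportsToolEnforcementSection) {
    return {
      ...base,
      sectionOverrides: { ...base.sectionOverrides, tool_enforcement: enforcement },
    };
  }
  return {
    ...base,
    stablePrefix: [base.stablePrefix, enforcement].join("\n\n"),
  };
}
```

Either way the model sees the same text; only its position in the assembled system prompt changes.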

File: `extensions/openai/index.test.ts`

Add a test verifying that GPT-5 models receive the tool enforcement block and non-GPT-5 models do not.
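
The shape of such a test might look like the sketch below. It is self-contained for illustration and does not use the project's real test harness: `isGpt5Model`, `resolveContribution`, and the model ids are hypothetical stand-ins, since the issue does not show how OpenClaw matches GPT-5 models or how the resolver is called.

```typescript
const TOOL_ENFORCEMENT_HEADING = "## Mandatory Tool Use";

// Hypothetical model check; the real matching logic is not shown in the issue.
function isGpt5Model(modelId: string): boolean {
  return /^gpt-5(?:$|[.-])/.test(modelId);
}

// Hypothetical stand-in for resolveOpenAISystemPromptContribution.
function resolveContribution(modelId: string): { stablePrefix: string } | undefined {
  if (!isGpt5Model(modelId)) return undefined;
  return { stablePrefix: `${TOOL_ENFORCEMENT_HEADING}\n\n(enforcement text)` };
}

// The property under test: the enforcement heading reaches GPT-5 prompts only.
function receivesToolEnforcement(modelId: string): boolean {
  const contribution = resolveContribution(modelId);
  return contribution?.stablePrefix.includes(TOOL_ENFORCEMENT_HEADING) ?? false;
}

console.log(receivesToolEnforcement("gpt-5.4")); // true
console.log(receivesToolEnforcement("gpt-4o")); // false
```

In the real test file the same two assertions would be made against the actual resolver output rather than a stand-in.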

Verification

Test GPT 5.4 with these prompts (all should call tools, not answer from memory):

"What time is it?" → calls terminal date
"What's the sha256 of 'hello'?" → calls terminal echo -n hello | sha256sum
"How much free disk space?" → calls terminal df -h
"What branch am I on?" → calls terminal git branch --show-current
"What's 2^64?" → calls terminal or code execution

Impact

This is the single highest-ROI change. It closes ~35% of the behavioral gap between OpenClaw and Hermes for GPT 5.4 by eliminating the most common failure mode: answering from stale training data.
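
One way to track whether this failure mode is actually eliminated is to turn the verification prompts above into a repeatable check over recorded tool calls. The sketch below assumes a hypothetical transcript format (`ToolCall` with `tool` and `input` fields); OpenClaw's real transcript structure is not described in the issue.

```typescript
// Hypothetical record of one tool invocation made while answering a prompt.
interface ToolCall {
  tool: string;
  input: string;
}

interface VerificationCase {
  prompt: string;
  // Substring expected in some tool input; omitted when any tool call is acceptable.
  expectedCommand?: string;
}

// Prompts and expected commands taken from the Verification list.
const VERIFICATION_CASES: VerificationCase[] = [
  { prompt: "What time is it?", expectedCommand: "date" },
  { prompt: "What's the sha256 of 'hello'?", expectedCommand: "sha256sum" },
  { prompt: "How much free disk space?", expectedCommand: "df -h" },
  { prompt: "What branch am I on?", expectedCommand: "git branch --show-current" },
  { prompt: "What's 2^64?" }, // terminal or code execution both count
];

// A case passes only if the model called at least one tool and, where a
// command is specified, some call's input contains it.
function passesVerification(testCase: VerificationCase, calls: ToolCall[]): boolean {
  if (calls.length === 0) return false; // answered from memory
  const expected = testCase.expectedCommand;
  if (expected === undefined) return true;
  return calls.some((call) => call.input.includes(expected));
}
```

Running the five cases before and after the change gives a pass count that can be compared directly, rather than relying on spot checks.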

extent analysis

TL;DR

To fix the issue of GPT 5.4 answering questions from training data instead of calling tools, add a mandatory tool-use list to the system prompt.

Guidance

Implement the proposed OPENAI_GPT5_TOOL_ENFORCEMENT constant in extensions/openai/prompt-overlay.ts to define the mandatory tool-use list.
Wire OPENAI_GPT5_TOOL_ENFORCEMENT into resolveOpenAISystemPromptContribution: use sectionOverrides.tool_enforcement once PR 3 has landed, or append it to stablePrefix if shipping independently.
Verify the fix by testing GPT 5.4 with prompts that should trigger tool calls, such as "What time is it?" or "What's the sha256 of 'hello'?".
Make sure the test prompts cover each category, including arithmetic, system state, and file contents.

Example

export const OPENAI_GPT5_TOOL_ENFORCEMENT = `## Mandatory Tool Use

NEVER answer these from memory or mental computation — ALWAYS use a tool:
- Arithmetic, math, calculations → use terminal or code execution
- Hashes, encodings, checksums → use terminal (e.g. sha256sum, base64)
- Current time, date, timezone → use terminal (e.g. date)
- System state: OS, CPU, memory, disk, ports, processes → use terminal
- File contents, sizes, line counts → use read or search tools, or terminal
- Git history, branches, diffs, status → use terminal
- Current facts (weather, news, package versions) → use web search
- Network checks (port open, DNS, connectivity) → use terminal

Your training data is stale. The execution environment may differ from what you expect. Always ground answers in live tool output.`;

Notes

The sectionOverrides.tool_enforcement wiring requires PR 3 (#66348) to land first. If shipping independently, place OPENAI_GPT5_TOOL_ENFORCEMENT in stablePrefix instead, which is what PR #66371 does.

Recommendation

Apply the proposed fix by implementing the OPENAI_GPT5_TOOL_ENFORCEMENT constant and wiring it into resolveOpenAISystemPromptContribution. This targets the reported failure mode directly: GPT 5.4 answering from training data instead of calling tools.
