openclaw - ✅(Solved) Fix Codex harness: runtime context pollutes dynamicToolsFingerprint, causing thread churn on every channel/surface/subagent switch [1 pull requests, 1 participants]

GitHub stats
openclaw/openclaw#69876 · Fetched 2026-04-22 07:47:02
Comments: 0 · Participants: 1 · Timeline events: 1 (cross-referenced ×1) · Reactions: 0

startOrResumeThread in the Codex app-server harness uses dynamicToolsFingerprint to decide whether to resume an existing thread or start a new one. The fingerprint is computed as JSON.stringify(sortedDeepClone(dynamicTools)) — hashing the full tool descriptor array including descriptions and runtime-populated schema enums.

Several tool specs are rebuilt per-request with current runtime context baked in (current channel name, channel-supported actions, subagent-caller flag), so the fingerprint changes between turns of the same OpenClaw session whenever the caller's surface changes. Result: Codex threads are discarded and recreated on every cross-surface interaction, destroying conversation continuity.

Also: any OpenClaw upgrade that tweaks a built-in tool description invalidates every existing Codex thread on the host.

PR fix notes

PR #69976: fix(codex): ignore tool descriptions in thread fingerprint

Description (problem / solution / changelog)

Summary

  • Keep Codex app-server dynamic tool fingerprints focused on the tool call surface by omitting tool descriptions before hashing.
  • Prevent runtime-local message tool help text changes from invalidating an existing Codex thread binding.
  • Add regression coverage that description-only changes resume the bound thread while schema/action changes still start a new thread.

Fixes #69876
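
The approach in the PR summary can be illustrated with a minimal sketch. It is not the code from the PR: the DynamicToolSpec type and the helper names are hypothetical, and the sketch assumes that only the top-level description field is dropped while the input schema (including action enums) still counts toward the fingerprint.

```typescript
// Hypothetical descriptor shape; the real type lives in the Codex extension.
interface DynamicToolSpec {
  name: string;
  description?: string;
  inputSchema: unknown;
}

// Recursively sort object keys so that key order never affects the serialized form.
function sortKeysDeep(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(sortKeysDeep);
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(source).sort()) {
      sorted[key] = sortKeysDeep(source[key]);
    }
    return sorted;
  }
  return value;
}

// Fingerprint the call surface only: tool name plus input schema.
// The top-level description is deliberately left out (assumption of this sketch).
function fingerprintWithoutDescriptions(tools: DynamicToolSpec[]): string {
  const surface = tools.map((tool) => ({
    name: tool.name,
    inputSchema: sortKeysDeep(tool.inputSchema),
  }));
  return JSON.stringify(surface);
}
```

Under these assumptions, two catalogs that differ only in help text produce the same string, while a changed action enum still produces a different one, which matches the regression coverage described above.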

Test plan

  • node scripts/run-vitest.mjs run --config test/vitest/vitest.extensions.config.ts extensions/codex/src/app-server/run-attempt.test.ts
  • pnpm exec oxfmt --check extensions/codex/src/app-server/thread-lifecycle.ts extensions/codex/src/app-server/run-attempt.test.ts
  • pnpm exec oxlint --tsconfig tsconfig.oxlint.extensions.json extensions/codex/src/app-server/thread-lifecycle.ts extensions/codex/src/app-server/run-attempt.test.ts
  • git diff --check

Notes

  • pnpm tsgo:extensions:test currently stops in extensions/qqbot/src/bridge/setup/finalize.ts because @tencent-connect/qqbot-connector is not available in this local install.

Changed files

  • extensions/codex/src/app-server/run-attempt.test.ts (modified, +108/-0)
  • extensions/codex/src/app-server/thread-lifecycle.ts (modified, +17/-1)


Environment

  • OpenClaw 2026.4.10 (reproduces on current installed version; behavior exists since Codex harness was introduced)
  • Codex app-server via stdio transport (default)
  • Issue observed across multiple machines; root cause analyzed with on-disk sidecar evidence

Reproduction (observational, on-disk)

No live reproduction needed — the bug is visible from existing sidecar files.

For any agent that has run under Codex harness across >1 OpenClaw surface or OpenClaw version, inspect:

~/.openclaw/agents/<agent-id>/sessions/*.codex-app-server.json

Each sidecar's dynamicToolsFingerprint is the JSON-stringified tool catalog snapshot at binding time. Diffing two sidecars from the same agent shows non-structural differences:

Example diff (3 sidecars from one bench agent, same machine, different runs)

session A (OpenClaw version N):    tool_count=41, hash=677be5fb...
session B (OpenClaw version N+1):  tool_count=31, hash=249f9666...
session C (OpenClaw version N+1):  tool_count=31, hash=249f9666...

Deltas observed:

1. message tool description embeds current channel name

- ...Current channel (telegram) supports: delete, edit, poll, react, send, topic-create, topic-edit.
+ ...Supports actions: send, broadcast, poll, react, delete, edit, topic-create, topic-edit.

The second form renders when no channel inbound context is present (cron, subagent, internal trigger). Same session, different triggers → different fingerprint.

2. message tool inputSchema size changes with channel: 7395 B vs 6180 B (action enum and allowed fields rebuilt per-channel).

3. Tool count delta: 41 → 31 when isSubagentSessionKey(sessionKey) flips. Tools dropped when running as a subagent:

gateway, sessions_spawn, cron, subagents, session_status,
sessions_send, nodes, sessions_list, agents_list, sessions_history

4. OpenClaw upgrade changed built-in tool description text

- ...Use process whenever you need logs, status, input, or intervention. Use pty=true...
+ ...Use process whenever you need logs, status, input, or intervention. Do not use exec sleep or delay loops for reminders or deferred follow-ups; use cron instead. Use pty=true...

Every upgrade that touches any built-in tool description invalidates all Codex thread bindings on the host.
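
The deltas above can be surfaced mechanically. The following is a rough sketch, assuming each dynamicToolsFingerprint parses to an array of objects with name, description, and inputSchema fields; the SidecarTool and CatalogDiff types and the function names are hypothetical, and reading the sidecar files from disk is left out.

```typescript
// Assumed entry shape inside a parsed dynamicToolsFingerprint (illustrative only).
interface SidecarTool {
  name: string;
  description?: string;
  inputSchema?: unknown;
}

interface CatalogDiff {
  added: string[];
  removed: string[];
  descriptionChanged: string[];
  schemaChanged: string[];
}

function indexToolsByName(tools: SidecarTool[]): Map<string, SidecarTool> {
  const index = new Map<string, SidecarTool>();
  for (const tool of tools) {
    index.set(tool.name, tool);
  }
  return index;
}

// Compare two fingerprint strings and classify the differences per tool name.
function diffToolCatalogs(beforeJson: string, afterJson: string): CatalogDiff {
  const beforeTools = JSON.parse(beforeJson) as SidecarTool[];
  const afterTools = JSON.parse(afterJson) as SidecarTool[];
  const before = indexToolsByName(beforeTools);
  const after = indexToolsByName(afterTools);
  const diff: CatalogDiff = { added: [], removed: [], descriptionChanged: [], schemaChanged: [] };

  for (const name of beforeTools.map((t) => t.name).sort()) {
    const prev = before.get(name);
    const next = after.get(name);
    if (prev === undefined) {
      continue;
    }
    if (next === undefined) {
      diff.removed.push(name);
      continue;
    }
    if (prev.description !== next.description) {
      diff.descriptionChanged.push(name);
    }
    // Assumes both snapshots were serialized with the same key order.
    if (JSON.stringify(prev.inputSchema) !== JSON.stringify(next.inputSchema)) {
      diff.schemaChanged.push(name);
    }
  }
  for (const name of afterTools.map((t) => t.name).sort()) {
    if (!before.has(name)) {
      diff.added.push(name);
    }
  }
  return diff;
}
```

Calling diffToolCatalogs with the dynamicToolsFingerprint values of two sidecars from the same agent would separate description-only churn (deltas 1 and 4) from schema changes (delta 2) and dropped tools (delta 3).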

Expected behavior

Same OpenClaw session should resume the same Codex thread across:

  • Cross-channel calls (Telegram DM → group → cron trigger)
  • Cross-topic calls (Telegram forum topic A → topic B)
  • Subagent-spawn calls (direct user → spawned as subagent)
  • OpenClaw version upgrades (unless the tool contract meaningfully changes)

Actual behavior

Fingerprint mismatch → sidecar cleared → thread/start issued → new Codex thread with zero history. No user-visible signal except the "forgotten context" symptom. Only log trace is debug-level:

codex app-server dynamic tool catalog changed; starting a new thread
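
The gate described above can be modeled in a few lines. This is a simplified sketch, not the code in thread-lifecycle.ts: ThreadBinding, ThreadDecision, and decideThreadAction are hypothetical names, and only the dynamicToolsFingerprint field is taken from the issue.

```typescript
// Hypothetical sidecar binding; only dynamicToolsFingerprint is named in the issue.
interface ThreadBinding {
  threadId: string;
  dynamicToolsFingerprint: string;
}

type ThreadDecision =
  | { kind: "resume"; threadId: string }
  | { kind: "start"; reason: "no-binding" | "catalog-changed" };

// Decide whether an existing thread can be resumed for the current tool catalog.
function decideThreadAction(
  binding: ThreadBinding | undefined,
  currentFingerprint: string,
): ThreadDecision {
  if (binding === undefined) {
    return { kind: "start", reason: "no-binding" };
  }
  if (binding.dynamicToolsFingerprint !== currentFingerprint) {
    // The branch the issue describes: the binding is discarded and history is lost.
    return { kind: "start", reason: "catalog-changed" };
  }
  return { kind: "resume", threadId: binding.threadId };
}
```

Because the comparison is a plain string equality, any byte-level difference in the serialized catalog, including help text, lands in the catalog-changed branch.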

Relevant code

  • extensions/codex/src/app-server/thread-lifecycle.ts, startOrResumeThread: fingerprint check gate
  • extensions/codex/src/app-server/thread-lifecycle.ts (same file), fingerprintDynamicTools:
    function fingerprintDynamicTools(dynamicTools) {
      return JSON.stringify(dynamicTools.map(stabilizeJsonValue));
    }
  • openclaw-tools.ts, buildMessageToolDescription: injects Current channel (${currentChannel}) into the tool description
  • openclaw-tools.ts, buildMessageToolSchema: rebuilds inputSchema from the per-channel action set
  • pi-tools.ts, createOpenClawCodingTools: filters orchestration tools when isSubagentSessionKey(sessionKey) is true

Proposed fix

Option A (minimal, probably sufficient): stabilize fingerprint

Fingerprint only what Codex app-server actually validates across thread/start and thread/resume: tool names plus structural shape (property names and the required set). Exclude description text and runtime-populated enum values.

function fingerprintDynamicTools(dynamicTools) {
  return JSON.stringify(
    dynamicTools
      .map(t => ({
        name: t.name,
        shape: extractSchemaShape(t.inputSchema),
      }))
      .toSorted((a, b) => a.name.localeCompare(b.name))
  );
}

function extractSchemaShape(schema) {
  // Recursively strip description, default, enum; keep type, required, properties keys, items shape
  ...
}

Semantics: the sidecar should only be invalidated when Codex app-server would genuinely reject thread/resume — i.e., when a previously-known tool is gone, or a tool's argument structure changed in a way the app-server cares about.
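
One possible way to fill in the elided extractSchemaShape body is sketched below. It is an assumption about what "structural shape" could mean, not a confirmed implementation: it keeps only type, required, properties, and items, and drops everything else (description, default, enum, and any other keyword). The names SchemaShape, shapeOfSchema, and fingerprintToolShapes are hypothetical.

```typescript
interface SchemaShape {
  type?: unknown;
  required?: string[];
  properties?: Record<string, SchemaShape>;
  items?: SchemaShape;
}

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

// Keep only structural keywords; description, default, enum, and the rest are dropped.
function shapeOfSchema(schema: unknown): SchemaShape {
  const shape: SchemaShape = {};
  if (!isPlainObject(schema)) {
    return shape;
  }
  const type = schema["type"];
  if (type !== undefined) {
    shape.type = type;
  }
  const required = schema["required"];
  if (Array.isArray(required)) {
    // Sort so that reordering the required list does not change the shape.
    shape.required = required.filter((k: unknown): k is string => typeof k === "string").sort();
  }
  const properties = schema["properties"];
  if (isPlainObject(properties)) {
    const props: Record<string, SchemaShape> = {};
    for (const key of Object.keys(properties).sort()) {
      props[key] = shapeOfSchema(properties[key]);
    }
    shape.properties = props;
  }
  const items = schema["items"];
  if (isPlainObject(items)) {
    shape.items = shapeOfSchema(items);
  }
  return shape;
}

interface ShapedTool {
  name: string;
  inputSchema: unknown;
}

// Fingerprint tool names and schema shapes, sorted by name for a stable order.
function fingerprintToolShapes(tools: ShapedTool[]): string {
  const entries = tools.map((tool) => ({
    name: tool.name,
    shape: shapeOfSchema(tool.inputSchema),
  }));
  entries.sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  return JSON.stringify(entries);
}
```

Note that tool names remain part of this fingerprint, so a catalog that drops tools (as in the 41 → 31 subagent case) would still yield a different value; only description, default, and enum changes are absorbed.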

Option B (stronger but heavier): compact-on-invalidation

When fingerprint mismatch is detected, call thread/compact/start on the existing thread and seed the resulting summary as developerInstructions on the new thread. This preserves conversational memory across invalidations.

Option C (complementary): scrub runtime context from tool descriptions

buildMessageToolDescription should not bake Current channel (...) into the description. Move runtime context into the system prompt's Environment block (where it already belongs), keeping tool descriptions stable across channels/triggers.
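
Option C can be sketched as splitting one function into two. The code below is illustrative only: RuntimeSurface, stableMessageToolDescription, and environmentBlockLines are hypothetical names, the wording of the generated strings is made up, and the action list is copied from the second diff line quoted earlier in the issue.

```typescript
// Hypothetical runtime context for the current trigger; field names are illustrative.
interface RuntimeSurface {
  currentChannel?: string;
  supportedActions: string[];
}

// Superset of message actions, taken from the diff quoted in the issue.
const ALL_MESSAGE_ACTIONS: string[] = [
  "send", "broadcast", "poll", "react", "delete", "edit", "topic-create", "topic-edit",
];

// Tool description that depends only on the static action superset, never on the channel.
function stableMessageToolDescription(allActions: string[]): string {
  return "Supports actions: " + allActions.slice().sort().join(", ") + ".";
}

// Runtime context rendered separately, for the system prompt's Environment block.
function environmentBlockLines(surface: RuntimeSurface): string[] {
  const lines: string[] = [];
  if (surface.currentChannel !== undefined) {
    lines.push("Current channel: " + surface.currentChannel);
    lines.push("Channel supports: " + surface.supportedActions.slice().sort().join(", "));
  } else {
    lines.push("Current channel: none (internal trigger)");
  }
  return lines;
}
```

With this split, the description string is identical for a Telegram turn and a cron trigger, so it no longer contributes to fingerprint churn, while the per-surface detail still reaches the model through the prompt.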

Impact assessment

  • Any production agent serving multiple surfaces (DM + group + cron + webhook) cannot reliably use Codex harness today — thread churn is deterministic and frequent
  • Teams are currently dismissing the Codex harness, despite GPT-5.x benchmark advantages, because continuity is unworkable
  • Low-risk fix: Option A alone unblocks most workloads; A+C is ideal

Additional context

  • Happy to provide the full 3-sidecar corpus as JSON attachments
  • Happy to test a patch on the reporter's setup before merge

extent analysis

TL;DR

The most likely fix is to stabilize the dynamicToolsFingerprint by excluding description text and runtime-populated enum values, only considering tool names and structural shape.

Guidance

  • Review the fingerprintDynamicTools function to ensure it only includes relevant information for Codex app-server validation.
  • Implement the proposed extractSchemaShape function to recursively strip unnecessary properties from the schema.
  • Consider implementing Option C to scrub runtime context from tool descriptions, keeping them stable across channels and triggers.
  • Test the fix with the provided 3-sidecar corpus to verify its effectiveness.

Example

function fingerprintDynamicTools(dynamicTools) {
  return JSON.stringify(
    dynamicTools
      .map(t => ({
        name: t.name,
        shape: extractSchemaShape(t.inputSchema),
      }))
      .sort((a, b) => a.name.localeCompare(b.name))
  );
}

function extractSchemaShape(schema) {
  // Recursively strip description, default, enum; keep type, required, properties keys, items shape
  // ...
}

Notes

The proposed fix assumes that description text and runtime-populated enum values are the main source of fingerprint churn. Because Option A still fingerprints tool names, the 41 → 31 tool-count change when isSubagentSessionKey(sessionKey) flips would still produce a different fingerprint, as would any upgrade that adds, removes, or restructures a tool. Validate the fix against the sidecar corpus before relying on it.

Recommendation

Apply the proposed fix (Option A) to stabilize the dynamicToolsFingerprint, as it is the most straightforward and low-risk solution. This should unblock most workloads and provide a reliable foundation for further improvements.
