
openclaw - ✅(Solved) Fix Codex harness: runtime context pollutes dynamicToolsFingerprint, causing thread churn on every channel/surface/subagent switch [1 pull requests, 1 participants]

openclaw · 2026-04-21 23:57:10


openclaw/openclaw#69876•Fetched 2026-04-22 07:47:02


Author

richardmqq

Participants

richardmqq

Timeline (top)

cross-referenced ×1

startOrResumeThread in the Codex app-server harness uses dynamicToolsFingerprint to decide whether to resume an existing thread or start a new one. The fingerprint is computed as JSON.stringify(sortedDeepClone(dynamicTools)) — hashing the full tool descriptor array including descriptions and runtime-populated schema enums.

Several tool specs are rebuilt per-request with current runtime context baked in (current channel name, channel-supported actions, subagent-caller flag), so the fingerprint changes between turns of the same OpenClaw session whenever the caller's surface changes. Result: Codex threads are discarded and recreated on every cross-surface interaction, destroying conversation continuity.

Also: any OpenClaw upgrade that tweaks a built-in tool description invalidates every existing Codex thread on the host.



PR fix notes

PR #69976: fix(codex): ignore tool descriptions in thread fingerprint

Repository: openclaw/openclaw
Author: chen-zhang-cs-code
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/69976

Description (problem / solution / changelog)

Summary

Keep Codex app-server dynamic tool fingerprints focused on the tool call surface by omitting tool descriptions before hashing.
Prevent runtime-local message tool help text changes from invalidating an existing Codex thread binding.
Add regression coverage that description-only changes resume the bound thread while schema/action changes still start a new thread.

Fixes #69876
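The behavior the PR summary describes (description-only changes resume, schema/action changes start a new thread) can be illustrated with a minimal sketch. This is not the code from PR #69976; `stripSchemaDescriptions` and `fingerprintWithoutDescriptions` are hypothetical names, and the sketch assumes each tool carries `name`, `description`, and `inputSchema` fields.

```javascript
// Hypothetical sketch, not the PR's implementation.
// Drops `description` annotations from a JSON-Schema-like object and sorts
// keys so the serialized form does not depend on key insertion order.
function stripSchemaDescriptions(schema) {
  if (Array.isArray(schema)) return schema.map(stripSchemaDescriptions);
  if (schema === null || typeof schema !== "object") return schema;
  const out = {};
  for (const key of Object.keys(schema).sort()) {
    if (key === "description") continue; // annotation text, not call surface
    if (key === "properties" && schema.properties && typeof schema.properties === "object") {
      // Keys under `properties` are argument names, so an argument that is
      // literally called "description" must be kept.
      out.properties = {};
      for (const prop of Object.keys(schema.properties).sort()) {
        out.properties[prop] = stripSchemaDescriptions(schema.properties[prop]);
      }
    } else {
      out[key] = stripSchemaDescriptions(schema[key]);
    }
  }
  return out;
}

function fingerprintWithoutDescriptions(dynamicTools) {
  return JSON.stringify(
    dynamicTools.map((tool) => ({
      name: tool.name,
      inputSchema: stripSchemaDescriptions(tool.inputSchema),
    }))
  );
}

const base = [{
  name: "message",
  description: "Current channel (telegram) supports: send, poll.",
  inputSchema: { type: "object", properties: { action: { type: "string", enum: ["send", "poll"] } } },
}];
const reworded = [{ ...base[0], description: "Supports actions: send, poll." }];
const newAction = [{
  ...base[0],
  inputSchema: { type: "object", properties: { action: { type: "string", enum: ["send", "poll", "broadcast"] } } },
}];

console.log(fingerprintWithoutDescriptions(base) === fingerprintWithoutDescriptions(reworded));  // true
console.log(fingerprintWithoutDescriptions(base) === fingerprintWithoutDescriptions(newAction)); // false
```

Because enum values stay in the fingerprint here, a per-channel action list would still change it; that is the difference between this variant and the stricter shape-only fingerprint proposed in the issue.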

Test plan

node scripts/run-vitest.mjs run --config test/vitest/vitest.extensions.config.ts extensions/codex/src/app-server/run-attempt.test.ts
pnpm exec oxfmt --check extensions/codex/src/app-server/thread-lifecycle.ts extensions/codex/src/app-server/run-attempt.test.ts
pnpm exec oxlint --tsconfig tsconfig.oxlint.extensions.json extensions/codex/src/app-server/thread-lifecycle.ts extensions/codex/src/app-server/run-attempt.test.ts
git diff --check

Notes

pnpm tsgo:extensions:test currently stops in extensions/qqbot/src/bridge/setup/finalize.ts because @tencent-connect/qqbot-connector is not available in this local install.

Changed files

extensions/codex/src/app-server/run-attempt.test.ts (modified, +108/-0)
extensions/codex/src/app-server/thread-lifecycle.ts (modified, +17/-1)




Environment

OpenClaw 2026.4.10 (reproduces on current installed version; behavior exists since Codex harness was introduced)
Codex app-server via stdio transport (default)
Issue observed across multiple machines; root cause analyzed with on-disk sidecar evidence

Reproduction (observational, on-disk)

No live reproduction needed — the bug is visible from existing sidecar files.

For any agent that has run under Codex harness across >1 OpenClaw surface or OpenClaw version, inspect:

~/.openclaw/agents/<agent-id>/sessions/*.codex-app-server.json

Each sidecar's dynamicToolsFingerprint is the JSON-stringified tool catalog snapshot at binding time. Diffing two sidecars from the same agent shows non-structural differences:

Example diff (3 sidecars from one bench agent, same machine, different runs)

session A (OpenClaw version N):    tool_count=41, hash=677be5fb...
session B (OpenClaw version N+1):  tool_count=31, hash=249f9666...
session C (OpenClaw version N+1):  tool_count=31, hash=249f9666...

Deltas observed:

1. message tool description embeds current channel name

- ...Current channel (telegram) supports: delete, edit, poll, react, send, topic-create, topic-edit.
+ ...Supports actions: send, broadcast, poll, react, delete, edit, topic-create, topic-edit.

The second form renders when no channel inbound context is present (cron, subagent, internal trigger). Same session, different triggers → different fingerprint.

2. message tool inputSchema size changes with channel: 7395 B vs 6180 B (action enum and allowed fields rebuilt per-channel).

3. Tool count delta: 41 → 31 when isSubagentSessionKey(sessionKey) flips. Tools dropped when running as a subagent:

gateway, sessions_spawn, cron, subagents, session_status,
sessions_send, nodes, sessions_list, agents_list, sessions_history

4. OpenClaw upgrade changed built-in tool description text

- ...Use process whenever you need logs, status, input, or intervention. Use pty=true...
+ ...Use process whenever you need logs, status, input, or intervention. Do not use exec sleep or delay loops for reminders or deferred follow-ups; use cron instead. Use pty=true...

Every upgrade that touches any built-in tool description invalidates all Codex thread bindings on the host.
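The deltas above can be sorted into "tool added or removed", "schema changed", and "description only" with a small helper. This is a rough sketch: `diffCatalogs` is a hypothetical name, and it assumes each sidecar's dynamicToolsFingerprint parses (via JSON.parse) into an array of objects with `name`, `description`, and `inputSchema` fields.

```javascript
// Hypothetical helper for comparing two parsed tool catalogs.
// Schemas are compared by their JSON text, which is key-order sensitive;
// that is acceptable only if both catalogs were serialized the same way.
function diffCatalogs(before, after) {
  const byName = (tools) => new Map(tools.map((t) => [t.name, t]));
  const a = byName(before);
  const b = byName(after);
  const report = { added: [], removed: [], descriptionOnly: [], schemaChanged: [] };

  for (const name of b.keys()) {
    if (!a.has(name)) report.added.push(name);
  }
  for (const [name, tool] of a) {
    const other = b.get(name);
    if (!other) {
      report.removed.push(name);
      continue;
    }
    const sameSchema = JSON.stringify(tool.inputSchema) === JSON.stringify(other.inputSchema);
    if (!sameSchema) report.schemaChanged.push(name);
    else if (tool.description !== other.description) report.descriptionOnly.push(name);
  }
  return report;
}

// Toy catalogs loosely modeled on the deltas listed in the report.
const before = [
  { name: "message", description: "Current channel (telegram) supports: send, poll.",
    inputSchema: { properties: { action: { enum: ["send", "poll"] } } } },
  { name: "exec", description: "Use pty=true...", inputSchema: { type: "object" } },
  { name: "cron", description: "Schedule jobs.", inputSchema: { type: "object" } },
];
const after = [
  { name: "message", description: "Supports actions: send, broadcast, poll.",
    inputSchema: { properties: { action: { enum: ["send", "broadcast", "poll"] } } } },
  { name: "exec", description: "Use cron instead of sleep loops. Use pty=true...", inputSchema: { type: "object" } },
];

console.log(diffCatalogs(before, after));
// { added: [], removed: ["cron"], descriptionOnly: ["exec"], schemaChanged: ["message"] }
```

To run it against real data, read two sidecar files, JSON.parse each file, then JSON.parse the dynamicToolsFingerprint string inside each before passing the arrays in.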

Expected behavior

Same OpenClaw session should resume the same Codex thread across:

Cross-channel calls (Telegram DM → group → cron trigger)
Cross-topic calls (Telegram forum topic A → topic B)
Subagent-spawn calls (direct user → spawned as subagent)
OpenClaw version upgrades (unless the tool contract meaningfully changes)

Actual behavior

Fingerprint mismatch → sidecar cleared → thread/start issued → new Codex thread with zero history. There is no user-visible signal other than the "forgotten context" symptom. The only log trace is at debug level:

codex app-server dynamic tool catalog changed; starting a new thread
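The flow described above can be sketched as a single decision function. This is an illustration only, not the harness code: `decideThreadAction`, the `binding` object, and its `threadId` field are hypothetical, while the log message is the one quoted in the report.

```javascript
// Hypothetical model of the resume-or-start gate.
// `binding` stands in for the sidecar contents; null means no prior thread.
function decideThreadAction(binding, currentFingerprint, log = console.debug) {
  if (!binding) {
    return { action: "start", reason: "no-binding" };
  }
  if (binding.dynamicToolsFingerprint !== currentFingerprint) {
    // The only trace the report mentions; it is emitted at debug level.
    log("codex app-server dynamic tool catalog changed; starting a new thread");
    return { action: "start", reason: "fingerprint-mismatch" };
  }
  return { action: "resume", threadId: binding.threadId };
}

const binding = { threadId: "thread-1", dynamicToolsFingerprint: "fp-telegram" };
console.log(decideThreadAction(binding, "fp-telegram")); // { action: "resume", threadId: "thread-1" }
console.log(decideThreadAction(binding, "fp-cron"));     // { action: "start", reason: "fingerprint-mismatch" }
```

Any string difference in the fingerprint sends the call down the "start" branch, which is why unstable inputs to the fingerprint translate directly into lost history.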

Relevant code

extensions/codex/src/app-server/thread-lifecycle.ts — startOrResumeThread, fingerprint check gate

extensions/codex/src/app-server/thread-lifecycle.ts (same file) — fingerprintDynamicTools:

function fingerprintDynamicTools(dynamicTools) {
  return JSON.stringify(dynamicTools.map(stabilizeJsonValue));
}

openclaw-tools.ts — buildMessageToolDescription: injects Current channel (${currentChannel}) into tool description
openclaw-tools.ts — buildMessageToolSchema: rebuilds inputSchema per-channel actions set
pi-tools.ts — createOpenClawCodingTools: filters orchestration tools when isSubagentSessionKey(sessionKey) is true
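To see how these pieces interact, here is a small reproduction sketch. The builder below is a stand-in that only reproduces the two phrasings quoted in the report, not the real buildMessageToolDescription; `naiveFingerprint` uses plain JSON.stringify in place of the stabilizeJsonValue-based original, whose exact behavior is not shown in the issue.

```javascript
// Stand-in for the real description builder (assumption: the real one takes
// more inputs; only the channel-dependent wording matters here).
function buildMessageToolDescription(currentChannel, actions) {
  const list = actions.join(", ");
  return currentChannel
    ? `Current channel (${currentChannel}) supports: ${list}.`
    : `Supports actions: ${list}.`;
}

// Simplified version of a fingerprint that serializes whole tool descriptors.
function naiveFingerprint(dynamicTools) {
  return JSON.stringify(dynamicTools);
}

const actions = ["send", "poll"];
const fromTelegram = [{ name: "message", description: buildMessageToolDescription("telegram", actions) }];
const fromCron = [{ name: "message", description: buildMessageToolDescription(undefined, actions) }];

// Same session, same tool, different trigger: the fingerprints differ.
console.log(naiveFingerprint(fromTelegram) === naiveFingerprint(fromCron)); // false
```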

Proposed fix

Option A (minimal, probably sufficient): stabilize fingerprint

Fingerprint only what Codex app-server actually validates across thread/start → thread/resume: tool names + structural shape (property names + required set). Exclude description text and runtime-populated enum values.

function fingerprintDynamicTools(dynamicTools) {
  return JSON.stringify(
    dynamicTools
      .map(t => ({
        name: t.name,
        shape: extractSchemaShape(t.inputSchema),
      }))
      .toSorted((a, b) => a.name.localeCompare(b.name))
  );
}

function extractSchemaShape(schema) {
  // Recursively strip description, default, enum; keep type, required, properties keys, items shape
  // ... (implementation elided in the original report)
}

Semantics: the sidecar should only be invalidated when Codex app-server would genuinely reject thread/resume — i.e., when a previously-known tool is gone, or a tool's argument structure changed in a way the app-server cares about.
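One possible reading of the elided extractSchemaShape, following its comment (strip description, default, enum; keep type, required, property names, items shape), is sketched below. This is an assumption about what the reporter intended, not code from OpenClaw; it ignores combinators such as anyOf/oneOf and tuple-style `items` arrays, which a real implementation would have to decide on.

```javascript
// Hypothetical implementation of the shape extractor described in Option A.
// Anything not listed in the comment (description, default, enum, ...) is dropped.
function extractSchemaShape(schema) {
  if (schema === null || typeof schema !== "object" || Array.isArray(schema)) return null;
  const shape = {};
  if (schema.type !== undefined) shape.type = schema.type;
  if (Array.isArray(schema.required)) shape.required = [...schema.required].sort();
  if (schema.properties && typeof schema.properties === "object") {
    shape.properties = {};
    for (const key of Object.keys(schema.properties).sort()) {
      shape.properties[key] = extractSchemaShape(schema.properties[key]);
    }
  }
  if (schema.items !== undefined) shape.items = extractSchemaShape(schema.items);
  return shape;
}

function fingerprintDynamicTools(dynamicTools) {
  return JSON.stringify(
    dynamicTools
      .map((t) => ({ name: t.name, shape: extractSchemaShape(t.inputSchema) }))
      .sort((a, b) => a.name.localeCompare(b.name))
  );
}

// Toy message tool as seen from a channel and from a cron trigger.
const messageSchema = (actions) => ({
  type: "object",
  required: ["action"],
  properties: {
    action: { type: "string", enum: actions, description: "What to do" },
    target: { type: "string" },
  },
});
const telegramTools = [{ name: "message", description: "Current channel (telegram) supports: ...", inputSchema: messageSchema(["send", "poll"]) }];
const cronTools = [{ name: "message", description: "Supports actions: ...", inputSchema: messageSchema(["send", "broadcast", "poll"]) }];

console.log(fingerprintDynamicTools(telegramTools) === fingerprintDynamicTools(cronTools)); // true
```

A tool that disappears from the catalog, or a property that is added or removed, still changes this fingerprint, which matches the invalidation semantics stated above.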

Option B (stronger but heavier): compact-on-invalidation

When fingerprint mismatch is detected, call thread/compact/start on the existing thread and seed the resulting summary as developerInstructions on the new thread. This preserves conversational memory across invalidations.
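The orchestration for Option B might look roughly like the sketch below. The app-server request and response shapes are not given in the issue, so the two calls are injected as callbacks: `compactThread` (assumed to wrap thread/compact/start and resolve to a summary string) and `startThread` (assumed to wrap thread/start and accept developerInstructions). Both names and their signatures are hypothetical.

```javascript
// Hypothetical carry-over flow; the callbacks hide the real app-server calls.
async function startThreadWithCarryover({ previousThreadId, compactThread, startThread }) {
  let developerInstructions;
  if (previousThreadId) {
    try {
      const summary = await compactThread(previousThreadId);
      if (summary) {
        developerInstructions = `Summary of the previous thread:\n${summary}`;
      }
    } catch (err) {
      // Compaction is treated as best-effort: fall back to a fresh thread
      // rather than failing the turn.
    }
  }
  return startThread({ developerInstructions });
}

// Usage with fake callbacks standing in for the app-server client.
startThreadWithCarryover({
  previousThreadId: "thread-1",
  compactThread: async () => "User asked about deploy steps.",
  startThread: async (params) => ({ threadId: "thread-2", ...params }),
}).then((thread) => console.log(thread.developerInstructions));
```

Treating compaction as best-effort is a design choice of this sketch; a stricter variant could surface the failure instead of silently starting without history.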

Option C (complementary): scrub runtime context from tool descriptions

buildMessageToolDescription should not bake Current channel (...) into the description. Move runtime context into the system prompt's Environment block (where it already belongs), keeping tool descriptions stable across channels/triggers.
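A minimal sketch of that split follows, assuming the system prompt is assembled from plain strings. The constant text, `buildEnvironmentBlock`, and its parameter names are hypothetical and only illustrate keeping the description constant while the per-request context moves elsewhere.

```javascript
// Static description: identical for every channel and trigger, so it can no
// longer perturb a fingerprint that includes descriptions. Wording is invented.
const MESSAGE_TOOL_DESCRIPTION =
  "Send, edit, delete, or react to messages. See the Environment section of " +
  "the system prompt for the current channel and its supported actions.";

// Per-request context is rendered into the system prompt instead.
function buildEnvironmentBlock({ currentChannel, supportedActions } = {}) {
  const lines = ["Environment:"];
  lines.push(`- Current channel: ${currentChannel ?? "none (cron, subagent, or internal trigger)"}`);
  if (supportedActions?.length) {
    lines.push(`- Supported message actions: ${supportedActions.join(", ")}`);
  }
  return lines.join("\n");
}

console.log(buildEnvironmentBlock({ currentChannel: "telegram", supportedActions: ["send", "poll"] }));
console.log(buildEnvironmentBlock());
```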

Impact assessment

Any production agent serving multiple surfaces (DM + group + cron + webhook) cannot reliably use the Codex harness today: thread churn is deterministic and frequent
Teams are currently dismissing the Codex harness, despite GPT-5.x benchmark advantages, because continuity is unworkable
Low-risk fix: Option A alone unblocks most workloads; A+C is ideal

Additional context

Happy to provide the full 3-sidecar corpus as JSON attachments
Happy to test a patch on the reporter's setup before merge

Extent analysis

TL;DR

The most likely fix is to stabilize the dynamicToolsFingerprint by excluding description text and runtime-populated enum values, only considering tool names and structural shape.

Guidance

Review the fingerprintDynamicTools function to ensure it only includes relevant information for Codex app-server validation.
Implement the proposed extractSchemaShape function to recursively strip unnecessary properties from the schema.
Consider implementing Option C to scrub runtime context from tool descriptions, keeping them stable across channels and triggers.
Test the fix with the provided 3-sidecar corpus to verify its effectiveness.

Example

function fingerprintDynamicTools(dynamicTools) {
  return JSON.stringify(
    dynamicTools
      .map(t => ({
        name: t.name,
        shape: extractSchemaShape(t.inputSchema),
      }))
      .sort((a, b) => a.name.localeCompare(b.name))
  );
}

function extractSchemaShape(schema) {
  // Recursively strip description, default, enum; keep type, required, properties keys, items shape
  // ...
}

Notes

The proposed fix assumes that the dynamicToolsFingerprint is the primary cause of the issue. However, other factors, such as changes in the OpenClaw version or tool contracts, may still affect the fingerprint. Additional testing and validation may be necessary to ensure the fix works as expected.

Recommendation

Apply the proposed fix (Option A) to stabilize the dynamicToolsFingerprint, as it is the most straightforward and low-risk solution. This should unblock most workloads and provide a reliable foundation for further improvements.

