openclaw - ✅(Solved) Fix MODELS_JSON_STATE.readyCache permanently cold under traffic — `markAuthProfileUsed` invalidates fingerprint on every successful call [2 pull requests, 2 comments, 3 participants]

Q: Expected behavior

Once `prewarmConfiguredPrimaryModel` populates the cache, subsequent in-process model resolutions for the same configured model should reuse the cached result. Per-message `model-resolution` should be sub-100 ms with a hit.

ilazaridis · 2026-05-10T13:34:19Z

`MODELS_JSON_STATE.readyCache` (the in-process cache that fronts `ensureOpenClawModelsJson`) is structurally guaranteed to miss on every message for any agent that's actively serving traffic. The cache fingerprint includes `auth-profiles.json`'s mtime, and `markAuthProfileUsed` rewrites that file on every successful provider call (just to bump `usageStats.lastUsed`). Net result: per-message `model-resolution` pays the full uncached cost, measured at **6–13 s** in 2026.5.7 with a `kimi/kimi-code` config on a fresh tenant container.

openclaw/openclaw#80279•Fetched 2026-05-11 03:16:56

Error Message

Observed (agent/embedded warn):

totalMs=6321   stages=… model-resolution:6304ms@6312ms,auth:3ms@…
totalMs=13206  stages=… model-resolution:12873ms@12885ms,auth:2ms@…
totalMs=12486  stages=… model-resolution:12473ms@12480ms,auth:1ms@…

Root Cause

buildModelsJsonFingerprint in dist/models-config-BCL7xtRj.js keys on file mtimes:

async function buildModelsJsonFingerprint(params) {
    const authProfilesMtimeMs = await readFileMtimeMs(path.join(params.agentDir, "auth-profiles.json"));
    const modelsFileMtimeMs   = await readFileMtimeMs(path.join(params.agentDir, "models.json"));
    // …
    return stableStringify({
        config: params.config,
        sourceConfigForSecrets: params.sourceConfigForSecrets,
        envShape,
        authProfilesMtimeMs,   // <-- this
        modelsFileMtimeMs,
        // …
    });
}

markAuthProfileUsed in dist/usage-CQen01xn.js rewrites auth-profiles.json on every successful provider call:

async function markAuthProfileUsed(params) {
    const { store, profileId, agentDir } = params;
    const updated = await authProfileUsageDeps.updateAuthProfileStoreWithLock({
        agentDir,
        updater: (freshStore) => {
            if (!freshStore.profiles[profileId]) return false;
            updateUsageStatsEntry(freshStore, profileId, (existing) =>
                resetUsageStats(existing, { lastUsed: Date.now() }));
            return true;     // <-- triggers saveAuthProfileStore (writes the file)
        }
    });
    // …
}

So the fingerprint is a proxy for "credentials in auth-profiles.json changed" — but markAuthProfileUsed writes the file for a reason that has nothing to do with credentials. The two contracts are individually fine; their interaction is the bug.
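The interaction can be reproduced in isolation. The following is a minimal sketch, not the upstream code: `mtimeFingerprint` is a hypothetical stand-in for the mtime-keyed part of `buildModelsJsonFingerprint`, the store shape is illustrative, and `utimesSync` pins the timestamps so the result does not depend on filesystem timestamp resolution.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical stand-in for the mtime-keyed part of the fingerprint.
function mtimeFingerprint(file) {
  return JSON.stringify({ authProfilesMtimeMs: fs.statSync(file).mtimeMs });
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "auth-profiles-demo-"));
const file = path.join(dir, "auth-profiles.json");

// Illustrative store shape; only usageStats.lastUsed is taken from the report.
const store = {
  profiles: { "kimi:default": { type: "api_key", key: "sk-demo" } },
  usageStats: { "kimi:default": { lastUsed: 1 } },
};
const credentialsBefore = JSON.stringify(store.profiles);

fs.writeFileSync(file, JSON.stringify(store));
fs.utimesSync(file, new Date(1_000_000), new Date(1_000_000)); // pin mtime for determinism
const fingerprintBefore = mtimeFingerprint(file);

// Usage-only rewrite: credentials untouched, only lastUsed bumped.
store.usageStats["kimi:default"].lastUsed = 2;
fs.writeFileSync(file, JSON.stringify(store));
fs.utimesSync(file, new Date(2_000_000), new Date(2_000_000)); // simulate a later write time
const fingerprintAfter = mtimeFingerprint(file);

const credentialsAfter = JSON.stringify(
  JSON.parse(fs.readFileSync(file, "utf8")).profiles,
);
const credentialsUnchanged = credentialsBefore === credentialsAfter;
const fingerprintChanged = fingerprintBefore !== fingerprintAfter;
console.log({ credentialsUnchanged, fingerprintChanged });
```

In this sketch the credentials are byte-identical across both writes, yet the fingerprint differs, which is the false invalidation described above.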

The per-message cycle (verified against gateway logs and live stat of auth-profiles.json on a paired tenant):

1. Embedded run → pi-embedded-*.js calls resolveModelAsync({skipPiDiscovery:true}) with empty discovery stores → returns null for plugin-backed providers → falls back to ensureOpenClawModelsJson.
2. ensureOpenClawModelsJson reads the current auth-profiles.json mtime → fingerprint differs from any cached entry → cache miss → full re-resolution (runs the plugin's prepareProviderDynamicModel hook, plans the file, writes models.json), costing ~6–13 s.
3. LLM call succeeds → markAuthProfileUsed rewrites auth-profiles.json after the response → next message hits a fresh mtime → goto 2.

Confirmed on a live openclaw@2026.5.7 agent: auth-profiles.json mtime advanced past the latest embedded-run timestamp by several seconds, then stayed stable for 60+ s while idle, then advanced again on the next message.
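The cycle can be modelled without touching the filesystem. This is a rough simulation under stated assumptions: every name is hypothetical, the file mtime is a plain counter, and only the `auth-profiles.json` input to the fingerprint is modelled. A miss stands in for the 6–13 s re-resolution.

```javascript
// Simulates N messages against a fingerprint-keyed cache.
// bumpMtimeOnUse=true models markAuthProfileUsed rewriting the file after each call.
function simulate(messageCount, bumpMtimeOnUse) {
  let authProfilesMtime = 0; // counter standing in for the file mtime
  const readyCache = new Map();
  let misses = 0;

  function ensureModelsJson(config) {
    const fingerprint = JSON.stringify({ config, authProfilesMtime });
    if (readyCache.has(fingerprint)) return readyCache.get(fingerprint);
    misses += 1; // full re-resolution would happen here
    const result = { model: config.model };
    readyCache.set(fingerprint, result);
    return result;
  }

  const config = { model: "kimi/kimi-code" };
  for (let i = 0; i < messageCount; i++) {
    ensureModelsJson(config);                    // step 2: resolve the model
    if (bumpMtimeOnUse) authProfilesMtime += 1;  // step 3: usage write bumps mtime
  }
  return misses;
}

const missesWithUsageWrites = simulate(5, true);
const missesWithStableFile = simulate(5, false);
console.log({ missesWithUsageWrites, missesWithStableFile });
```

With usage writes enabled every message misses; with a stable file only the first (cold) call does.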

Fix Action

Fix / Workaround

Workaround for downstreams

Image-build patch on markAuthProfileUsed to elide the file write for lastUsed-only updates (return false from the updater unless cooldown / error-state changes). Tested anchor pattern is the same as managed runtime patches that wrap upstream dist/ files; happy to share specifics if useful.
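A sketch of what such a patched updater might look like, assuming the updater contract shown in the excerpt (returning `true` triggers a save, returning `false` skips it). `updateStore` is an in-memory stand-in for `updateAuthProfileStoreWithLock`, and the `cooldownUntil` / `errorCount` field names are assumptions, not confirmed upstream fields.

```javascript
// Hypothetical stand-in for the locked store update: counts a write
// only when the updater reports a change worth persisting.
function updateStore(state, updater) {
  const shouldPersist = updater(state.store);
  if (shouldPersist) state.writes += 1;
  return shouldPersist;
}

// Patched behaviour: a lastUsed-only update does not flush to disk.
function markUsedPatched(state, profileId, now) {
  return updateStore(state, (store) => {
    if (!store.profiles[profileId]) return false;
    const stats = store.usageStats[profileId] ?? {};
    // Assumed field names for cooldown / error state.
    const hadErrorState =
      Boolean(stats.cooldownUntil) || (stats.errorCount ?? 0) > 0;
    store.usageStats[profileId] = { lastUsed: now }; // reset stats on success
    return hadErrorState; // flush only when cooldown / error state was cleared
  });
}

const healthy = {
  store: { profiles: { p1: {} }, usageStats: { p1: { lastUsed: 1 } } },
  writes: 0,
};
for (const now of [10, 20, 30]) markUsedPatched(healthy, "p1", now);

const recovering = {
  store: { profiles: { p1: {} }, usageStats: { p1: { cooldownUntil: 999 } } },
  writes: 0,
};
for (const now of [10, 20, 30]) markUsedPatched(recovering, "p1", now);

console.log({ healthyWrites: healthy.writes, recoveringWrites: recovering.writes });
```

The trade-off in this sketch is that `lastUsed` on disk goes stale under steady traffic; whether that matters depends on what else reads it.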

PR fix notes

PR #80375: perf: consolidate auth profile success writes

Repository: openclaw/openclaw
Author: mcaxtr
State: closed | merged: True
Link: https://github.com/openclaw/openclaw/pull/80375

Description (problem / solution / changelog)

Summary

Add markAuthProfileSuccess to record last-good auth profile and successful usage stats in one locked auth-store update.
Use it after successful embedded model runs instead of separate markAuthProfileGood and markAuthProfileUsed writes.
Add coverage for canonical provider alias handling and successful usage-stat reset.
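The write-count effect of consolidating the two helpers can be illustrated as follows. The real `markAuthProfileSuccess` signature is not shown in this thread, so every function and field name below is a hypothetical stand-in; the sketch only shows two locked updates collapsing into one.

```javascript
// In-memory stand-in for a locked auth-store update that persists on `true`.
function makeStore() {
  return { data: { lastGood: {}, usageStats: {} }, writes: 0 };
}
function updateWithLock(store, updater) {
  if (updater(store.data)) store.writes += 1;
}

// Before: two separate helpers, two locked writes per successful run.
function markGoodThenUsed(store, provider, profileId, now) {
  updateWithLock(store, (data) => {
    data.lastGood[provider] = profileId;
    return true;
  });
  updateWithLock(store, (data) => {
    data.usageStats[profileId] = { lastUsed: now };
    return true;
  });
}

// After (sketch): one helper records both facts in a single locked write.
function markSuccessSketch(store, provider, profileId, now) {
  updateWithLock(store, (data) => {
    data.lastGood[provider] = profileId;
    data.usageStats[profileId] = { lastUsed: now };
    return true;
  });
}

const separate = makeStore();
markGoodThenUsed(separate, "kimi", "kimi:default", 100);

const consolidated = makeStore();
markSuccessSketch(consolidated, "kimi", "kimi:default", 100);

console.log({ separateWrites: separate.writes, consolidatedWrites: consolidated.writes });
```

Both variants end with the same stored data; only the number of file writes per successful run differs.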

Compatibility note

This intentionally removes the old markAuthProfileGood and markAuthProfileUsed exports from the deprecated openclaw/plugin-sdk/agent-runtime barrel. Repo-local usage has moved to markAuthProfileSuccess, and no in-repo imports of the old helper names remain on this PR branch.

Verification

git diff --check
pnpm exec oxfmt --check --threads=1 CHANGELOG.md src/agents/auth-profiles/profiles.ts src/agents/auth-profiles.ts src/agents/pi-embedded-runner/run.ts src/agents/auth-profiles/order.test.ts
pnpm test src/agents/auth-profiles/order.test.ts src/agents/auth-profiles/usage.test.ts
pnpm test test/scripts/check-changelog-attributions.test.ts src/infra/changelog-unreleased.test.ts
pnpm tsgo:core

Changed files

CHANGELOG.md (modified, +1/-0)
src/agents/auth-profiles.ts (modified, +1/-2)
src/agents/auth-profiles/order.test.ts (modified, +23/-8)
src/agents/auth-profiles/profiles.test.ts (modified, +96/-54)
src/agents/auth-profiles/profiles.ts (modified, +32/-2)
src/agents/auth-profiles/usage-state.ts (modified, +1/-1)
src/agents/auth-profiles/usage.test.ts (modified, +0/-55)
src/agents/auth-profiles/usage.ts (modified, +0/-36)
src/agents/pi-embedded-runner/run.overflow-compaction.harness.ts (modified, +1/-2)
src/agents/pi-embedded-runner/run.ts (modified, +2/-8)
src/plugin-sdk/agent-runtime.ts (modified, +1/-2)

PR #73260: perf(models-config): content-hash auth-profiles + models.json drift detection

Repository: openclaw/openclaw
Author: zeroaltitude
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/73260

Description (problem / solution / changelog)

Summary

Splits the cache-fingerprint half of #72869 into a standalone PR.

This PR replaces mtime-based cache key inputs with content-based hashes for the ensureOpenClawModelsJson implicit-provider-discovery cache, plus a second-factor models.json content hash that catches external edits / partial corruption / sibling-process writes between cache hits. Includes the security hardening findings raised by Aisle on the original combined PR.

Why content hashes

The previous mtime-based key invalidated on every OAuth token refresh because auth-profiles.json gets rewritten with new access/refresh timestamps even when the set of available providers does not change. Same for models.json mtime: the file is the OUTPUT of this function, so each call observed its own write and invalidated the next call.

Now:

auth-profiles.json: SHA-256 over a stable serialization that strips volatile OAuth session fields (access, refresh, expires*, issuedAt, refreshed/lastChecked/lastRefresh/lastValidatedAt). Token rotation no longer invalidates; structural changes (added/removed profiles, rotated static type:token credentials) still do.
models.json: NOT included in the input fingerprint (would cause self-invalidation). Instead its content hash is captured at write time and stored alongside the readyCache entry. Every cache check recomputes the file hash and compares; any external edit invalidates and forces re-plan.
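A minimal sketch of the content-hash idea for `auth-profiles.json`, assuming the volatile field list quoted above. The exact matching rules (for example, treating `expires*` as a prefix match) and the profile shapes are assumptions; the PR's real implementation may differ.

```javascript
const crypto = require("node:crypto");

// Field names taken from the PR description; prefix match for "expires*" is an assumption.
const VOLATILE_FIELDS = new Set([
  "access", "refresh", "issuedAt",
  "refreshed", "lastChecked", "lastRefresh", "lastValidatedAt",
]);
const isVolatile = (key) => VOLATILE_FIELDS.has(key) || key.startsWith("expires");

// Recursively drop volatile keys and sort the rest for a stable serialization.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const out = Object.create(null); // no prototype, so odd keys stay plain data
    for (const key of Object.keys(value).sort()) {
      if (!isVolatile(key)) out[key] = canonicalize(value[key]);
    }
    return out;
  }
  return value;
}

function authProfilesHash(store) {
  return crypto
    .createHash("sha256")
    .update(JSON.stringify(canonicalize(store)))
    .digest("hex");
}

// OAuth session rotation: hash should stay the same.
const oauthBefore = { profiles: { a: { type: "oauth", access: "A1", refresh: "R1", expiresAt: 1 } } };
const oauthAfter = { profiles: { a: { type: "oauth", access: "A2", refresh: "R2", expiresAt: 2 } } };

// Static token rotation: `token` is not volatile, so the hash should change.
const tokenBefore = { profiles: { b: { type: "token", token: "t1" } } };
const tokenAfter = { profiles: { b: { type: "token", token: "t2" } } };

console.log({
  oauthStable: authProfilesHash(oauthBefore) === authProfilesHash(oauthAfter),
  tokenInvalidates: authProfilesHash(tokenBefore) !== authProfilesHash(tokenAfter),
});
```

Sorting keys before hashing keeps the digest independent of property order, so a rewrite that only reorders fields does not invalidate either.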

Security hardening (Aisle review on PR #72869)

| Severity | Finding | Fix |
|---|---|---|
| 🟠 High #1 | CWE-59 symlink-following chmod | `ensureModelsFileMode` now lstats first; refuses to chmod symlinks or non-regular files |
| 🟡 Med #3 | CWE-1321 prototype pollution | `Object.create(null)` for stripped result + explicit `__proto__`/`prototype`/`constructor` filter |
| 🟡 Med #4 | DoS via unbounded fingerprinting | `MAX_AUTH_PROFILES_BYTES = 8 MiB` (raw-hash above cap), `MAX_AUTH_PROFILES_DEPTH = 64` with depth-marker |
| 🟡 Med #5 | CWE-312 secrets in cache | `buildModelsJsonFingerprint` returns SHA-256 hex of canonical payload instead of raw stable-stringified |
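The first row (lstat before chmod) can be sketched like this. `chmodRegularFileOnly` is a hypothetical helper, not the PR's `ensureModelsFileMode`, and the demo assumes a POSIX filesystem where unprivileged symlink creation works. A small window remains between the `lstat` and the `chmod`, which this sketch does not close.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Refuse to chmod anything that is not a regular file.
function chmodRegularFileOnly(file, mode = 0o600) {
  let stat;
  try {
    stat = fs.lstatSync(file); // lstat does not follow symlinks
  } catch {
    return false; // missing or unreadable: nothing to chmod
  }
  if (stat.isSymbolicLink() || !stat.isFile()) return false;
  fs.chmodSync(file, mode);
  return true;
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "models-mode-demo-"));
const regular = path.join(dir, "models.json");
const link = path.join(dir, "models-link.json");
fs.writeFileSync(regular, "{}");
fs.chmodSync(regular, 0o644);
fs.symlinkSync(regular, link);

// The symlink is refused, so its target keeps the original mode.
const linkResult = chmodRegularFileOnly(link);
const modeAfterLinkAttempt = fs.statSync(regular).mode & 0o777;

// The regular file is tightened to 0600.
const regularResult = chmodRegularFileOnly(regular);
const modeAfterRegular = fs.statSync(regular).mode & 0o777;

console.log({ linkResult, regularResult });
```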

Token-rotation correctness

token is intentionally NOT in AUTH_PROFILE_VOLATILE_FIELDS even though OAuth-style token fields rotate. Profiles with type: "token" use the literal token key as a long-lived static credential, and stripping it would mask real auth-state changes when a user rotates that credential. Documented inline.

Tests

Two tests in models-config.fingerprint-cache.test.ts:

does not invalidate the cache when OAuth session fields rotate (rotates access/refresh/expires; cache stays valid)
DOES invalidate the cache when a static type:token credential rotates (rotates the literal token field; cache invalidates)

Existing fingerprint-cache test suite still passes.

Compatibility

MODELS_JSON_STATE.readyCache value shape extended with modelsJsonHash: { fingerprint, modelsJsonHash, result }. All three plan return paths (skip/noop/write) capture the post-write hash. The refreshedFingerprint re-key path forwards modelsJsonHash through unchanged.
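The `{ fingerprint, modelsJsonHash, result }` entry shape suggests a two-factor check on every lookup. The following sketch shows one way that could work; the cache key, the `ensure` function, and the `plan` callback are all hypothetical, and the real planner has more return paths than shown here.

```javascript
const crypto = require("node:crypto");
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hash the file on disk; an unreadable file never matches a cached hash.
function sha256File(file) {
  try {
    return crypto.createHash("sha256").update(fs.readFileSync(file)).digest("hex");
  } catch {
    return null;
  }
}

const readyCache = new Map(); // keyed by file path here; the real key is not shown in the PR text

function ensure(file, fingerprint, plan) {
  const entry = readyCache.get(file);
  const hit =
    entry !== undefined &&
    entry.fingerprint === fingerprint &&        // factor 1: inputs unchanged
    entry.modelsJsonHash === sha256File(file);  // factor 2: output not edited externally
  if (hit) return { hit: true, result: entry.result };

  const result = plan();
  fs.writeFileSync(file, JSON.stringify(result));
  // Capture the post-write hash alongside the entry.
  readyCache.set(file, { fingerprint, modelsJsonHash: sha256File(file), result });
  return { hit: false, result };
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "models-drift-demo-"));
const modelsFile = path.join(dir, "models.json");
const plan = () => ({ providers: ["kimi"] });

const first = ensure(modelsFile, "fp-1", plan);   // cold: plan + write
const second = ensure(modelsFile, "fp-1", plan);  // warm: both factors match
fs.writeFileSync(modelsFile, "{\"providers\":[]}"); // external edit
const third = ensure(modelsFile, "fp-1", plan);   // drift detected: re-plan

console.log({ first: first.hit, second: second.hit, third: third.hit });
```

Because the hash is captured after the function's own write, the function no longer invalidates itself, while an edit by another process still forces a re-plan.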

Related

Splits out of #72869 (cache-fingerprint half)
The targetProvider short-circuit half is in #73261

Real behavior proof

Behavior or issue addressed: Models-config cache fingerprint fail-closed behavior for unhashable/oversize models.json and auth-profiles.json from PR #73260.
Real environment tested: Local OpenClaw topic branch perf/models-config-cache-fingerprint at 05fda3d839, isolated temporary agent directory, production ensureOpenClawModelsJson invoked through a tsx runtime driver with no Vitest mocks.
Exact steps or command run after this patch: Ran the proof driver after the patch to warm cache on a small auth profile, grow auth-profiles.json past the 8 MiB cap, repeat calls with byte-identical and swapped oversize contents, restore a small file, then exercise oversize and symlinked models.json.

Evidence after fix: Full copied runtime trace is in the PR comment: https://github.com/openclaw/openclaw/pull/73260#issuecomment-4384857383

Excerpt of copied live output:

ensureOpenClawModelsJson warm cache hit: ~1 ms
auth-profiles.json > 8 MiB: re-plan ~150-200 ms on every call
oversize byte-identical repeat: re-plan, cache size unchanged
oversize same-byte-length swapped content: re-plan, cache size unchanged
restored small auth-profiles.json: cache hit restored (~1 ms)
oversize models.json: full plan + rewrite
symlinked models.json: full plan + rewrite

Observed result after fix: Uncacheable models content never matches cached state, oversize auth profiles bypass the ready cache instead of collapsing to a stale fingerprint, and restoring cacheable content re-enables fast hits.
What was not tested: Nothing else for this PR's changed behavior beyond the isolated local runtime proof and supplemental automated validation.

Changed files

CHANGELOG.md (modified, +5/-0)
src/agents/models-config-state.ts (modified, +49/-2)
src/agents/models-config.fingerprint-cache.test.ts (added, +469/-0)
src/agents/models-config.ts (modified, +546/-60)

Version

openclaw@2026.5.7 (npm, latest at time of report).

Reproduction

1. Configure an agent with a plugin-backed provider/model (verified with kimi/kimi-code via api-key auth profile; expected to reproduce for any non-static provider).
2. Pair the agent and let prewarmConfiguredPrimaryModel complete (logs sidecars.model-prewarm:<n>ms; completes in ~700 ms with a single configured model).
3. Send three back-to-back messages and look at the embedded-run startup-stages traces.

Observed (agent/embedded warn):

totalMs=6321   stages=… model-resolution:6304ms@6312ms,auth:3ms@…
totalMs=13206  stages=… model-resolution:12873ms@12885ms,auth:2ms@…
totalMs=12486  stages=… model-resolution:12473ms@12480ms,auth:1ms@…

model-resolution accounts for >97 % of startup-stages time on every message; auth, runtime-plugins, hooks, and context-engine are all single-digit ms. Subsequent messages are not faster than the first — the cache is missing every time, not gradually warming.
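The share can be checked directly from the three trace lines. This is a small parsing sketch; the regular expressions assume the `totalMs=<n>` and `model-resolution:<n>ms` shapes visible in the traces above and nothing more about the log format.

```javascript
const traces = [
  "totalMs=6321   stages=… model-resolution:6304ms@6312ms,auth:3ms@…",
  "totalMs=13206  stages=… model-resolution:12873ms@12885ms,auth:2ms@…",
  "totalMs=12486  stages=… model-resolution:12473ms@12480ms,auth:1ms@…",
];

// Fraction of total startup time spent in the model-resolution stage.
function modelResolutionShare(line) {
  const totalMs = Number(/totalMs=(\d+)/.exec(line)[1]);
  const stageMs = Number(/model-resolution:(\d+)ms/.exec(line)[1]);
  return stageMs / totalMs;
}

const shares = traces.map(modelResolutionShare);
for (const share of shares) {
  console.log(`${(share * 100).toFixed(1)} %`);
}
```

The lowest of the three is 12873 / 13206, which is where the ">97 %" figure comes from.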

Suggested fixes

Two paths, in order of decreasing surgical-ness:

1. Decouple lastUsed writes from the credential-bearing portion of auth-profiles.json. Persist usageStats in a sibling file (e.g., auth-profiles-usage.json), or skip the file write for lastUsed-only updates and only flush on cooldown / error-state transitions. Either way, the credential-bearing part of auth-profiles.json keeps a stable mtime under steady-state traffic.
2. Drop authProfilesMtimeMs (and modelsFileMtimeMs) from the fingerprint in favor of a content hash of just the credential-bearing fields (and for models.json, the resolved-model identity). The fingerprint already includes the full params.config; what an mtime check adds is an out-of-band invalidation signal for credential rotation done outside the planner. Replacing the mtime with a content hash of the fields that actually affect model resolution gives the same correctness without false invalidations on usage-stats writes.

(1) is lower-risk and doesn't change the cache invariant. (2) is a deeper fix and would also resolve any other case where the file mtime ticks without semantically meaningful changes.
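A sketch of the sibling-file variant of option (1), assuming the `auth-profiles-usage.json` name suggested above. `recordLastUsed` is a hypothetical helper, the store shapes are illustrative, and locking is omitted for brevity; a real implementation would need the same locked update the auth store already uses.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "usage-sibling-demo-"));
const credentialsFile = path.join(dir, "auth-profiles.json");
const usageFile = path.join(dir, "auth-profiles-usage.json"); // name suggested in the issue

// Credentials are written once and then left alone.
fs.writeFileSync(
  credentialsFile,
  JSON.stringify({ profiles: { "kimi:default": { type: "api_key", key: "sk-demo" } } }),
);
fs.utimesSync(credentialsFile, new Date(1_000_000), new Date(1_000_000)); // pin mtime for the demo
const mtimeBefore = fs.statSync(credentialsFile).mtimeMs;

// Usage stats go to the sibling file only (no locking in this sketch).
function recordLastUsed(profileId, now) {
  const usage = fs.existsSync(usageFile)
    ? JSON.parse(fs.readFileSync(usageFile, "utf8"))
    : {};
  usage[profileId] = { lastUsed: now };
  fs.writeFileSync(usageFile, JSON.stringify(usage));
}

for (const now of [10, 20, 30]) recordLastUsed("kimi:default", now);

const mtimeAfter = fs.statSync(credentialsFile).mtimeMs;
const lastUsed = JSON.parse(fs.readFileSync(usageFile, "utf8"))["kimi:default"].lastUsed;
console.log({ credentialsMtimeStable: mtimeBefore === mtimeAfter, lastUsed });
```

With usage writes redirected, an mtime-keyed fingerprint over `auth-profiles.json` stays stable under traffic without any change to the cache itself.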

Why this is hard to spot

Local single-user development rarely sends 3+ back-to-back messages and instruments the embedded-run startup-stages tracer simultaneously, so the per-message overhead reads as "model is slow" and gets attributed to the provider rather than the cache.
For static providers (anthropic/openai/etc. configured directly in code) the fast path through resolveModelAsync({skipPiDiscovery:true}) succeeds without consulting MODELS_JSON_STATE.readyCache, so the bug is invisible. It only bites plugin-backed providers (e.g., kimi-coding).
The cache exists and is wired up correctly; the symptom doesn't look like a cache bug because prewarm succeeds in <1 s.
