openclaw - 💡(How to fix) Fix cron denial classifier false-positive on legitimate skipped-helper prose ("I could not run X") [1 comments, 2 participants]

GitHub stats
openclaw/openclaw#83957 (fetched 2026-05-20 03:46:05)
Comments: 1 · Participants: 2 · Timeline events: 9 · Reactions: 1
Timeline (top): labeled ×7, commented ×1, unsubscribed ×1


DRAFT — Upstream issue for openclaw/openclaw

Status: drafted 2026-05-19 zubi-opus, NOT yet filed. Auto-classifier blocked the gh issue create as external comms beyond the "do the 2 carry forwards" scope. Awaiting explicit JQ go-ahead.

Target repo: https://github.com/openclaw/openclaw/issues
Suggested title: cron denial classifier false-positive on legitimate skipped-helper prose ("I could not run X")


TL;DR

The cron denial classifier (detectCronDenialToken in src/cron/isolated-agent/helpers.ts:53) does case-insensitive substring matching on "could not run", "did not run", and "was denied". This is intentional: the existing test at isolated-agent.helpers.test.ts:17 expects model-narrated denials to be caught. In production cron use, however, the broad matching fires on legitimate skipped-helper prose and classifies successful runs as status=error.
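To make the failure mode concrete, here is a minimal sketch of a case-insensitive substring classifier of the kind described above. It is not the actual body of detectCronDenialToken: the names DENIAL_TOKENS_SKETCH and detectDenialTokenSketch are hypothetical, and the list holds only the three tokens named in this issue.

```typescript
// Hypothetical sketch; NOT the real detectCronDenialToken implementation.
// Only the three tokens named in this issue are listed here.
const DENIAL_TOKENS_SKETCH: readonly string[] = [
  "could not run",
  "did not run",
  "was denied",
];

// Returns the first listed token found anywhere in the summary, or null.
function detectDenialTokenSketch(summary: string): string | null {
  const haystack = summary.toLowerCase();
  for (const token of DENIAL_TOKENS_SKETCH) {
    if (haystack.includes(token)) {
      return token;
    }
  }
  return null;
}

const benignSummary =
  "Artifact existence was verified. I could not run scripts/X.py " +
  "because that script is not present in this workspace.";

// A plain substring match cannot tell rationale from a system denial,
// so the benign sentence is flagged just like a real denial would be.
console.log(detectDenialTokenSketch(benignSummary));
console.log(detectDenialTokenSketch("Artifact written; audit gate passed."));
```

Because the match is position-independent, any summary that narrates a skipped step in these words is flagged, regardless of whether the run itself succeeded.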

Reproduction

A scheduled agentTurn cron whose model summary contains the sentence:

"Artifact existence was verified. I could not run scripts/X.py because that script is not present in this workspace."

…lands as status=error with diagnostic cron classifier: denial token "could not run" detected in summary, even though:

  • The producer's main artifact was written correctly
  • The downstream audit gate ran and passed
  • The model's "I could not run …" was rationale about an optional helper, not a system denial

Concrete fire from 2026-05-18:

status: error
delivered: true
summary tail: "Artifact existence was verified. I could not run
scripts/moo-pre-audit-gate.py because that script is not present
in this workspace."
diagnostic: cron classifier: denial token "could not run" detected in summary

The same fire was delivered to Telegram cleanly, the audit succeeded, and 3 signals were promoted downstream. The classifier alone flipped the visible state to red, which then poisoned downstream health surfaces and required manual annotation to close.

Why this is non-trivial

The maintainers' position (per isolated-agent.helpers.test.ts:17) is that any narrated denial, including "I could not run X", is a system signal worth catching. There is a real argument for that: a model that volunteers that it couldn't perform an action is often telling you something the typed-failure pathway didn't catch.

But the same pattern recurs benignly in prose where the model is describing optional helpers, skipped sub-steps, or rationale for what it kept vs dropped. In our (admittedly external) fleet this pattern crops up once or twice a week and forces manual closeout each time.

Proposed options (not prescribing)

  1. Opt-in strict mode. Add a cron.classifier.strict: true payload flag (or env var) that anchors the case-insensitive tokens to start-of-line/post-bullet/post-colon positions only. Default behaviour unchanged.
  2. Marker convention. Document a [NOT_A_DENIAL] (or similar) sigil the agent can emit alongside legitimate prose to suppress the classifier. Cheap; preserves default semantics; lets operators tune their prompts.
  3. Regex anchoring as default. Change the case-insensitive list to regexes that require the token at start of line / after a colon / after a bullet marker. This breaks the existing test at isolated-agent.helpers.test.ts:17, so it would need maintainer sign-off on the policy change.
  4. Subject-aware exemption. Exempt the token when immediately preceded by a 1st/2nd/3rd-person subject (I, we, the agent, the cron, the model). Smaller surface change; addresses the specific prose class.
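
The options above can be sketched roughly as follows. Every name here (ANY_TOKEN, ANCHORED_DENIAL, NOT_A_DENIAL_SIGIL, SUBJECT_PREFIXED, ClassifierOptionsSketch, isDenialSketch) is hypothetical and chosen for illustration. None of it is existing openclaw code, and the regexes cover only the three tokens named in this issue.

```typescript
// Hypothetical sketches of options 1-4; not existing openclaw code.
// Behaviour as described in this issue: token anywhere, case-insensitive.
const ANY_TOKEN = /could not run|did not run|was denied/i;

// Options 1 and 3: the token must start a line (optionally after a bullet
// marker) or directly follow a colon.
const ANCHORED_DENIAL =
  /(?:^[ \t]*(?:[-*•][ \t]+)?|:[ \t]*)(?:could not run|did not run|was denied)/im;

// Option 2: sigil the agent emits to mark its prose as not a denial.
const NOT_A_DENIAL_SIGIL = "[NOT_A_DENIAL]";

// Option 4: token immediately preceded by a narrating subject.
const SUBJECT_PREFIXED =
  /\b(?:I|we|the agent|the cron|the model)\s+(?:could not run|did not run|was denied)/gi;

interface ClassifierOptionsSketch {
  strict?: boolean; // options 1/3: anchored positions only
  subjectAware?: boolean; // option 4: ignore subject-prefixed tokens
}

function isDenialSketch(
  summary: string,
  opts: ClassifierOptionsSketch = {},
): boolean {
  // Option 2: an explicit sigil suppresses classification entirely.
  if (summary.includes(NOT_A_DENIAL_SIGIL)) {
    return false;
  }
  if (opts.strict) {
    return ANCHORED_DENIAL.test(summary);
  }
  // Option 4: blank out subject-prefixed tokens, then look for any remaining.
  const text = opts.subjectAware
    ? summary.replace(SUBJECT_PREFIXED, " ")
    : summary;
  return ANY_TOKEN.test(text);
}

const skippedHelperProse =
  "Artifact existence was verified. I could not run scripts/X.py " +
  "because that script is not present in this workspace.";

console.log(isDenialSketch(skippedHelperProse)); // default: still flagged
console.log(isDenialSketch(skippedHelperProse, { strict: true }));
console.log(isDenialSketch(skippedHelperProse, { subjectAware: true }));
console.log(isDenialSketch("Status: could not run the audit gate", { strict: true }));
```

In this sketch option 3 is simply option 1 with strict as the default. The trade-off is visible in the last two modes: anchoring misses mid-sentence denials altogether, while the subject-aware variant also exempts a genuine first-person denial, so either would need a policy decision from maintainers.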

Happy to draft a PR for whichever direction maintainers prefer — wanted to flag the false-positive class and give context before committing to a specific change.

Our local workaround (for context)

We've added a hard "Output discipline" section to the affected cron's payload.message instructing the agent never to use the 5 denial-token substrings in its final response, with concrete substitutions ("I could not run X" → "skipped X (reason: …)"). It works but pushes the constraint to every prompt author rather than letting the classifier be more precise. Hence the upstream ask.
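
The same substitution could in principle be applied mechanically, for example when linting a draft summary before delivery. The sketch below is hypothetical: rewriteSkippedHelper is not part of openclaw or of our cron payload, and it handles only the "I could not run X because Y." sentence shape from the fire above.

```typescript
// Hypothetical post-processing helper; not part of openclaw or our payload.
// Rewrites "I could not run X because Y." to "skipped X (reason: Y)."
const SKIPPED_HELPER_SENTENCE =
  /\bI could not run\s+(\S+)\s+because\s+([^.]+)\./g;

function rewriteSkippedHelper(summary: string): string {
  return summary.replace(
    SKIPPED_HELPER_SENTENCE,
    (_match: string, helper: string, reason: string) =>
      `skipped ${helper} (reason: ${reason.trim()}).`,
  );
}

const draftSummary =
  "Artifact existence was verified. I could not run scripts/X.py " +
  "because that script is not present in this workspace.";

console.log(rewriteSkippedHelper(draftSummary));
```

This only moves the constraint from the prompt to a post-processing step, so it has the same drawback as the prompt-level workaround: the classifier itself stays imprecise.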

Reference fire details available on request (sessionId + jsonl excerpt) if useful.
