codex - Feature idea: extend /goal with intent calibration, evidence chains, and side-thread tolerance [2 comments, 3 participants]

GitHub stats
openai/codex#20958 · Fetched 2026-05-05 05:55:32
Comments: 2 · Participants: 3 · Timeline: 11 · Reactions: 0
Timeline (top): labeled ×5, unlabeled ×4, commented ×2



What variant of Codex are you using?

Codex CLI with local skills and hooks enabled.

What feature would you like to see?

I would like /goal to grow an optional long-run harness layer inspired by a userland Codex skill I have been building: Long Long Run (LLR).

LLR repo: https://github.com/huahuadeliaoliao/long-long-run

Origin story / design notes: https://github.com/huahuadeliaoliao/long-long-run/blob/main/docs/why-long-long-run.md

Codex 0.128.0's persisted /goal workflows are a strong foundation for long-running work. In practice, though, the hard part is often not only remembering that a goal exists. The harder parts are:

  • helping the user clarify the real goal before execution
  • reducing intent noise when the user does not know the domain well enough to define the right target
  • turning vague intent into a concrete, evidence-backed contract
  • tracking which evidence still supports the current plan
  • tolerating side questions or late constraints without losing the mainline
  • avoiding premature stops when Codex already knows the next goal-covered action
  • encouraging real domain exploration before simply validating an answer Codex already expects

LLR experiments with this through two modes:

  • INC mode: Intent Noise Cancellation. Codex explores the repo/domain, surfaces assumptions, identifies risks, discovers expert framing, proposes hard acceptance criteria, and builds an evidence-backed contract before implementation.
  • ACTIVE mode: the user explicitly authorizes Codex to pursue the confirmed contract as the mainline. Codex should continue through clear contract-covered next steps instead of stopping after every useful local update.

The feature I would like to see is not necessarily LLR copied into Codex directly. Rather, I would like /goal to support similar long-run harness semantics, either as built-in behavior or as extension points for skills/plugins.

A possible model:

  1. Add a calibration phase before execution

Before a goal becomes an active execution target, Codex could have an explicit calibration mode similar to LLR's INC mode.

In this phase, Codex would:

  • explore the repo, task context, and domain
  • clarify the user's real objective
  • infer hidden requirements
  • surface assumptions and risks
  • discover current expert framing when the domain matters
  • propose hard acceptance criteria
  • ask the user to confirm the contract before execution

This is useful because users often cannot describe the correct goal until Codex has helped explore the space.
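A minimal sketch of how such a calibration gate could be modeled, in Python. Every name here (`Phase`, `GoalContract`, `confirm`) is hypothetical and chosen for illustration; none of them is an existing Codex or LLR API. The one rule it encodes is that a goal only becomes active after explicit user confirmation of a contract with at least one acceptance criterion.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Phase(Enum):
    CALIBRATING = "calibrating"  # hypothetical analogue of LLR's INC mode
    ACTIVE = "active"            # user-confirmed contract is the mainline


@dataclass
class GoalContract:
    """Hypothetical contract built during calibration (illustrative only)."""
    objective: str
    acceptance_criteria: List[str] = field(default_factory=list)
    assumptions: List[str] = field(default_factory=list)
    risks: List[str] = field(default_factory=list)
    phase: Phase = Phase.CALIBRATING

    def confirm(self, user_confirmed: bool) -> bool:
        """Activate only on explicit confirmation and at least one criterion."""
        if user_confirmed and self.acceptance_criteria:
            self.phase = Phase.ACTIVE
        return self.phase is Phase.ACTIVE
```

With this shape, a contract without acceptance criteria stays in calibration even if the user says "go", which is one way to keep vague goals from becoming execution targets.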

  2. Track current effective evidence

A goal could maintain lightweight current evidence, not just checkpoints.

Possible fields could include:

  • user signals
  • verified facts
  • assumptions
  • risks
  • open decisions
  • next action
  • completion signal

The important distinction is:

  • checkpoint = what happened
  • evidence chain = what still matters and why it should guide the next action

LLR's evidence-chain design has been useful because long-running tasks often produce many artifacts, but later review needs to know which facts still support the mainline and which earlier assumptions were overturned.
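The checkpoint versus evidence-chain distinction could be sketched roughly as follows. This is an illustrative Python model under my own assumptions, not LLR's actual storage format: `Evidence`, `GoalState`, `overturn`, and `effective` are hypothetical names. Checkpoints are an append-only log of what happened, while the evidence list can be filtered down to the items that still support the plan.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Evidence:
    kind: str                            # e.g. "user_signal", "verified_fact", "assumption", "risk"
    text: str
    overturned_by: Optional[str] = None  # reason this item no longer guides the plan


@dataclass
class GoalState:
    checkpoints: List[str] = field(default_factory=list)  # what happened (append-only)
    evidence: List[Evidence] = field(default_factory=list)
    next_action: str = ""
    completion_signal: str = ""

    def overturn(self, text: str, reason: str) -> None:
        """Mark matching evidence as overturned and record that as a checkpoint."""
        for item in self.evidence:
            if item.text == text and item.overturned_by is None:
                item.overturned_by = reason
                self.checkpoints.append(f"overturned: {text} ({reason})")

    def effective(self) -> List[Evidence]:
        """Evidence that still matters for choosing the next action."""
        return [e for e in self.evidence if e.overturned_by is None]
```

Overturned items are kept rather than deleted, so a later review can still see which earlier assumptions were dropped and why.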

  3. Add side-thread tolerance

During an active goal, users often remember missing constraints or ask urgent side questions.

A goal workflow could help Codex distinguish whether the latest user message is:

  • a side question
  • a missing constraint
  • a blocker
  • a contract change
  • a stop request
  • a return-to-calibration request

For normal side questions, Codex should answer the user first, then resume the active goal if the main contract is unchanged.

This is one of the most useful parts of LLR in practice: the mainline keeps a compass even when the conversation temporarily walks onto a side path.
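One way to picture the routing, as a hedged Python sketch. The six message kinds come from the list above; the `MessageKind` and `route` names and the action strings are hypothetical. Only the side-question behavior (answer first, then resume if the contract is unchanged) is stated in this proposal; the other mappings are my assumptions about a plausible policy. How a message gets classified in the first place is left out entirely.

```python
from enum import Enum


class MessageKind(Enum):
    SIDE_QUESTION = "side_question"
    MISSING_CONSTRAINT = "missing_constraint"
    BLOCKER = "blocker"
    CONTRACT_CHANGE = "contract_change"
    STOP_REQUEST = "stop_request"
    RETURN_TO_CALIBRATION = "return_to_calibration"


def route(kind: MessageKind, contract_unchanged: bool = True) -> str:
    """Map a classified user message to a hypothetical follow-up action."""
    if kind is MessageKind.SIDE_QUESTION:
        # Stated in the proposal: answer first, resume only if the contract holds.
        return "answer_then_resume" if contract_unchanged else "answer_then_recalibrate"
    # Everything below is an assumed policy, not part of the proposal text.
    if kind is MessageKind.MISSING_CONSTRAINT:
        return "record_constraint_then_resume"
    if kind is MessageKind.BLOCKER:
        return "pause_and_report"
    if kind is MessageKind.STOP_REQUEST:
        return "stop"
    # CONTRACT_CHANGE and RETURN_TO_CALIBRATION both go back to calibration.
    return "recalibrate"
```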

  4. Add a premature-stop check

If Codex is about to stop while a goal is active, it could re-evaluate:

  • Is the objective complete?
  • Is the work blocked?
  • Did the user ask to stop?
  • Has the contract changed enough to require recalibration?
  • Is there still a clear next action covered by the goal?

If the next action is clear and still covered by the active goal, Codex should continue instead of asking the user to type "continue".

This is the motivation behind LLR's hook-based stop guard. It is not meant to make Codex run forever. It is meant to prevent the common pattern where Codex states the next step itself, but still stops and waits for the user to approve continuation.
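The five questions above can be read as a small decision function. The following Python sketch is illustrative only: `StopCheck` and `should_continue` are hypothetical names, and this is not how LLR's hook is implemented. It continues only when nothing legitimately ends the run and a goal-covered next action exists, so it cannot loop forever on its own.

```python
from dataclasses import dataclass


@dataclass
class StopCheck:
    """Answers to the five re-evaluation questions (hypothetical structure)."""
    objective_complete: bool
    blocked: bool
    user_asked_to_stop: bool
    needs_recalibration: bool
    has_clear_covered_next_action: bool


def should_continue(check: StopCheck) -> bool:
    """Return True only if stopping now would be premature."""
    legitimate_stop = (
        check.objective_complete
        or check.blocked
        or check.user_asked_to_stop
        or check.needs_recalibration
    )
    if legitimate_stop:
        return False
    # No reason to stop: continue only if the next step is clear and covered.
    return check.has_clear_covered_next_action
```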

  5. Encourage exploration before validation

In research-heavy or fast-moving domains, Codex search can sometimes behave like answer validation: it searches for high-confidence concepts it already knows instead of discovering the domain from task-level keywords.

LLR's INC guidance asks Codex to derive discovery keywords from the user's wording, project vocabulary, file names, data labels, metrics, failure symptoms, tools, quality bar, and ecosystem terms before presenting expert defaults.

This could also be useful around /goal, especially when a goal depends on current practice, benchmarks, standards, libraries, or domain conventions.
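As a rough illustration of deriving discovery keywords from task-level material rather than from concepts the model already expects, here is a small Python sketch. The `discovery_keywords` function, its tokenization rule, and the source labels are all assumptions made for this example; LLR's INC guidance is prose guidance, not this code.

```python
import re
from typing import Dict, List


def discovery_keywords(sources: Dict[str, List[str]], limit: int = 20) -> List[str]:
    """Collect unique terms from task-level sources, in first-seen order.

    `sources` maps a hypothetical origin label (e.g. "user_wording",
    "file_names") to raw strings taken from the task itself.
    """
    seen: Dict[str, str] = {}
    for origin, texts in sources.items():
        for text in texts:
            # Terms of 3+ characters starting with a letter; keeps "p99", "latency_bench".
            for term in re.findall(r"[a-z][a-z0-9_\-]{2,}", text.lower()):
                seen.setdefault(term, origin)  # remember where the term first appeared
    return list(seen)[:limit]
```

The resulting terms would seed searches before any expert defaults are presented, keeping the search anchored to the user's own vocabulary.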

Expected outcome

I would love to know whether the Codex team sees these LLR patterns as:

  • behaviors that could eventually belong inside /goal
  • behaviors better handled by skills/plugins
  • extension points that /goal could expose
  • or simply a userland experiment that should remain outside core Codex

Thanks for adding persisted /goal workflows. LLR is my attempt to explore the surrounding harness that makes long-running goals easier to define, steer, review, and complete.

Additional information

No response

extent analysis

TL;DR

The user suggests enhancing the /goal feature in Codex with a long-run harness layer, inspired by the Long Long Run (LLR) skill, to improve handling of long-running tasks and user intent.

Guidance

  • Review the LLR repository and documentation to understand the proposed features and their implementation.
  • Consider adding a calibration phase before execution, as described in the LLR model, to clarify user objectives and infer hidden requirements.
  • Explore the possibility of tracking current effective evidence, including user signals, verified facts, and assumptions, to guide next actions.
  • Evaluate the need for side-thread tolerance and premature-stop checks to improve the user experience and prevent unnecessary stops.

Example

No specific code example is provided, as the issue focuses on high-level feature suggestions rather than implementation details.

Notes

The proposed features are inspired by the LLR skill, which is a userland experiment. The Codex team may need to evaluate whether these features belong inside the core /goal functionality or should be handled by skills/plugins.

Recommendation

Apply a workaround by utilizing the LLR skill or similar plugins to achieve the desired long-run harness functionality, as the core /goal feature may not currently support these enhancements.
