codex - 💡(How to fix) Fix Inefficient context compression [2 comments, 2 participants]

GitHub stats
openai/codex#18318 · Fetched 2026-04-18 05:56:06
Comments: 2 · Participants: 2 · Timeline: 7 · Reactions: 0
Timeline (top): labeled ×3, commented ×2, closed ×1, cross-referenced ×1

Root Cause

For coding tasks, the most valuable context is often not the broad summary of the task, but the accumulated local engineering knowledge, for example:

  • “this is the exact file to patch”
  • “this existing module is the template to mirror”
  • “do not widen scope beyond this boundary”
  • “these tests are sufficient; the others are unnecessary”
  • “this path was already checked and rejected”

Losing that information is costly because the model must pay for it again.

So the current behavior can make compression feel less like optimization and more like a tax on amnesia: the agent remembers what it wanted to do, but forgets what it had already learned.


Fix Action

Fix / Workaround

Frontier state instead of only narrative state

Compression should preserve something like:

  • confirmed facts
  • touched files
  • current patch point
  • blockers
  • next atomic action

Bias compression toward implementation continuity

Once the agent has entered implementation mode, compression should favor:

  • patch continuity
  • exact symbol/path continuity
  • test continuity

What variant of Codex are you using?

App

What feature would you like to see?

Context compression often preserves intent, but drops execution-critical state

First of all, context compression is genuinely valuable. In long-running tasks it can help the agent stay on track and continue working without immediately hitting context limits.

However, in practice I keep running into one recurring failure mode:

after compression, the agent often re-collects the same context it had already gathered before compression, which consumes most of the tokens that compression was supposed to save.

This is especially noticeable in large repositories with complex local dependencies, where implementation depends not just on task intent, but on many small, fragile, already-verified details.


The core issue

From my experience, the problem is not that compression exists, but that it often compresses the wrong layer of context.

It tends to preserve:

  • the general task intent
  • the high-level plan
  • the narrative of what the agent was trying to do

But it often drops what is actually needed to finish implementation:

  • exact integration points
  • already-checked files and why they were checked
  • rejected hypotheses / dead ends
  • narrow scope boundaries
  • specific symbols, paths, checkpoints, test targets
  • “what is already known to be sufficient”

As a result, the agent still “remembers” the goal, but loses the working set required to execute it efficiently.


What this looks like in practice

A typical pattern looks like this:

  1. The agent starts narrow and gathers the relevant implementation context.
  2. It identifies the likely integration surface and neighboring owner-surfaces.
  3. It begins implementation.
  4. Context compression triggers.
  5. After compression, the agent still remembers the intent, but loses the execution frontier.
  6. It starts re-reading the same seams, maps, ADRs, tests, topology files, telemetry files, etc.
  7. Token usage spikes again, and the effective gain from compression becomes very small.

So the result is something like:

  • compression saves some space temporarily
  • but the lost execution context has to be rebuilt
  • which burns a large part of the saved budget

In effect, compression sometimes behaves like:

  • ~20% useful retention for implementation
  • while forcing the agent to spend ~80% of the saved budget rebuilding context

Those numbers are approximate, not benchmarked, but they match the practical experience very closely.
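
The rough split above can be made concrete with a little arithmetic. This is only an illustrative sketch: `netSavings`, `freedTokens`, and `rebuildPercent` are hypothetical names, and the token count is invented for the example rather than measured.

```javascript
// Hypothetical helper: estimates how many tokens compression really saves
// once the cost of re-gathering lost context is subtracted.
// rebuildPercent is the share (0-100) of the freed budget spent on rediscovery.
function netSavings(freedTokens, rebuildPercent) {
  if (rebuildPercent < 0 || rebuildPercent > 100) {
    throw new RangeError("rebuildPercent must be between 0 and 100");
  }
  const rebuildCost = (freedTokens * rebuildPercent) / 100;
  return freedTokens - rebuildCost;
}

// Invented numbers: compression frees 50,000 tokens, and the agent spends
// roughly 80% of that re-reading material it had already inspected.
const saved = netSavings(50000, 80);
console.log(saved); // 10000
```

Under these assumed numbers, only a fifth of the freed budget is left for new work, which is the "~20% useful retention" case described above.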


Concrete example from a real run

In one repo task, the agent explicitly said it would proceed narrowly:

  • read only the seam docs
  • inspect the existing neighboring pattern
  • add a minimal package
  • wire a narrow integration path
  • run only targeted tests

But after compression, instead of continuing from the implementation frontier, it started re-reading a large portion of the same context again:

  • seam docs
  • contour maps
  • ADRs
  • neighboring module files
  • topology policy
  • runtime trace
  • subject tick models / policy / telemetry
  • multiple owner tests and testkits

So although the agent had already done the expensive discovery work, compression removed enough of the execution-relevant memory that it had to repeat discovery almost from scratch.
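
One way to make this kind of repeated discovery visible is to compare what the agent read before and after compression. The sketch below assumes a hypothetical log in which each entry is simply the path of a file that was read; `rereadOverlap` and all paths are invented for illustration and do not come from the run described above.

```javascript
// Hypothetical measurement: which files read after compression had already
// been read before it, and what share of the post-compression reads they are.
function rereadOverlap(readsBefore, readsAfter) {
  const seen = new Set(readsBefore);
  const afterUnique = [...new Set(readsAfter)];
  const repeated = afterUnique.filter((path) => seen.has(path));
  return {
    repeated,
    ratio: afterUnique.length === 0 ? 0 : repeated.length / afterUnique.length,
  };
}

// Invented example paths.
const before = ["docs/seam.md", "docs/adr-001.md", "src/neighbor/index.js"];
const after = ["docs/seam.md", "docs/adr-001.md", "src/new-package/index.js"];
const overlap = rereadOverlap(before, after);
console.log(overlap.repeated);
console.log(overlap.ratio);
```

A ratio close to 1 would suggest that most post-compression reads are rediscovery rather than new work.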

That is exactly the opposite of what compression should optimize for.


Why this matters

For coding tasks, the most valuable context is often not the broad summary of the task, but the accumulated local engineering knowledge, for example:

  • “this is the exact file to patch”
  • “this existing module is the template to mirror”
  • “do not widen scope beyond this boundary”
  • “these tests are sufficient; the others are unnecessary”
  • “this path was already checked and rejected”

Losing that information is costly because the model must pay for it again.

So the current behavior can make compression feel less like optimization and more like a tax on amnesia: the agent remembers what it wanted to do, but forgets what it had already learned.


Expected behavior

Ideally, context compression should prioritize preserving:

1. Execution-critical state

  • exact files already read
  • exact integration points
  • concrete next step
  • minimal required test set
  • rejected alternatives
  • scope constraints
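
As a minimal sketch of what such a state could look like in code, the record below mirrors the bullets above one field per bullet. The shape, the field names, and both function names are assumptions made for illustration; nothing here reflects how Codex stores context internally.

```javascript
// Hypothetical record of execution-critical state; field names mirror the
// bullets above and are not taken from any Codex API.
function createExecutionState() {
  return {
    filesRead: [],
    integrationPoints: [],
    nextStep: null,
    requiredTests: [],
    rejectedAlternatives: [],
    scopeConstraints: [],
  };
}

// Returns the names of fields that are still empty, so a compressor could
// hold off discarding raw history until the state has been filled in.
function missingFields(state) {
  return Object.keys(state).filter((key) => {
    const value = state[key];
    return value === null || (Array.isArray(value) && value.length === 0);
  });
}

const state = createExecutionState();
state.filesRead.push("src/example/module.js"); // invented path
state.nextStep = "add the new package entry point";
console.log(missingFields(state));
```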

2. Negative knowledge

A lot of token savings come from remembering what does not need to be revisited:

  • files already ruled out
  • hypotheses already disproven
  • docs already confirmed as unnecessary to reread
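
Negative knowledge is cheap to store because each entry is just an item and the reason it was ruled out. The class below is a hypothetical sketch of that idea; `NegativeKnowledge`, its methods, the `SKIP` line format, and the example path are all invented for illustration.

```javascript
// Hypothetical store of things the agent has already ruled out.
class NegativeKnowledge {
  constructor() {
    this.ruledOut = new Map(); // file path or hypothesis -> reason
  }

  ruleOut(item, reason) {
    this.ruledOut.set(item, reason);
  }

  // Returns the recorded reason if the item was ruled out, otherwise null.
  whySkipped(item) {
    return this.ruledOut.has(item) ? this.ruledOut.get(item) : null;
  }

  // Compact text form that could be carried through compression.
  toSummaryLines() {
    return [...this.ruledOut].map(([item, reason]) => `SKIP ${item}: ${reason}`);
  }
}

const knowledge = new NegativeKnowledge();
knowledge.ruleOut("docs/topology-policy.md", "not relevant to the patch point");
console.log(knowledge.whySkipped("docs/topology-policy.md"));
console.log(knowledge.toSummaryLines().join("\n"));
```

Before re-reading a file after compression, an agent could consult `whySkipped` and skip the read when a reason is recorded.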

3. Frontier state instead of only narrative state

Compression should preserve something like:

  • confirmed facts
  • touched files
  • current patch point
  • blockers
  • next atomic action

rather than mostly preserving a high-level “story” of the task.
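
To illustrate the difference from a narrative summary, the sketch below renders a frontier as a short labelled block that could be placed at the top of a compressed context. `renderFrontier`, the field names, and the sample values are assumptions for illustration only.

```javascript
// Hypothetical renderer: turns frontier state into compact labelled text.
// Empty list sections are omitted to keep the block short.
function renderFrontier(frontier) {
  const section = (title, items) =>
    items.length === 0 ? [] : [`${title}:`, ...items.map((item) => `  - ${item}`)];
  return [
    ...section("Confirmed facts", frontier.confirmedFacts),
    ...section("Touched files", frontier.touchedFiles),
    `Current patch point: ${frontier.patchPoint}`,
    ...section("Blockers", frontier.blockers),
    `Next atomic action: ${frontier.nextAction}`,
  ].join("\n");
}

// Invented sample values.
const frontierText = renderFrontier({
  confirmedFacts: ["the neighboring module is the template to mirror"],
  touchedFiles: ["src/new-package/index.js"],
  patchPoint: "src/new-package/index.js, exported register function",
  blockers: [],
  nextAction: "wire the package into the integration path",
});
console.log(frontierText);
```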


Suggested improvements

A few possible directions that might help:

Preserve a structured “working frontier”

Instead of only summarizing intent, preserve a compact execution state such as:

  • confirmed facts
  • rejected hypotheses
  • touched files
  • next atomic step
  • remaining minimal tests
  • forbidden scope expansions
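
The "forbidden scope expansions" entry is only useful if something consults it after compression. As one possible sketch, the check below treats scope as a list of allowed directory prefixes; `isInScope` and the example paths are hypothetical, and a real agent might express scope quite differently.

```javascript
// Hypothetical scope guard: is a path equal to, or nested under, one of the
// allowed prefixes? A trailing slash is enforced so "src/a" does not match
// "src/a-other/file.js".
function isInScope(path, allowedPrefixes) {
  return allowedPrefixes.some((prefix) => {
    const dir = prefix.endsWith("/") ? prefix : prefix + "/";
    return path === prefix || path.startsWith(dir);
  });
}

const allowed = ["src/new-package", "tests/new-package"]; // invented scope
console.log(isInScope("src/new-package/index.js", allowed));
console.log(isInScope("src/new-package-extra/index.js", allowed));
```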

Prefer trimming over flattening

Rather than compressing the entire context aggressively, it may be better to:

  • trim repetitive narrative
  • trim redundant tool chatter
  • preserve implementation-relevant local facts
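
A trimming pass along these lines might look like the sketch below. It assumes a hypothetical message shape, `{ kind, text }` with `kind` being "narrative", "tool", or "fact"; the function name, the shape, and the budget parameter are all invented for illustration.

```javascript
// Hypothetical trimming pass: keeps every local fact, keeps one copy of each
// identical tool output, and keeps only the most recent narrative messages.
function trimHistory(messages, maxNarrative) {
  const seenToolOutput = new Set();
  let narrativeBudget = maxNarrative;
  const kept = [];
  // Walk newest-first so the most recent narrative is what survives.
  for (let i = messages.length - 1; i >= 0; i--) {
    const message = messages[i];
    if (message.kind === "fact") {
      kept.push(message); // implementation-relevant facts are always preserved
    } else if (message.kind === "tool") {
      if (!seenToolOutput.has(message.text)) {
        seenToolOutput.add(message.text);
        kept.push(message);
      }
    } else if (narrativeBudget > 0) {
      narrativeBudget -= 1;
      kept.push(message);
    }
  }
  return kept.reverse(); // restore chronological order
}

const history = [
  { kind: "narrative", text: "I will start by reading the seam docs." },
  { kind: "tool", text: "read docs/seam.md" },
  { kind: "fact", text: "patch point is src/new-package/index.js" },
  { kind: "tool", text: "read docs/seam.md" },
  { kind: "narrative", text: "Now I understand the layout." },
  { kind: "narrative", text: "Beginning implementation." },
];
const trimmed = trimHistory(history, 1);
console.log(trimmed.map((message) => message.text));
```

The point of the design is that nothing in the "fact" category is ever summarized away; only repetition and older narrative are dropped.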

Preserve negative knowledge explicitly

Remembering what the agent already ruled out can save as many tokens as remembering what it found.

Bias compression toward implementation continuity

Once the agent has entered implementation mode, compression should favor:

  • patch continuity
  • exact symbol/path continuity
  • test continuity

over broad semantic summaries.
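
One simple way to express such a bias is a mode-dependent weight per kind of context item. The sketch below is hypothetical throughout: the modes, the item kinds, the weight values, and `selectForRetention` are invented to show the idea and are not a description of how Codex ranks context.

```javascript
// Hypothetical weights; the numbers are arbitrary and only show the bias.
const WEIGHTS = {
  planning: { narrative: 3, symbol: 1, patch: 1, test: 1 },
  implementation: { narrative: 1, symbol: 3, patch: 3, test: 2 },
};

// Keeps the `limit` highest-weighted items for the given mode, then returns
// them in their original order. Ties are broken in favor of earlier items.
function selectForRetention(items, mode, limit) {
  const weights = WEIGHTS[mode];
  return items
    .map((item, index) => ({ item, index, score: weights[item.kind] || 0 }))
    .sort((a, b) => b.score - a.score || a.index - b.index)
    .slice(0, limit)
    .sort((a, b) => a.index - b.index)
    .map((entry) => entry.item);
}

const items = [
  { kind: "narrative", text: "overall plan for the task" },
  { kind: "symbol", text: "exact function and path being patched" },
  { kind: "patch", text: "partial diff already applied" },
];
console.log(selectForRetention(items, "implementation", 2).map((i) => i.kind));
console.log(selectForRetention(items, "planning", 1).map((i) => i.kind));
```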


In short

The current compression is very good at preserving intent, but coding tasks often require preserving execution memory.

When compression removes execution-critical state, the agent often re-generates the context it had already paid for, which greatly reduces the practical benefit of compression in large, interconnected codebases.

So the issue is not:

  • “compression exists”

The issue is:

  • compression frequently preserves the wrong information for implementation-heavy tasks

If this part improves, context compression could become much more effective in real repository work.

Additional information

No response

extent analysis

TL;DR

Preserve execution-critical state and negative knowledge during context compression to improve its effectiveness in implementation-heavy tasks.

Guidance

  • Identify and prioritize the preservation of execution-critical state, such as exact files already read, integration points, and concrete next steps, during context compression.
  • Consider preserving negative knowledge, like files already ruled out and hypotheses already disproven, to reduce token usage.
  • Implement a structured "working frontier" to preserve a compact execution state, including confirmed facts, touched files, and next atomic steps.
  • Trim repetitive narrative and redundant tool chatter instead of aggressively compressing the entire context.
  • Bias compression toward implementation continuity, favoring patch continuity, exact symbol/path continuity, and test continuity.

Example

// Example of preserving execution-critical state (all values are placeholders)
const preservedContext = {
  confirmedFacts: ["file1", "file2"],
  touchedFiles: ["file3", "file4"],
  nextAtomicStep: "implementFeatureX",
  remainingMinimalTests: ["test1", "test2"],
};

Notes

The provided solution focuses on preserving execution-critical state and negative knowledge during context compression. However, the actual implementation may vary depending on the specific requirements and constraints of the system.

Recommendation

Apply a workaround by preserving execution-critical state and negative knowledge during context compression, as this approach is likely to improve the effectiveness of compression in implementation-heavy tasks.
