codex - ✅(Solved) Fix detail: "original" token/byte estimates are currently unbounded [1 pull requests, 2 comments, 1 participants]

GitHub stats
openai/codex#19806, fetched 2026-04-28 06:36:47
Comments: 2
Participants: 1
Timeline: 14
Reactions: 0
Timeline (top): labeled ×4, commented ×2, mentioned ×2, subscribed ×2

Codex locally estimates detail: "original" image token usage by counting raw 32px patches across the decoded source image, with no upper bound corresponding to the image sizing caps that GPT-5.4/GPT-5.5 apply server-side.

In the Computer Use guide, the documentation states that detail: "original" preserves the full screenshot resolution of up to 10.24 megapixels before images are downscaled.

The local estimate can therefore greatly overestimate context usage for large images and materially affect local behavior such as auto-compaction and remote compaction trimming. In practice, uncapped estimates can lead to endless compaction loops when the original-image byte estimate significantly diverges from the true token cost of consuming that image.

Root Cause

estimate_original_image_bytes computes ceil(width / 32) * ceil(height / 32) patches and returns patch_count * 4 bytes. Because the later token estimate is ceil(bytes / 4), the original image estimate is effectively one token per raw 32px patch. However, the local estimator does not cap the token estimate, which is inconsistent with how tokens are counted on the server.


PR fix notes

PR #19865: Cap original-detail image token estimates

Description (problem / solution / changelog)

Clamp original-detail image patch estimates to the current 10k patch budget so large images cannot inflate local context accounting without bound. Add regression coverage for an over-budget image.

Fixes openai/codex#19806.
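
As a rough illustration of the clamp described above, the following Rust sketch caps the patch count before converting it to bytes. It is not the code from PR #19865: the constant name MAX_ORIGINAL_IMAGE_PATCHES and the function capped_original_image_bytes are hypothetical, and the 32px patch size and 4 bytes per token are taken from the issue text.

```rust
// Minimal sketch of a clamped original-detail estimate. All names are
// hypothetical; only the numeric values come from the issue and PR text.

const PATCH_SIZE: u64 = 32; // patch edge in pixels, per the issue
const BYTES_PER_TOKEN: u64 = 4; // approximate bytes per token, per the issue
const MAX_ORIGINAL_IMAGE_PATCHES: u64 = 10_000; // "10k patch budget" from the PR description

/// Estimate model-visible bytes for an original-detail image,
/// clamping the raw patch count to the patch budget.
fn capped_original_image_bytes(width: u64, height: u64) -> u64 {
    let raw_patches = width.div_ceil(PATCH_SIZE) * height.div_ceil(PATCH_SIZE);
    raw_patches.min(MAX_ORIGINAL_IMAGE_PATCHES) * BYTES_PER_TOKEN
}

fn main() {
    // Under budget: 32 * 24 = 768 patches, so the estimate is unchanged.
    assert_eq!(capped_original_image_bytes(1024, 768), 3_072);
    // Over budget: 35,344 raw patches are clamped to 10,000.
    assert_eq!(capped_original_image_bytes(6000, 6000), 40_000);
    println!("6000x6000 -> {} bytes", capped_original_image_bytes(6000, 6000));
}
```

With this clamp, any image above the budget contributes the same bounded estimate, so a regression test only needs one over-budget image to check the ceiling.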

Changed files

  • codex-rs/core/src/context_manager/history.rs (modified, +5/-0)
  • codex-rs/core/src/context_manager/history_tests.rs (modified, +33/-0)



Research

The estimator in codex-rs/core/src/context_manager/history.rs replaces inline base64 image payload bytes with a model-visible byte estimate:

  • non-original images use RESIZED_IMAGE_BYTES_ESTIMATE = 7373
  • original-detail images call estimate_original_image_bytes
  • estimate_original_image_bytes decodes the image, reads dimensions, computes:
ceil(width / 32) * ceil(height / 32) patches

and returns:

approx_bytes_for_tokens(patch_count) == patch_count * 4 bytes

Because the later token estimate is ceil(bytes / 4), the original image estimate is effectively one token per raw 32px patch. However, the local estimator does not cap the token estimate, which is inconsistent with how tokens are counted on the server.

Example local estimates:

| Image | Local estimated image bytes | Local estimated image tokens |
| --- | --- | --- |
| 6000x6000 | 141,376 | 35,344 |
| 10000x10000 | 391,876 | 97,969 |
| 12000x12000 | 562,500 | 140,625 |
| 20000x20000 | 1,562,500 | 390,625 |
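
The arithmetic behind these local estimates can be reproduced with a short Rust sketch. This is an approximation of the logic described in the issue, not the actual code in history.rs: the function names are illustrative stand-ins, the constant types are assumed, and the real estimator decodes the image to obtain its dimensions rather than taking them as arguments.

```rust
// Sketch of the uncapped estimate described in the issue. Function names
// are illustrative, not the real signatures in history.rs or truncate.rs.

const ORIGINAL_IMAGE_PATCH_SIZE: u64 = 32; // patch edge in pixels, per the issue
const APPROX_BYTES_PER_TOKEN: u64 = 4; // value quoted from truncate.rs in the issue

/// ceil(width / 32) * ceil(height / 32)
fn uncapped_patch_count(width: u64, height: u64) -> u64 {
    width.div_ceil(ORIGINAL_IMAGE_PATCH_SIZE) * height.div_ceil(ORIGINAL_IMAGE_PATCH_SIZE)
}

/// patch_count * 4 bytes, with no upper bound.
fn uncapped_byte_estimate(width: u64, height: u64) -> u64 {
    uncapped_patch_count(width, height) * APPROX_BYTES_PER_TOKEN
}

/// ceil(bytes / 4), the later bytes-to-tokens conversion.
fn tokens_from_bytes(bytes: u64) -> u64 {
    bytes.div_ceil(APPROX_BYTES_PER_TOKEN)
}

fn main() {
    for side in [6_000u64, 10_000, 12_000, 20_000] {
        let bytes = uncapped_byte_estimate(side, side);
        let tokens = tokens_from_bytes(bytes);
        println!("{side}x{side}: {bytes} bytes, {tokens} tokens");
    }
}
```

Because the byte estimate is always a multiple of 4, converting back to tokens returns exactly the patch count, which is why the estimate grows quadratically with image side length.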

OpenAI's image sizing docs for GPT-5.4/GPT-5.5 say original allows up to 10,000 patches or a 6000px max dimension, and images over either limit are resized while preserving aspect ratio (https://developers.openai.com/api/docs/guides/images-vision#model-sizing-behavior).

However, the documentation does not mention what the token multiplier for the model is, so I ran a bunch of API requests on a 10.24 megapixel image at various aspect ratios to check the most usage an image can incur. The numbers are the server-reported token cost of the vision request.

<img width="1221" height="417" alt="Image" src="https://github.com/user-attachments/assets/638b4831-140f-4bc1-928e-d9ca0f08fcfd" />

According to the table, detail: "original" vision payloads consume, at maximum, 12,000 tokens (which implies a 1.2x token multiplier on 10,000 patches).

Relevant code:

codex-rs/core/src/context_manager/history.rs

  • RESIZED_IMAGE_BYTES_ESTIMATE
  • ORIGINAL_IMAGE_PATCH_SIZE
  • estimate_original_image_bytes
  • image_data_url_estimate_adjustment

codex-rs/utils/string/src/truncate.rs

  • APPROX_BYTES_PER_TOKEN = 4
  • approx_token_count
  • approx_bytes_for_tokens
  • approx_tokens_from_byte_count

Impact

  • ContextManager::get_total_token_usage adds estimated tokens for items after the last model-generated item.
  • session/turn.rs uses total token usage to decide pre-turn and mid-turn auto-compaction.
  • compact_remote.rs trims function-call history while estimate_token_count_with_base_instructions exceeds the model context window.
  • recompute_token_usage can synthesize token-count events from the estimate.

So one or more large, original-detail images can make Codex believe the session is larger than the API actually sees, causing premature compaction or trimming.

Replication

Generate a 6000x6000 pixel image. Provide the path to that image and tell Codex: "Please use the view image tool 10 times on the same image, at original resolution". Codex will go into an endless compaction loop because the local byte estimates far exceed the compaction threshold.
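
To see why this scenario overwhelms local accounting, the Rust sketch below totals the estimated tokens for ten views of a 6000x6000 image, with and without a per-image ceiling. The helper names are hypothetical, and the 12,000-token ceiling is the maximum server-reported cost measured in this issue, not a documented constant. The compaction threshold itself is not stated in the issue, so it is not modeled here.

```rust
// Back-of-the-envelope totals for the replication steps. Helper names are
// hypothetical; the 32px patch size and the 12,000-token ceiling are taken
// from the issue text.

const PATCH_SIZE: u64 = 32;

/// One estimated token per raw 32px patch, as in the uncapped estimator.
fn uncapped_tokens_per_view(width: u64, height: u64) -> u64 {
    width.div_ceil(PATCH_SIZE) * height.div_ceil(PATCH_SIZE)
}

/// Total estimated tokens for `views` identical images, optionally capped per image.
fn total_estimated_tokens(views: u64, width: u64, height: u64, cap: Option<u64>) -> u64 {
    let per_view = uncapped_tokens_per_view(width, height);
    let per_view = match cap {
        Some(limit) => per_view.min(limit),
        None => per_view,
    };
    views * per_view
}

fn main() {
    let uncapped = total_estimated_tokens(10, 6000, 6000, None);
    let capped = total_estimated_tokens(10, 6000, 6000, Some(12_000));
    println!("uncapped: {uncapped} tokens, capped: {capped} tokens");
}
```

Under these assumptions the uncapped total is roughly three times the capped one, which is consistent with the local estimate crossing a compaction threshold long before the server-side cost would.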

Expected behavior

detail: "original" image estimation should be capped at 12,000 tokens (~48,000 bytes)

extent analysis

TL;DR

The local image token estimation for detail: "original" images should be capped at 12,000 tokens to prevent overestimation and endless compaction loops.

Guidance

  • Review the estimate_original_image_bytes function in codex-rs/core/src/context_manager/history.rs to understand how the local estimator calculates the token estimate for original images.
  • Consider adding a cap to the token estimate based on the server-side limit of 10,000 patches or a 6000px max dimension.
  • Update the estimate_original_image_bytes function to return a maximum of 12,000 tokens (or approximately 48,000 bytes) for original images.
  • Verify that the updated estimator prevents endless compaction loops by testing with large images and checking the token usage estimates.


Notes

The token multiplier for the model is not explicitly mentioned in the documentation, but based on the research, it appears to be 1.2x. This value may need to be adjusted if the actual token multiplier is different.

Recommendation

Apply a workaround by capping the local image token estimation for detail: "original" images at 12,000 tokens to prevent overestimation and endless compaction loops. This will help prevent premature compaction or trimming of the session history.
