codex - 💡(How to fix) Fix [Persistent multi-version bug for VS Code SSH Linux] enters overload loop and collapse after threads stuck and recovery [3 comments, 2 participants]

GitHub stats
openai/codex#20029 · Fetched 2026-04-29 06:23:37
Comments: 3 · Participants: 2 · Timeline: 15 · Reactions: 0
Timeline (top): labeled ×4, commented ×3, mentioned ×3, subscribed ×3

Error Message

Server overloaded; try later


Root Cause

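The evidence points to a local state / session hydration / app-server backpressure problem, triggered when the extension reads or resumes old threads, rather than an account, login, or model availability issue. The many 401 and 403 entries in the logs appear to be secondary connector or network noise rather than the primary failure, because ChatGPT login status itself succeeds.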

Fix Action

Fix / Workaround

The most reliable temporary workaround is to stop Codex-related processes, rename or remove the active .codex directory, let Codex create a fresh one, and then selectively copy back only minimal files such as config.toml and auth.json. Copying old session data back can retrigger the failure.

Important: this is only a workaround. It makes Codex temporarily usable again, but old threads remain dangerous to open or resume.


Codex VS Code Remote-SSH enters local app-server / transport overload loop after old thread recovery

What version of the IDE extension are you using?

v26.422.62136 (Last known stable version: 26.406.40811-linux-x64)

What subscription do you have?

GPT Pro

Which IDE are you using?

VS Code with Remote-SSH

What platform is your computer?

Linux 4.18.0-372.9.1.el8.x86_64 x86_64 x86_64

What issue are you seeing?

I am seeing a reproducible Codex VS Code extension failure in a VS Code Remote-SSH environment on an HPC Linux server.

This does not appear to be a simple authentication issue. codex login status succeeds with ChatGPT login, but the local Codex app-server / transport layer appears to enter a persistent overload state after attempting to read or resume old threads.

Once this happens, the Codex VS Code UI can no longer load existing conversations or start new ones. The UI repeatedly shows:

Server overloaded; try later

However, the evidence suggests that this is a local Codex app-server / transport / old-thread recovery problem rather than a normal backend overload.

Key observations:

codex login status:
Logged in using ChatGPT
exit code: 0

But a simple CLI smoke test times out:

codex exec smoke test:
failed to refresh available models: timeout waiting for child process to exit
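
The two checks above can be rerun with a time limit so that a hang is reported rather than blocking the terminal. This is a minimal sketch, assuming GNU `timeout` is available on the remote host; the helper name `run_with_deadline` and the prompt text in the usage comment are illustrative and not part of the original report.

```bash
#!/usr/bin/env bash
# Run a command with a time limit and print a one-line verdict.
# Usage: run_with_deadline SECONDS COMMAND [ARGS...]
run_with_deadline() {
  local seconds=$1
  shift
  local status=0
  # GNU timeout exits with 124 when the time limit expires.
  timeout "$seconds" "$@" || status=$?
  case $status in
    0)   echo "ok" ;;
    124) echo "timed out after ${seconds}s" ;;
    *)   echo "failed with exit code $status" ;;
  esac
  return "$status"
}

# Example calls (the prompt string is a placeholder):
#   run_with_deadline 30 codex login status
#   run_with_deadline 120 codex exec "Reply with OK"
```

If the login check prints `ok` while the exec check times out, that matches the pattern described here: authentication works, but the local process does not respond.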

The Codex logs contain repeated local overload and thread recovery signals:

outbound queue is full
dropping overload response
thread/resume
thread-read-state
server overloaded
PendingMigrationError
Codex app-server process exited unexpectedly
signal SIGTERM
Codex process is not available
disconnected

Recent keyword counts from the logs:

outbound queue is full: 12464
dropping overload response: 12464
server overloaded: 22047
PendingMigrationError: 225
thread/resume: 121
thread-read-state: 108
not-connected: 14
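
Counts like these can be gathered without sharing raw logs. The following is a sketch only: the report does not say where the logs live, so the log file paths are passed in by the caller, and the function name `count_codex_log_keywords` is made up for illustration. Note that `grep -c` counts matching lines, not individual occurrences.

```bash
#!/usr/bin/env bash
# Print "keyword: count" for each diagnostic keyword named in the report.
# Usage: count_codex_log_keywords LOGFILE [LOGFILE...]
count_codex_log_keywords() {
  local keywords=(
    "outbound queue is full"
    "dropping overload response"
    "server overloaded"
    "PendingMigrationError"
    "thread/resume"
    "thread-read-state"
    "not-connected"
  )
  local kw count
  for kw in "${keywords[@]}"; do
    # -F matches the keyword literally; -c prints the number of matching lines.
    # grep exits non-zero when nothing matches, so "|| true" keeps the loop going.
    count=$(cat -- "$@" | grep -F -c -- "$kw" || true)
    printf '%s: %s\n' "$kw" "$count"
  done
}

# Example call (the path is a placeholder, not a known Codex log location):
#   count_codex_log_keywords /path/to/codex-logs/*.log
```

Because only keyword totals are printed, the output contains no paths, prompts, or project names from the log files themselves.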

The app-server eventually exits unexpectedly:

Codex app-server process exited unexpectedly
signal SIGTERM

After that, the extension reports that the Codex process is unavailable or disconnected.

I also saw many 401 and 403 entries in the logs, but these appear to be secondary connector or network noise rather than the primary failure, because ChatGPT login status itself succeeds.

The most reliable temporary workaround is to rename or remove the active .codex directory, let Codex create a fresh one, and then selectively copy back only minimal files such as config.toml and auth.json. Copying old session data back can retrigger the failure.

This strongly suggests a local state / session hydration / app-server backpressure issue rather than a pure account, login, or model availability issue.

### What steps can reproduce the bug?

The exact failure is easiest to reproduce in my existing VS Code Remote-SSH HPC environment with an existing `.codex` state containing old sessions.

Approximate reproduction pattern:

1. Connect to a remote Linux HPC server using VS Code Remote-SSH.
2. Use the Codex VS Code extension on the remote side.
3. Use an existing active `.codex` directory that contains old sessions from previous Codex extension versions.
4. Open the Codex panel in VS Code.
5. Let the extension attempt to list, read, or resume old threads.
6. One or more old large threads appear to trigger repeated `thread/resume` or `thread-read-state` activity.
7. Logs begin showing repeated local transport overload messages:

```text
outbound queue is full
dropping overload response
thread/resume
server overloaded
```

8. The UI then repeatedly shows:

```text
Server overloaded; try later
```

9. Eventually, existing conversations cannot be loaded and new conversations cannot be started.
10. The app-server exits unexpectedly with SIGTERM.
11. The extension reports that the Codex process is unavailable or disconnected.
12. The only reliable way to recover is to rename or remove the active `.codex` directory and let Codex create a fresh one.

Generic workaround that temporarily restores usability:

```bash
# Stop Codex-related processes first.
pkill -u "$USER" -f 'codex app-server' 2>/dev/null || true
pkill -u "$USER" -f 'openai.chatgpt-.*/codex' 2>/dev/null || true

# Freeze active Codex home.
mv "$HOME/.codex" "$HOME/.codex.frozen.$(date +%Y%m%d-%H%M%S)"

# Create a clean active Codex home.
mkdir -m 700 "$HOME/.codex"
```

Then I selectively copy only minimal files such as:

```text
config.toml
auth.json
```

I avoid copying the entire old `sessions`, `archived_sessions`, SQLite state files, logs, or session index back into the active `.codex` directory.

Important: this is only a workaround. It makes Codex temporarily usable again, but old threads remain dangerous to open or resume.

### What is the expected behavior?

Codex should not allow one problematic old thread, local state migration issue, or failed thread recovery attempt to make the entire VS Code extension unusable.

Expected behavior:

1. A problematic old thread should be skipped, quarantined, or marked as failed.
2. Failed `thread/resume` or `thread-read-state` operations should not be retried until the local outbound queue is saturated.
3. The local app-server should apply backpressure, circuit-breaking, or failure isolation.
4. New threads should remain usable even if one old thread cannot be resumed.
5. The app-server should recover cleanly after a failed thread hydration attempt.
6. The UI should distinguish local app-server / transport overload from actual remote server overload.
7. The UI should not show only:

```text
Server overloaded; try later
```

when the evidence points to local Codex app-server / transport overload.

A more accurate error would be something like:

```text
Local Codex app-server overloaded or disconnected
```

or:

```text
Failed to resume old thread. Start in safe mode or quarantine this thread?
```

Useful recovery features would include:

- A safe mode that starts Codex without loading or hydrating old sessions.
- A way to disable automatic old-thread recovery on startup.
- A supported command to rebuild local session indexes and state safely.
- Automatic quarantine for threads that repeatedly fail `thread/resume` or `thread-read-state`.
- A way to preserve new-thread functionality even when old session recovery fails.
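
The idea behind bounded retries and failure isolation can be sketched in a few lines of shell. This is only an illustration of the requested behavior, not how the Codex app-server is implemented; `run_with_retry_budget` is a hypothetical helper name, and the command it wraps is whatever operation keeps failing.

```bash
#!/usr/bin/env bash
# Try a command at most MAX_ATTEMPTS times, doubling the wait between attempts,
# then give up instead of retrying until a queue fills.
# Usage: run_with_retry_budget MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
run_with_retry_budget() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    if "$@"; then
      return 0
    fi
    if (( attempt < max_attempts )); then
      sleep "$delay"
      delay=$(( delay * 2 ))   # back off so failures do not flood the caller
    fi
  done
  echo "giving up after $max_attempts failed attempts: $*" >&2
  return 1
}
```

A caller that receives the non-zero return can then mark the item as failed and move on, which is the isolation property the list above asks for.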

### Additional information

Environment:
- VS Code with Remote-SSH
- Remote Linux HPC server
- Codex IDE extension installed on the remote side
- Authentication through ChatGPT login
- Model involved: gpt-5.5
- The issue is more likely to appear after multi-turn sessions, especially with fast/high or fast/xhigh reasoning.

Version history:

Last known stable version in my environment:
openai.chatgpt-26.406.40811-linux-x64

Affected versions:
Newer versions after 26.406.40811, including versions required for GPT-5.5 usage.

Why this looks like a local app-server / transport / session recovery bug:

- `codex login status` succeeds.
- Removing or renaming the local `.codex` state makes Codex usable again.
- The failure is correlated with old thread read/resume attempts.
- Logs show massive local transport queue saturation (`outbound queue is full`, `dropping overload response`).
- The app-server exits with SIGTERM.
- New threads fail only after the local app-server enters this overloaded state.
- The UI message says server overloaded, but the evidence points to local app-server / transport queue overload.

Impact:

- Existing long-running research/coding threads cannot be opened safely.
- Once a problematic old thread triggers the overload loop, even new threads cannot be started.
- The user has to manually freeze or surgically edit local Codex state to recover.
- The issue recurs after upgrading to newer versions needed for GPT-5.5.
- The error message is misleading because it suggests a generic remote server overload, while the evidence points to local app-server/session recovery overload.

Privacy note:

I am intentionally not attaching raw logs or full transcripts in the initial report because they may contain private paths, usernames, research project names, prompts, and local file names. I can provide redacted excerpts that include only app-server, transport, and thread recovery diagnostics if needed.

extent analysis

TL;DR

The most likely fix is to remove or rename the local .codex directory and let Codex create a fresh one, then selectively copy back minimal files, to resolve the local app-server/transport overload issue.

Guidance

  • The issue appears to be caused by a local state/session hydration/app-server backpressure problem, rather than a pure account, login, or model availability issue.
  • To verify, check the Codex logs for repeated local transport overload messages, such as "outbound queue is full" and "dropping overload response".
  • To mitigate, stop Codex-related processes, freeze the active Codex home, create a clean active Codex home, and selectively copy back minimal files like config.toml and auth.json.
  • Avoid copying old sessions, archived sessions, SQLite state files, logs, or session index back into the active .codex directory.
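
The selective copy-back step can be scripted so that nothing else from the frozen directory is carried over by accident. This is a sketch under assumptions: the helper name `restore_minimal_codex_files` is invented here, and the frozen directory name depends on the timestamp used when `.codex` was renamed.

```bash
#!/usr/bin/env bash
# Copy only config.toml and auth.json from a frozen Codex home into a fresh one.
# Sessions, archived sessions, SQLite state, logs, and indexes are never touched.
# Usage: restore_minimal_codex_files FROZEN_DIR FRESH_DIR
restore_minimal_codex_files() {
  local frozen_dir=$1 fresh_dir=$2
  local name
  # Create the fresh directory (private to the user) if it does not exist yet.
  mkdir -p -m 700 "$fresh_dir"
  for name in config.toml auth.json; do
    if [[ -f "$frozen_dir/$name" ]]; then
      cp -p -- "$frozen_dir/$name" "$fresh_dir/$name"
      echo "restored $name"
    else
      echo "skipped $name (not found)"
    fi
  done
}

# Example call (replace the placeholder with the actual frozen directory name):
#   restore_minimal_codex_files "$HOME/.codex.frozen.<timestamp>" "$HOME/.codex"
```

Using an explicit allowlist of file names, rather than copying everything and deleting what looks risky, keeps old session data out of the active directory by construction.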

Example

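The issue itself includes the shell commands for the workaround (stopping Codex processes, freezing the active .codex directory, and creating a clean one); no further code changes are required on the user side.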

Notes

The provided workaround is temporary and may not resolve the underlying issue. The user may need to wait for an official fix or update from the Codex team.

Recommendation

Apply the workaround by removing or renaming the local .codex directory and letting Codex create a fresh one, then selectively copy back minimal files. This is because the issue is likely caused by a local state/session hydration/app-server backpressure problem, and this workaround has been shown to temporarily restore usability.
