codex - Feature request: clarify and support scalable remote execution patterns



Summary

Codex already has several useful building blocks for remote execution, such as codex app-server --listen ws://..., the exec-server / EnvironmentManager path, and experimental APIs such as thread/resume.history and rawResponseItem/completed.

However, these pieces do not yet form a clear, first-class remote execution worker pattern. For applications that need to run many concurrent Codex sessions, it would be valuable to have a documented and stable way to separate the control/session layer from the execution/runtime layer.

A possible conceptual split is:

Control / Session Layer
  - manages session lifecycle
  - receives user input
  - dispatches work
  - consumes execution events
  - persists session metadata/history

Execution Worker
  - runs the Codex runtime
  - accesses the workspace
  - executes shell/fs/git/tools
  - streams events back to the control layer

The goal is not necessarily to prescribe a specific deployment model, but to make Codex easier to embed in scalable multi-session environments where long-running agent runtimes should live closer to the execution environment.

Related example

A useful comparison is the remote-execution style used by Claude Code CCR-like deployments. In that model, the control layer can own a sessionStore while execution workers own the actual agent runtime and workspace access.

Conceptually, this gives a closed loop:

Control / Session Service
  - stores session identity and history
  - receives user input
  - selects an execution worker
  - forwards work to that worker
  - records streamed events back into the session store

Execution Worker
  - runs the agent process/runtime
  - operates on the workspace
  - executes tools and shell commands
  - streams structured events back to the control layer

The important part is that the durable session state belongs to the control layer, while the expensive and environment-specific runtime belongs to the worker. With a session store in the middle, a worker does not need to be the durable source of truth for the session. It only needs enough protocol support to start, resume, run, interrupt, and stream events for a session supplied by the control layer.

This makes the architecture easier to reason about: the control layer owns session continuity, while workers provide replaceable execution capacity.
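As a rough illustration of this split, the following Python sketch keeps the durable history in a control service and treats workers as interchangeable. Every name in it (Session, ExecutionWorker, EchoWorker, ControlService, run_turn, the event dictionaries) is hypothetical and chosen for illustration; none of it is an existing Codex or Claude Code API, and the stand-in worker only echoes input instead of running an agent.

```python
from dataclasses import dataclass, field
from typing import Iterator, Protocol


@dataclass
class Session:
    """Durable session record owned by the control layer (hypothetical shape)."""
    session_id: str
    history: list = field(default_factory=list)


class ExecutionWorker(Protocol):
    """Hypothetical worker interface; not an existing Codex API."""

    def run_turn(self, history: list, user_input: str) -> Iterator[dict]: ...


class EchoWorker:
    """Stand-in worker that emits fake events instead of running an agent."""

    def run_turn(self, history: list, user_input: str) -> Iterator[dict]:
        yield {"type": "turn_started", "prior_items": len(history)}
        yield {"type": "message", "text": f"echo: {user_input}"}
        yield {"type": "turn_completed"}


class ControlService:
    """Owns session state; workers only receive history and stream events."""

    def __init__(self, workers: list):
        if not workers:
            raise ValueError("at least one worker is required")
        self._workers = workers
        self._sessions = {}
        self._turns = 0

    def create_session(self, session_id: str) -> Session:
        session = Session(session_id)
        self._sessions[session_id] = session
        return session

    def get_session(self, session_id: str) -> Session:
        return self._sessions[session_id]

    def submit(self, session_id: str, user_input: str) -> list:
        session = self._sessions[session_id]
        # Round-robin selection: any worker can serve any session because
        # the durable history travels with the request.
        worker = self._workers[self._turns % len(self._workers)]
        self._turns += 1
        prior = list(session.history)  # snapshot handed to the worker
        session.history.append({"type": "user_input", "text": user_input})
        events = []
        for event in worker.run_turn(prior, user_input):
            session.history.append(event)  # record streamed events durably
            events.append(event)
        return events
```

Because the worker receives the history with each turn, consecutive turns of one session can land on different workers without losing continuity, which is the property the closed loop above relies on.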

Current state

From the current APIs, there are several related capabilities:

  • codex app-server --listen ws://... allows a remote client to control a Codex app-server over WebSocket.
  • exec-server / EnvironmentManager can move process/fs/http execution to a remote environment.
  • thread/resume.history can rehydrate a thread from caller-provided ResponseItem history.
  • experimentalRawEvents / rawResponseItem/completed let an external caller observe raw response items and build its own history store.
  • The TypeScript SDK currently wraps codex exec --experimental-json by spawning the CLI locally.

These are useful pieces, but the boundaries are still somewhat unclear for users trying to build a durable remote execution architecture.
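To make the "build its own history store" idea concrete, here is a minimal Python sketch of a caller-side store fed by notifications. Only the method name rawResponseItem/completed comes from the list above. The message envelope (a JSON object with "method" and "params") and the field names threadId and item are assumptions made for illustration; the real payload shape may differ and should be checked against the app-server protocol.

```python
import json
from collections import defaultdict

# Method name as mentioned in the issue text.
RAW_ITEM_METHOD = "rawResponseItem/completed"


class HistoryStore:
    """Caller-owned, in-memory history keyed by thread id (hypothetical)."""

    def __init__(self):
        self._items = defaultdict(list)

    def handle_message(self, raw: str) -> bool:
        """Record the item if `raw` is a raw-item notification.

        Returns True when an item was stored, False for any other message.
        """
        msg = json.loads(raw)
        if msg.get("method") != RAW_ITEM_METHOD:
            return False
        params = msg.get("params", {})
        # ASSUMPTION: params carry "threadId" and "item"; these field names
        # are illustrative and not confirmed by the source.
        self._items[params["threadId"]].append(params["item"])
        return True

    def history_for(self, thread_id: str) -> list:
        """Return a copy of the recorded items for one thread."""
        return list(self._items.get(thread_id, []))
```

A store like this is what a control layer would later hand back through something like thread/resume.history, which is why the stability of both the event and the resume shape matters.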

Gaps

The main gaps are:

  1. app-server --listen is a remote app-server mode, not a dedicated execution-worker mode. The remote process still owns the full app-server/runtime lifecycle.

  2. exec-server remote execution only moves lower-level process/fs/http operations. The Codex agent runtime still lives above it.

  3. thread/resume.history is experimental and appears closer to rehydrating/forking from model-visible history than to a stable distributed session restore contract.

  4. There is no documented worker lifecycle contract, such as registration, heartbeat, capacity, lease, drain, start/resume turn, interrupt, and shutdown semantics.

  5. There is no stable session snapshot/restore contract that an external control layer can rely on without combining multiple experimental APIs.

  6. The TypeScript SDK does not currently expose a remote app-server or worker protocol client. It is mainly a local CLI-spawning wrapper.
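For gap 4, a worker lifecycle contract could start from something as small as the registry sketched below in Python. It covers registration, heartbeat, capacity, lease, and drain. The class and method names, the 30-second default timeout, and the least-loaded selection rule are all assumptions for illustration, not an existing or proposed Codex interface. Time is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WorkerRecord:
    worker_id: str
    capacity: int              # max concurrent sessions on this worker
    last_heartbeat: float      # timestamp of the most recent heartbeat
    draining: bool = False     # draining workers accept no new leases
    leases: set = field(default_factory=set)  # session ids held


class WorkerRegistry:
    """Hypothetical control-layer view of execution workers."""

    def __init__(self, heartbeat_timeout: float = 30.0):
        self._timeout = heartbeat_timeout
        self._workers = {}

    def register(self, worker_id: str, capacity: int, now: float) -> None:
        self._workers[worker_id] = WorkerRecord(worker_id, capacity, now)

    def heartbeat(self, worker_id: str, now: float) -> None:
        self._workers[worker_id].last_heartbeat = now

    def drain(self, worker_id: str) -> None:
        self._workers[worker_id].draining = True

    def _is_live(self, worker: WorkerRecord, now: float) -> bool:
        return now - worker.last_heartbeat <= self._timeout

    def acquire_lease(self, session_id: str, now: float) -> Optional[str]:
        """Lease the session to the least-loaded eligible worker, if any."""
        candidates = [
            w for w in self._workers.values()
            if self._is_live(w, now)
            and not w.draining
            and len(w.leases) < w.capacity
        ]
        if not candidates:
            return None
        # The filter above guarantees capacity >= 1, so division is safe.
        worker = min(candidates, key=lambda w: len(w.leases) / w.capacity)
        worker.leases.add(session_id)
        return worker.worker_id

    def release_lease(self, worker_id: str, session_id: str) -> None:
        self._workers[worker_id].leases.discard(session_id)

    def expired_leases(self, now: float) -> list:
        """Return (worker_id, session_id) pairs held by workers that missed heartbeats."""
        return sorted(
            (w.worker_id, s)
            for w in self._workers.values()
            if not self._is_live(w, now)
            for s in w.leases
        )
```

The control layer would periodically call expired_leases and reassign those sessions, which only works if a session can be restored on another worker (gap 5).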

Why this matters

A clearer remote execution model would make Codex easier to use in environments that need:

  • horizontal scaling of execution capacity
  • stronger isolation between sessions/workspaces
  • worker-local access to files, tools, and runtime resources
  • centralized session lifecycle and event handling
  • more reliable recovery or migration of non-active sessions
  • SDK-level integration without requiring each application to reverse-engineer app-server protocol details

Suggestions

A few possible improvements that would make this area much clearer:

  1. Document the intended use cases and limitations of the existing remote-related modes:

    • app-server WebSocket mode
    • exec-server / EnvironmentManager
    • thread/resume.history
    • raw response item events

  2. Define a first-class execution worker contract, even if the initial implementation is minimal.

  3. Stabilize a session snapshot / restore API that can be used by an external control layer without relying on experimental fields.

  4. Clarify the relationship between durable session/thread identity and runtime-local thread identity when restoring from history.

  5. Provide TypeScript/Python SDK support for connecting to a remote app-server or worker endpoint, not only spawning local CLI commands.

This would make Codex much easier to integrate into scalable remote execution systems while preserving the existing app-server and exec-server modes for simpler deployments.
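Suggestions 3 and 4 can be pictured with a small Python sketch: a versioned snapshot that round-trips through JSON, plus a map from durable session identity to whatever thread id the current runtime assigned. SessionSnapshot, ThreadIdentityMap, the schema_version field, and the example ids are hypothetical; a real contract would define its own fields.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class SessionSnapshot:
    """Hypothetical snapshot an external control layer could persist."""
    durable_session_id: str
    history: list = field(default_factory=list)
    schema_version: int = 1

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "SessionSnapshot":
        data = json.loads(raw)
        # Reject unknown versions instead of guessing at their layout.
        if data.get("schema_version") != 1:
            raise ValueError("unsupported snapshot schema version")
        return cls(**data)


class ThreadIdentityMap:
    """Tracks which runtime-local thread currently backs a durable session."""

    def __init__(self):
        self._runtime_by_durable = {}
        self._durable_by_runtime = {}

    def bind(self, durable_id: str, runtime_thread_id: str) -> None:
        # A restore may yield a new runtime thread id; drop the stale one.
        old = self._runtime_by_durable.get(durable_id)
        if old is not None:
            del self._durable_by_runtime[old]
        self._runtime_by_durable[durable_id] = runtime_thread_id
        self._durable_by_runtime[runtime_thread_id] = durable_id

    def runtime_id(self, durable_id: str) -> Optional[str]:
        return self._runtime_by_durable.get(durable_id)

    def durable_id(self, runtime_thread_id: str) -> Optional[str]:
        return self._durable_by_runtime.get(runtime_thread_id)
```

Keeping the two identities separate lets events from a restored runtime thread be attributed to the original durable session even when the runtime assigns a fresh thread id on restore.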
