codex - 💡(How to fix) Fix Feature request: clarify and support scalable remote execution patterns

Summary

Codex already has several useful building blocks for remote execution, such as codex app-server --listen ws://..., the exec-server / EnvironmentManager path, and experimental APIs such as thread/resume.history and rawResponseItem/completed.

However, these pieces do not yet form a clear, first-class remote execution worker pattern. For applications that need to run many concurrent Codex sessions, it would be valuable to have a documented and stable way to separate the control/session layer from the execution/runtime layer.

A possible conceptual split is:

Control / Session Layer
  - manages session lifecycle
  - receives user input
  - dispatches work
  - consumes execution events
  - persists session metadata/history

Execution Worker
  - runs the Codex runtime
  - accesses the workspace
  - executes shell/fs/git/tools
  - streams events back to the control layer

The goal is not necessarily to prescribe a specific deployment model, but to make Codex easier to embed in scalable multi-session environments where long-running agent runtimes should live closer to the execution environment.
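To make the split concrete, the following TypeScript sketch models the two roles as interfaces. A synchronous callback stands in for the event stream and an echo worker stands in for the Codex runtime. `ExecutionWorker`, `SessionStore`, `ExecutionEvent`, `EchoWorker`, and `handleUserInput` are hypothetical names chosen for illustration; they are not part of Codex or its SDKs.

```typescript
// Hypothetical sketch of the control/worker split; none of these names are Codex APIs.

// Events a worker streams back to the control layer.
type ExecutionEvent =
  | { kind: "turnStarted"; sessionId: string }
  | { kind: "item"; sessionId: string; item: string }
  | { kind: "turnCompleted"; sessionId: string };

// Execution side: runs the runtime against its workspace and streams events.
interface ExecutionWorker {
  readonly id: string;
  runTurn(
    sessionId: string,
    history: readonly string[],
    input: string,
    emit: (event: ExecutionEvent) => void,
  ): void;
}

// Control side: the durable owner of session history.
interface SessionStore {
  history(sessionId: string): readonly string[];
  append(sessionId: string, item: string): void;
}

class InMemorySessionStore implements SessionStore {
  private readonly items = new Map<string, string[]>();

  // Return a copy so a worker cannot observe or mutate the stored list mid-turn.
  history(sessionId: string): readonly string[] {
    return (this.items.get(sessionId) ?? []).slice();
  }

  append(sessionId: string, item: string): void {
    const list = this.items.get(sessionId) ?? [];
    list.push(item);
    this.items.set(sessionId, list);
  }
}

// Stand-in for a real runtime: echoes the input instead of calling a model.
class EchoWorker implements ExecutionWorker {
  readonly id: string;

  constructor(id: string) {
    this.id = id;
  }

  runTurn(
    sessionId: string,
    history: readonly string[],
    input: string,
    emit: (event: ExecutionEvent) => void,
  ): void {
    emit({ kind: "turnStarted", sessionId });
    emit({ kind: "item", sessionId, item: `user: ${input}` });
    emit({ kind: "item", sessionId, item: `agent(${this.id}): saw ${history.length} prior items, echo: ${input}` });
    emit({ kind: "turnCompleted", sessionId });
  }
}

// Control layer: receives input, dispatches it with stored history, persists streamed items.
function handleUserInput(store: SessionStore, worker: ExecutionWorker, sessionId: string, input: string): void {
  worker.runTurn(sessionId, store.history(sessionId), input, (event) => {
    if (event.kind === "item") store.append(sessionId, event.item);
  });
}
```

The design point is that the worker receives history as an argument and never writes to the store itself; only the control layer persists what the worker streams back.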

Related example

A useful comparison is the remote-execution style used in CCR-like Claude Code deployments. In that model, the control layer can own a sessionStore, while execution workers own the actual agent runtime and workspace access.

Conceptually, this gives a closed loop:

Control / Session Service
  - stores session identity and history
  - receives user input
  - selects an execution worker
  - forwards work to that worker
  - records streamed events back into the session store

Execution Worker
  - runs the agent process/runtime
  - operates on the workspace
  - executes tools and shell commands
  - streams structured events back to the control layer

The important part is that the durable session state belongs to the control layer, while the expensive and environment-specific runtime belongs to the worker. With a session store in the middle, a worker does not need to be the durable source of truth for the session. It only needs enough protocol support to start, resume, run, interrupt, and stream events for a session supplied by the control layer.

This makes the architecture easier to reason about: the control layer owns session continuity, while workers provide replaceable execution capacity.
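The worker surface described above (start, resume, run, interrupt, and event streaming) could look roughly like the following minimal sketch. It assumes a hypothetical request union and uses a synchronous `flush()` in place of an asynchronous runtime; `WorkerRequest`, `WorkerEvent`, `StatelessWorker`, and `sessionStore` are illustrative names, not Codex APIs. The worker keeps only runtime-local state, and the control layer's `sessionStore` remains the durable record.

```typescript
// Hypothetical control-to-worker protocol; not the Codex app-server protocol.
type WorkerRequest =
  | { op: "start"; sessionId: string }
  | { op: "resume"; sessionId: string; history: string[] }
  | { op: "run"; sessionId: string; input: string }
  | { op: "interrupt"; sessionId: string };

type WorkerEvent =
  | { kind: "item"; sessionId: string; item: string }
  | { kind: "done"; sessionId: string; interrupted: boolean };

// Runtime-local session state: disposable, rebuilt from control-layer history.
interface LiveSession {
  history: string[];
  pending: string | undefined; // input of the in-flight turn, if any
}

class StatelessWorker {
  readonly id: string;
  private readonly emit: (event: WorkerEvent) => void;
  private readonly live = new Map<string, LiveSession>();

  constructor(id: string, emit: (event: WorkerEvent) => void) {
    this.id = id;
    this.emit = emit;
  }

  handle(req: WorkerRequest): void {
    if (req.op === "start" || req.op === "resume") {
      // The worker trusts whatever history the control layer supplies.
      this.live.set(req.sessionId, { history: req.op === "resume" ? req.history.slice() : [], pending: undefined });
      return;
    }
    const session = this.live.get(req.sessionId);
    if (!session) throw new Error(`session ${req.sessionId} is not loaded on ${this.id}`);
    if (req.op === "run") {
      session.pending = req.input; // a real worker would start the turn asynchronously
    } else if (session.pending !== undefined) {
      session.pending = undefined; // interrupt: cancel the in-flight turn
      this.emit({ kind: "done", sessionId: req.sessionId, interrupted: true });
    }
  }

  // Stand-in for the runtime finishing in-flight turns and streaming their items.
  flush(): void {
    for (const [sessionId, session] of Array.from(this.live.entries())) {
      if (session.pending === undefined) continue;
      const items = [`user: ${session.pending}`, `agent(${this.id}): reply #${session.history.length / 2 + 1}`];
      session.pending = undefined;
      session.history.push(...items);
      for (const item of items) this.emit({ kind: "item", sessionId, item });
      this.emit({ kind: "done", sessionId, interrupted: false });
    }
  }
}

// Control layer: the durable session store, fed only by streamed events.
const sessionStore = new Map<string, string[]>();
const record = (event: WorkerEvent): void => {
  if (event.kind === "item") {
    sessionStore.set(event.sessionId, [...(sessionStore.get(event.sessionId) ?? []), event.item]);
  }
};
```

If the first worker is lost after a turn, the control layer can send `{ op: "resume", sessionId, history }` to any other worker using the history in `sessionStore`, and the next `run` continues the same session there. Nothing on the lost worker needs to be recovered.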

Current state

The current APIs already offer several related capabilities:

- codex app-server --listen ws://... allows a remote client to control a Codex app-server over WebSocket.
- exec-server / EnvironmentManager can move process/fs/http execution to a remote environment.
- thread/resume.history can rehydrate a thread from caller-provided ResponseItem history.
- experimentalRawEvents / rawResponseItem/completed can let an external caller observe raw response items and build its own history store.
- The TypeScript SDK currently wraps codex exec --experimental-json by spawning the CLI locally.

These are useful pieces, but the boundaries between them are still unclear for users trying to build a durable remote execution architecture.
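Until such a contract exists, a control layer might combine the experimental pieces roughly as follows: record raw response items per thread as they arrive, then send them back as history when a thread has to be rehydrated elsewhere. Only the method names `rawResponseItem/completed` and `thread/resume` and the idea of a `history` field come from the list above; the message envelope and parameter shapes (`threadId`, `item`, `id`) are assumptions for illustration, and whether a given raw item sequence is actually accepted by `thread/resume` is exactly the kind of detail that is currently undocumented.

```typescript
// Assumed message shapes for illustration only; check the app-server protocol
// definitions for the real field names before relying on any of these.
type ResponseItem = { type: string; [key: string]: unknown };

interface ServerNotification {
  method: string;
  params: { threadId: string; item?: ResponseItem };
}

interface ResumeRequest {
  method: "thread/resume";
  id: number;
  params: { threadId: string; history: ResponseItem[] };
}

// Control-layer history store fed by raw response item notifications.
class RawHistoryStore {
  private readonly byThread = new Map<string, ResponseItem[]>();

  // Keep completed raw items; ignore every other notification.
  onNotification(notification: ServerNotification): void {
    const { method, params } = notification;
    if (method !== "rawResponseItem/completed" || params.item === undefined) return;
    const items = this.byThread.get(params.threadId) ?? [];
    items.push(params.item);
    this.byThread.set(params.threadId, items);
  }

  // Build a request that rehydrates the thread from the recorded history.
  resumeRequest(threadId: string, requestId: number): ResumeRequest {
    const history = (this.byThread.get(threadId) ?? []).slice();
    return { method: "thread/resume", id: requestId, params: { threadId, history } };
  }
}
```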

Gaps

The main gaps are:

- app-server --listen is a remote app-server mode, not a dedicated execution-worker mode. The remote process still owns the full app-server/runtime lifecycle.
- exec-server remote execution only moves lower-level process/fs/http operations. The Codex agent runtime still lives above it.
- thread/resume.history is experimental and appears closer to rehydrating/forking from model-visible history than to a stable distributed session restore contract.
- There is no documented worker lifecycle contract, such as registration, heartbeat, capacity, lease, drain, start/resume turn, interrupt, and shutdown semantics.
- There is no stable session snapshot/restore contract that an external control layer can rely on without combining multiple experimental APIs.
- The TypeScript SDK does not currently expose a remote app-server or worker protocol client. It is mainly a local CLI-spawning wrapper.
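The lifecycle terms listed above (registration, heartbeat, capacity, lease, drain, shutdown) can be illustrated with a small control-side registry. This is a hypothetical sketch of what such a contract might cover, using explicit millisecond timestamps so it stays deterministic; `WorkerRegistry`, `WorkerEntry`, and their methods are invented for illustration and do not exist in Codex.

```typescript
// Hypothetical control-side worker registry; not a Codex API.
interface WorkerEntry {
  id: string;
  capacity: number; // maximum concurrent session leases
  state: "active" | "draining";
  lastHeartbeatMs: number;
  leases: Set<string>; // session ids currently leased to this worker
}

class WorkerRegistry {
  private readonly workers = new Map<string, WorkerEntry>();
  private readonly heartbeatTimeoutMs: number;

  constructor(heartbeatTimeoutMs: number) {
    this.heartbeatTimeoutMs = heartbeatTimeoutMs;
  }

  register(id: string, capacity: number, nowMs: number): void {
    this.workers.set(id, { id, capacity, state: "active", lastHeartbeatMs: nowMs, leases: new Set<string>() });
  }

  heartbeat(id: string, nowMs: number): void {
    const worker = this.workers.get(id);
    if (worker) worker.lastHeartbeatMs = nowMs;
  }

  // Draining: accept no new leases, let existing ones finish.
  drain(id: string): void {
    const worker = this.workers.get(id);
    if (worker) worker.state = "draining";
  }

  // Lease a session to the live, active worker with the most free capacity.
  lease(sessionId: string, nowMs: number): string | undefined {
    let best: WorkerEntry | undefined;
    for (const worker of Array.from(this.workers.values())) {
      const alive = nowMs - worker.lastHeartbeatMs <= this.heartbeatTimeoutMs;
      const free = worker.capacity - worker.leases.size;
      if (alive && worker.state === "active" && free > 0 &&
          (best === undefined || free > best.capacity - best.leases.size)) {
        best = worker;
      }
    }
    if (best === undefined) return undefined;
    best.leases.add(sessionId);
    return best.id;
  }

  release(workerId: string, sessionId: string): void {
    this.workers.get(workerId)?.leases.delete(sessionId);
  }

  // A draining worker with no remaining leases can be shut down safely.
  canShutdown(id: string): boolean {
    const worker = this.workers.get(id);
    return worker !== undefined && worker.state === "draining" && worker.leases.size === 0;
  }

  // Drop workers whose heartbeat expired and return their sessions so the
  // control layer can re-lease and resume them from its session store.
  reapExpired(nowMs: number): string[] {
    const orphaned: string[] = [];
    for (const worker of Array.from(this.workers.values())) {
      if (nowMs - worker.lastHeartbeatMs > this.heartbeatTimeoutMs) {
        orphaned.push(...Array.from(worker.leases));
        this.workers.delete(worker.id);
      }
    }
    return orphaned;
  }
}
```

Sessions returned by `reapExpired` have no live runtime, but because their history lives in the control layer they can be leased to another worker and resumed there.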

Why this matters

A clearer remote execution model would make Codex easier to use in environments that need:

- horizontal scaling of execution capacity
- stronger isolation between sessions/workspaces
- worker-local access to files, tools, and runtime resources
- centralized session lifecycle and event handling
- more reliable recovery or migration of non-active sessions
- SDK-level integration without requiring each application to reverse-engineer app-server protocol details

Suggestions

A few possible improvements that would make this area much clearer:

- Document the intended use cases and limitations of the existing remote-related modes:
  - app-server WebSocket mode
  - exec-server / EnvironmentManager
  - thread/resume.history
  - raw response item events
- Define a first-class execution worker contract, even if the initial implementation is minimal.
- Stabilize a session snapshot / restore API that can be used by an external control layer without relying on experimental fields.
- Clarify the relationship between durable session/thread identity and runtime-local thread identity when restoring from history.
- Provide TypeScript/Python SDK support for connecting to a remote app-server or worker endpoint, not only spawning local CLI commands.

This would make Codex much easier to integrate into scalable remote execution systems while preserving the existing app-server and exec-server modes for simpler deployments.
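The distinction between durable session identity and runtime-local thread identity can be made explicit in a snapshot/restore contract. The sketch below is hypothetical (`SessionSnapshot`, `SessionBindings`, `restoreSession`, and the `startThread` callback are illustrative names, not Codex APIs): the durable `sessionId` and its history never change, while each restore may produce a new runtime-local thread id that the control layer records as a binding.

```typescript
// Hypothetical snapshot/restore contract; not a Codex API.
interface SessionSnapshot {
  sessionId: string; // durable id owned by the control layer
  history: readonly string[]; // model-visible history, e.g. serialized items
}

interface Binding {
  workerId: string;
  threadId: string; // runtime-local id assigned by the worker
}

// Maps each durable session to the runtime thread currently hosting it.
class SessionBindings {
  private readonly bySession = new Map<string, Binding>();

  bind(sessionId: string, binding: Binding): void {
    this.bySession.set(sessionId, binding);
  }

  lookup(sessionId: string): Binding | undefined {
    return this.bySession.get(sessionId);
  }
}

// Restore a snapshot onto a worker. `startThread` (an assumption) stands in for
// whatever call the worker exposes to rehydrate history; it returns a thread id.
function restoreSession(
  snapshot: SessionSnapshot,
  workerId: string,
  startThread: (history: readonly string[]) => string,
  bindings: SessionBindings,
): Binding {
  const binding: Binding = { workerId, threadId: startThread(snapshot.history) };
  bindings.bind(snapshot.sessionId, binding);
  return binding;
}
```

Migrating a session is then just another restore: the control layer restores the snapshot on a new worker and replaces the binding, so clients keep addressing the durable session id.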
