codex - 💡(How to fix) Fix exec: surface existing MCP server startup notifications in the JSONL event stream [1 participants]

GitHub stats
openai/codex#17501 · Fetched 2026-04-12 13:27:32
Comments: 0 · Participants: 1 · Timeline: 6 · Reactions: 1
Timeline (top): labeled ×4, referenced ×1, unlabeled ×1

What feature would you like to see?

Emit the MCP server startup notifications that codex core already produces as first-class events in codex exec --json output, so that tools wrapping codex exec can observe per-server MCP lifecycle (starting, ready, failed, cancelled) without spawning and probing MCP servers themselves.

Why

External tools and agent frameworks that wrap codex exec --json currently have no way to tell:

  • which configured MCP servers actually started,
  • which failed (and why),
  • how long users wait on MCP boot at the start of a turn,
  • whether a server marked required (PR #10902) is the reason exec is about to fail fast.

They also cannot populate health dashboards or CI diagnostics without re-implementing the full MCP client path themselves (stdio process group management, docker cleanup, Windows taskkill /T, OAuth token store reuse, transport branching, env allowlists). That is tens of lines of hard-to-maintain lifecycle handling per host, all to rediscover data Codex already knows.

Concrete pain point: issue #17024 (local stdio MCP startup fails even though manual initialize succeeds). A user hitting that today gets no structured signal via exec --json. Codex knows the startup failed and the reason, but the JSONL stream stays silent. A bridge or dashboard cannot even display the failure message without screen-scraping human-readable output.

Closely related prior art:

  • PR #10902 (merged). Establishes the required flag for MCP servers and makes codex exec fail fast on required-server startup failure. That change proves maintainers treat MCP startup state as a first-class exec-layer concern. This proposal is the natural complement: let callers see why it failed, in machine-readable form.
  • #17024 (open). Handshake and startup bug where visibility matters; this proposal would give the reporter and other users a structured way to capture the failure reason.
  • #3778 (closed). RFC 0001 proposed a broad codex mcp test <name> [--json] subcommand as part of a larger overhaul. This proposal intentionally does NOT propose a new subcommand. It reuses the existing exec --json surface and the existing notifications that already flow through the app-server protocol layer.

Current state (verified in rust-v0.120.0)

The full pipeline from the MCP connection manager to the app-server protocol layer already exists and is already consumed by the TUI. codex exec --json is the one consumer that drops the notifications.

codex-rs/codex-mcp/src/mcp_connection_manager.rs (lines 747-814)
    emits per-server progress events
        |
        v
EventMsg::McpStartupUpdate(McpStartupUpdateEvent { server, status })
    status: McpStartupStatus { Starting | Ready | Failed { error } | Cancelled }
    (codex-rs/protocol/src/protocol.rs lines 1439, 3214, 3224)
        |
        v
codex-rs/app-server/src/bespoke_event_handling.rs (lines 234-262)
    translates core event to protocol notification (API v2)
        |
        v
ServerNotification::McpServerStatusUpdated
    wire name: "mcpServer/startupStatus/updated"
    payload: { name: String, status: McpServerStartupState, error: Option<String> }
    (codex-rs/app-server-protocol/src/protocol/v2.rs lines 5515-5532,
     registered in codex-rs/app-server-protocol/src/protocol/common.rs line 991)
        |
        +-> TUI: consumed by codex-rs/tui/src/chatwidget.rs (line 3063)
        |
        +-> exec JSONL: codex-rs/exec/src/event_processor_with_jsonl_output.rs line 581
                        `_ => CodexStatus::Running` catch-all, silently dropped

collect_thread_events in event_processor_with_jsonl_output.rs has no arm for ServerNotification::McpServerStatusUpdated. The notification falls through the _ => catch-all at the end of the match.
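
The drop described above can be reproduced in miniature. The sketch below uses stand-in types and a hypothetical function signature (the real `ServerNotification` enum and `collect_thread_events` live in the Codex crates and are larger); it only illustrates how a `_ =>` catch-all swallows a variant that has no dedicated arm.

```rust
// Stand-in types for illustration only. The real enums have many more
// variants, and the real function signature may differ.
#[allow(dead_code)]
#[derive(Debug)]
enum ServerNotification {
    TurnStarted,
    McpServerStatusUpdated { name: String, status: String },
}

#[derive(Debug, PartialEq)]
enum CodexStatus {
    Running,
}

/// Hypothetical shape: returns the JSONL event types to emit for one
/// notification, plus the processor status.
fn collect_thread_events(n: &ServerNotification) -> (Vec<String>, CodexStatus) {
    match n {
        ServerNotification::TurnStarted => {
            (vec!["turn.started".to_string()], CodexStatus::Running)
        }
        // No arm for McpServerStatusUpdated: it lands here and emits nothing.
        _ => (Vec::new(), CodexStatus::Running),
    }
}
```

Calling `collect_thread_events` with a `McpServerStatusUpdated` value returns an empty event list, which is why nothing reaches the JSONL stream.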

Proposal

Add four ThreadEvent variants (in codex-rs/exec/src/exec_events.rs) and handle the existing notification in collect_thread_events:

// in ThreadEvent enum
#[serde(rename = "mcp.server.init_started")]
McpServerInitStarted(McpServerInitStartedEvent),

#[serde(rename = "mcp.server.ready")]
McpServerReady(McpServerReadyEvent),

#[serde(rename = "mcp.server.failed")]
McpServerFailed(McpServerFailedEvent),

#[serde(rename = "mcp.server.cancelled")]
McpServerCancelled(McpServerCancelledEvent),

Payloads:

pub struct McpServerInitStartedEvent { pub name: String }

pub struct McpServerReadyEvent { pub name: String }

pub struct McpServerFailedEvent {
    pub name: String,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub error: Option<String>,
}

pub struct McpServerCancelledEvent {
    pub name: String,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub error: Option<String>,
}
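
To make the intended wire shape concrete, here is a dependency-free sketch of what a serialized `mcp.server.failed` line might look like. It assumes the event name is carried in a `type` field (the proposal refers to "type string values") and that `error` is omitted when it is `None`. The actual patch would rely on serde derives; the hand-rolled `json_escape` and `failed_to_json_line` helpers are hypothetical and exist only for this illustration.

```rust
/// Stand-in for the proposed McpServerFailedEvent payload.
struct McpServerFailedEvent {
    name: String,
    error: Option<String>,
}

/// Escapes the characters JSON requires to be escaped inside a string.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds one JSONL line; the "error" key is skipped when error is None,
/// mirroring skip_serializing_if = "Option::is_none".
fn failed_to_json_line(e: &McpServerFailedEvent) -> String {
    let mut line = format!(
        "{{\"type\":\"mcp.server.failed\",\"name\":\"{}\"",
        json_escape(&e.name)
    );
    if let Some(err) = &e.error {
        line.push_str(&format!(",\"error\":\"{}\"", json_escape(err)));
    }
    line.push('}');
    line
}
```

With `error: None` the line carries only `type` and `name`; with an error string present, an `error` key is appended.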

Mapping from the existing McpServerStartupState enum:

  • Starting → mcp.server.init_started { name }
  • Ready → mcp.server.ready { name }
  • Failed → mcp.server.failed { name, error } (preserves notification.error passthrough)
  • Cancelled → mcp.server.cancelled { name, error } (preserves notification.error passthrough)

Cancelled is kept distinct from Failed to match how codex-rs/tui/src/chatwidget.rs:2932,3070 already treats the two separately. An interrupted startup is not the same as an initialization failure and consumers should be able to distinguish them.
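
The mapping above can be sketched as a single exhaustive match. The `McpServerStartupState` stand-in below assumes the four variants listed in this proposal; `JsonlEventSketch` and `map_startup_status` are hypothetical names used only for illustration, not the types in the draft patch.

```rust
/// Stand-in for the app-server protocol enum, assuming four variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum McpServerStartupState {
    Starting,
    Ready,
    Failed,
    Cancelled,
}

/// Hypothetical flattened view of the event that would be emitted.
#[derive(Debug, PartialEq)]
struct JsonlEventSketch {
    event_type: &'static str,
    name: String,
    error: Option<String>,
}

/// Maps one startup notification to its JSONL event. Only Failed and
/// Cancelled pass the error string through; the other two drop it.
fn map_startup_status(
    name: &str,
    status: McpServerStartupState,
    error: Option<String>,
) -> JsonlEventSketch {
    let (event_type, error) = match status {
        McpServerStartupState::Starting => ("mcp.server.init_started", None),
        McpServerStartupState::Ready => ("mcp.server.ready", None),
        McpServerStartupState::Failed => ("mcp.server.failed", error),
        McpServerStartupState::Cancelled => ("mcp.server.cancelled", error),
    };
    JsonlEventSketch { event_type, name: name.to_string(), error }
}
```

Because the match has no catch-all arm, adding a fifth state to the enum would fail to compile until the mapping is extended, which is the opposite of the silent drop seen today.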

Scope

Five files, +251 LOC, zero deletions, within a single crate (codex-rs/exec):

  • codex-rs/exec/src/exec_events.rs: new ThreadEvent variants and payload structs.
  • codex-rs/exec/src/event_processor_with_jsonl_output.rs: new arm in collect_thread_events before the _ => catch-all, plus imports.
  • codex-rs/exec/src/event_processor_with_jsonl_output_tests.rs: unit test covering all four state transitions.
  • codex-rs/exec/src/lib.rs: public re-exports of the four new payload structs (matches existing alphabetical re-export pattern for every other exec_events payload type).
  • codex-rs/exec/tests/event_processor_with_json_output.rs: integration test asserting both the enum mapping and the serialized JSON wire shape (type string values plus skip_serializing_if = "Option::is_none" behavior for the optional error field).

No changes to codex-core, codex-mcp, codex-protocol, codex-app-server, or codex-app-server-protocol. No new crate dependencies. No changes to any manually-maintained schema file.

Backwards compatibility

Purely additive. Existing codex exec --json consumers that match only on thread.started, turn.*, or item.* event types will see the new events as unknown type values and can safely ignore them. No existing event shape changes.
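
As an illustration of why the change is additive, consider a consumer that dispatches on the event's type string and already has a default branch. The `Handled` enum and `dispatch` function are hypothetical; real consumers will differ, and one that rejects unknown types outright would need a small change.

```rust
/// Hypothetical outcome of handling one JSONL event in an existing consumer.
#[derive(Debug, PartialEq)]
enum Handled {
    ThreadStarted,
    Turn,
    Item,
    Ignored,
}

/// Dispatches on the already-parsed "type" value of a JSONL event.
fn dispatch(event_type: &str) -> Handled {
    match event_type {
        "thread.started" => Handled::ThreadStarted,
        t if t.starts_with("turn.") => Handled::Turn,
        t if t.starts_with("item.") => Handled::Item,
        // New mcp.server.* events fall through here and are skipped.
        _ => Handled::Ignored,
    }
}
```

A consumer written this way keeps working unchanged and can opt in later by adding an `mcp.server.` branch.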

Alternatives considered

  • New codex mcp test [--json] subcommand (proposed in #3778 RFC 0001): adds a second surface that every external tool needs to integrate with separately. Reusing exec --json piggybacks on the stream they already consume.
  • Extend codex mcp list --json to include runtime state: possible but does not help callers that need live lifecycle updates during a turn (e.g. "a required server just failed, abort the turn cleanly").
  • Probe MCP servers directly from the external tool: what we have been doing. Requires replicating process group management, transport branching, OAuth token store reuse, and Windows and docker lifecycle handling. Fragile and duplicates Codex code.
  • Put timing (boot_ms) and tool listings (tools[]) in this first change: intentionally deferred. This proposal only surfaces data Codex already emits. A follow-up issue can propose adding boot_ms to McpServerStatusUpdatedNotification (cross-crate change, wire-type evolution); tool listings are already available via the existing mcpServerStatus/list RPC so consumers can call that after seeing ready.
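
For the live-lifecycle case mentioned above ("a required server just failed, abort the turn cleanly"), a wrapper could keep a small state table fed by the proposed events. This is a minimal sketch under stated assumptions: `McpHealth` and `ServerState` are hypothetical names, and the set of required servers is supplied by the caller from its own configuration, since the proposed events do not carry a required flag.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, PartialEq)]
enum ServerState {
    Starting,
    Ready,
    Failed(Option<String>),
    Cancelled,
}

/// Hypothetical consumer-side tracker for per-server MCP startup state.
struct McpHealth {
    required: HashSet<String>,
    states: HashMap<String, ServerState>,
}

impl McpHealth {
    fn new(required: &[&str]) -> Self {
        McpHealth {
            required: required.iter().map(|s| s.to_string()).collect(),
            states: HashMap::new(),
        }
    }

    /// Records one event and returns true when a required server failed,
    /// meaning the caller may want to abort. Unknown event types are ignored.
    fn observe(&mut self, event_type: &str, name: &str, error: Option<&str>) -> bool {
        let state = match event_type {
            "mcp.server.init_started" => ServerState::Starting,
            "mcp.server.ready" => ServerState::Ready,
            "mcp.server.failed" => ServerState::Failed(error.map(str::to_string)),
            "mcp.server.cancelled" => ServerState::Cancelled,
            _ => return false,
        };
        let abort = self.required.contains(name) && matches!(state, ServerState::Failed(_));
        self.states.insert(name.to_string(), state);
        abort
    }
}
```

Whether a cancelled required server should also trigger an abort is a policy choice left to the consumer; this sketch treats only `failed` as fatal.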

Open questions

  1. Is codex exec --json considered a stable public API, or best-effort debug output? This proposal would benefit from an explicit stability statement in docs/exec.md for the new events.
  2. Should the events carry a timestamp field? Existing thread.*, turn.*, and item.* events do not, so the current draft omits it for consistency, but I am open to guidance.
  3. Should mcp.server.cancelled distinguish "user interrupted startup" from "startup deadline exceeded"? The current McpStartupStatus::Cancelled variant in core has no further discriminator; current draft preserves notification.error string passthrough.
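
On question 2, even without a timestamp field a consumer can approximate boot time by stamping events as they arrive. The sketch below is a hypothetical consumer-side helper (`BootTimer` is not part of the proposal); it measures arrival-time deltas, which include pipe and parsing latency, so the result is an upper-bound estimate rather than the server's true startup duration.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical helper: tracks when each server's init_started event arrived.
struct BootTimer {
    started: HashMap<String, Instant>,
}

impl BootTimer {
    fn new() -> Self {
        BootTimer { started: HashMap::new() }
    }

    /// Feed one event with its arrival time. Returns the elapsed time when a
    /// terminal event (ready, failed, cancelled) follows an init_started.
    fn observe(&mut self, event_type: &str, name: &str, at: Instant) -> Option<Duration> {
        match event_type {
            "mcp.server.init_started" => {
                self.started.insert(name.to_string(), at);
                None
            }
            "mcp.server.ready" | "mcp.server.failed" | "mcp.server.cancelled" => {
                self.started.remove(name).map(|t0| at.duration_since(t0))
            }
            _ => None,
        }
    }
}
```

Passing the arrival time in explicitly keeps the helper easy to test; in a live reader the caller would pass `Instant::now()` as each line is read.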

Implementation

I have a draft patch applied against rust-v0.120.0 on a local feature branch feat/mcp-init-events-exec-json. Happy to open a PR if this proposal is directionally acceptable. Everything is in the codex-exec crate; no changes to codex-core, codex-mcp, or protocol crates.

Verification results:

  • cargo build -p codex-exec: clean
  • cargo test -p codex-exec: 40 lib + 62 integration, all passing (including new unit test and new integration test)
  • cargo clippy -p codex-exec --all-targets -- -D warnings: clean
  • cargo fmt -p codex-exec -- --check: clean

Context

This is driven by maintaining codex-mcp-bridge (an open-source MCP server that wraps codex exec) and specifically by the introspection phase of its design. Without the change, each external consumer has to re-implement MCP probing against their own config-parsing rules, including process group cleanup, transport branching, OAuth token store reuse, and OS-specific lifecycle quirks. With the change, they parse the JSONL they already consume.

No equivalent structured per-server MCP lifecycle stream exists in other MCP hosts I could verify (Claude Code's /mcp and claude mcp list are plain-text output, @modelcontextprotocol/sdk exposes init only via await client.connect() and UnauthorizedError, spec-level lifecycle is handshake-based). Codex is well-positioned to set a pattern here.

extent analysis

TL;DR

To address the issue, add four new ThreadEvent variants to the codex-rs/exec crate and handle the existing McpServerStatusUpdated notification in collect_thread_events to emit per-server MCP lifecycle events as first-class events in codex exec --json output.

Guidance

  • Review the proposed changes to codex-rs/exec/src/exec_events.rs and codex-rs/exec/src/event_processor_with_jsonl_output.rs to ensure they align with the desired functionality.
  • Verify that the new ThreadEvent variants and payload structs are correctly implemented and handle the different MCP server lifecycle states (starting, ready, failed, cancelled).
  • Test the changes thoroughly to ensure they do not introduce any regressions or issues with existing functionality.
  • Consider adding a timestamp field to the new events for additional context, but weigh this against the potential impact on existing event handling and processing.

Notes

The proposed changes are purely additive and do not modify existing event shapes, ensuring backwards compatibility. However, it is essential to review and test the changes carefully to ensure they meet the requirements and do not introduce any issues.

Recommendation

Apply the proposed change: add the new ThreadEvent variants and handle the existing McpServerStatusUpdated notification in collect_thread_events. This is an additive feature rather than a workaround, and it gives external tools and agent frameworks a structured way to observe per-server MCP lifecycle events without re-implementing MCP probing and lifecycle handling.
