claude-code - 💡(How to fix) Fix scheduled-task sessions: intermittent silent boot failures + mid-session bash-bridge degradation

StepCodex · 2026-05-31T03:20:43Z

[claude-code] Running two Claude Code scheduled tasks Claude Desktop scheduled-tasks subsystem over the past 24 hours surfaces three distinct intermittent fail… Running two Claude Code scheduled tasks (Claude Desktop scheduled-tasks subsystem) over the past 24 hours surfaces three distinct intermittent failure modes that **only occur in the scheduler plane**. Interactive ("chat-plane") Cowork sessions on the same host are healthy. The pattern is the *scheduler runtime itself* — sometimes the scheduled session never gets a workable sandbox; sometimes it gets one but the `mcp__workspace__bash` bridge degrades mid-session. After landing a 15-minute warm-clock + per-tick heartbeat (so we can see which ticks ran at all), the failure pattern becomes measurable: of 11 expected ticks between `2026-05-30T23:53Z` and `2026-05-31T02:38Z`, **4 ticks never appeared in the heartbeat log** — a 75-min blackout `00:08Z → 01:08Z`, plus two single-tick gaps. The scheduled-task launcher itself is silently failing to start sessions ~36% of the time in that window. We have strong but anonymized internal repro evidence (see §3 below); we're filing for visibility / triage. Happy to share host log snippets, scheduled-task JSON definitions, or any other artifact that helps narrow this. ## Fix / Workaround - **Claude Desktop** — macOS, Apple Silicon (M-series), Darwin 25.5.0 - **Two scheduled tasks** registered via `mcp__scheduled-tasks` MCP: - `upwork-feed-dispatcher` — runbook prompt that drives Chrome (via the `Claude in Chrome` MCP), scrapes a results page, writes markdown to a sister directory - `upwork-feed-dv1-request-drain` — a one-shot drain that fulfills detail-fetch requests from a `requests/` queue - **Cron** — previously `0 */6 * * *`; flipped to `*/15 * * * *` mid-investigation to (a) warm the sandbox more frequently and (b) make intermittent failures observable via a per-tick heartbeat JSONL. - **MCP surface used in tick** — `mcp__workspace__bash`, `mcp__cowork__request_cowork_directory`, `mcp__cowork__allow_cowork_file_delete`, `mcp__Claude_in_Chrome__*`, `mcp__scheduled-tasks__*` - Cowork (the Anthropic-issued interactive-session product) is running in parallel on the same machine; its chat-plane sessions are healthy throughout this window. Several were verified mid-degradation: - `2026-05-30T04:03:38Z` host log: `mcp__workspace__bash` returned `outcome=ok` in 330ms inside a Cowork chat session — meanwhile two scheduled-task sessions had silently failed within the previous hour. **Repro:** `upwork-feed-dispatcher` at `2026-05-30T03:37:54Z`. Diagnosed via filesystem absence + host-log scrape. From the agent's runbook perspective, nothing ran; from `lastRunAt`'s perspective, it did. **Repro:** `upwork-feed-dispatcher` at `2026-05-30T18:38Z` (run `run-202605301838`). The agent's own debrief is in our internal communication thread; happy to share verbatim. ## Summary Running two Claude Code scheduled tasks (Claude Desktop scheduled-tasks subsystem) over the past 24 hours surfaces three distinct intermittent failure modes that **only occur in the scheduler plane**. Interactive ("chat-plane") Cowork sessions on the same host are healthy. The pattern is the *scheduler runtime itself* — sometimes the scheduled session never gets a workable sandbox; sometimes it gets one but the `mcp__workspace__bash` bridge degrades mid-session. After landing a 15-minute warm-clock + per-tick heartbeat (so we can see which ticks ran at all), the failure pattern becomes measurable: of 11 expected ticks between `2026-05-30T23:53Z` and `2026-05-31T02:38Z`, **4 ticks never appeared in the heartbeat log** — a 75-min blackout `00:08Z → 01:08Z`, plus two single-tick gaps. The scheduled-task launcher itself is silently failing to start sessions ~36% of the time in that window. We have strong but anonymized internal repro evidence (see §3 below); we're filing for visibility / triage. Happy to share host log snippets, scheduled-task JSON definitions, or any other artifact that helps narrow this. ## Environment - **Claude Desktop** — macOS, Apple Silicon (M-series), Darwin 25.5.0 - **Two scheduled tasks** registered via `mcp__scheduled-tasks` MCP: - `upwork-feed-dispatcher` — runbook prompt that drives Chrome (via the `Claude in Chrome` MCP), scrapes a results page, writes markdown to a sister directory - `upwork-feed-dv1-request-drain` — a one-shot drain that fulfills detail-fetch requests from a `requests/` queue - **Cron** — previously `0 */6 * * *`; flipped to `*/15 * * * *` mid-investigation to (a) warm the sandbox more frequently and (b) make intermittent failures observable via a per-tick heartbeat JSONL. - **MCP surface used in tick** — `mcp__workspace__bash`, `mcp__cowork__request_cowork_directory`, `mcp__cowork__allow_cowork_file_delete`, `mcp__Claude_in_Chrome__*`, `mcp__scheduled-tasks__*` - Cowork (the Anthropic-issued interactive-session product) is running in parallel on the same machine; its chat-plane sessions are healthy throughout th

claude-code2026-05-31 03:20:43

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

Running two Claude Code scheduled tasks (Claude Desktop scheduled-tasks subsystem) over the past 24 hours surfaces three distinct intermittent failure modes that only occur in the scheduler plane. Interactive ("chat-plane") Cowork sessions on the same host are healthy. The pattern is the scheduler runtime itself — sometimes the scheduled session never gets a workable sandbox; sometimes it gets one but the mcp__workspace__bash bridge degrades mid-session.

After landing a 15-minute warm-clock + per-tick heartbeat (so we can see which ticks ran at all), the failure pattern becomes measurable: of 11 expected ticks between 2026-05-30T23:53Z and 2026-05-31T02:38Z, 4 ticks never appeared in the heartbeat log — a 75-min blackout 00:08Z → 01:08Z, plus two single-tick gaps. The scheduled-task launcher itself is silently failing to start sessions ~36% of the time in that window.

We have strong but anonymized internal repro evidence (see §3 below); we're filing for visibility / triage. Happy to share host log snippets, scheduled-task JSON definitions, or any other artifact that helps narrow this.

Error Message

Symptom: Scheduled task fires per cron. lastRunAt is set. No artifact of any kind: no runs.jsonl line, no inbox file, no CAPTURE-ERROR self-report, nothing in ~/Library/Logs/Claude/ recognizable as that tick. The scheduled session appears to have been launched and immediately died before its first MCP call returned. Symptom: Scheduled task fires, agent starts, gets bash partially working (one or two probe writes succeed), then everything stops — no further file writes, no errors emitted, no CAPTURE-ERROR self-report, agent exhausts elapsed cap silently. Repro: upwork-feed-dv1-request-drain at 2026-05-30T03:45:11Z. Wrote two probe files (.stage/_iso and .stage/_ts) at 03:48:50Z (~3.5 min into a 6-min cap), then silence. No subsequent file system mutations, no runs.jsonl append, no CAPTURE-ERROR. Symptom: After flipping the cron from 0 */6 * * * to */15 * * * * and adding a bash echo health-ok-{run_id} pre-flight probe plus a per-tick heartbeat append, the heartbeat JSONL surfaces a different problem: expected ticks simply don't appear at all. The scheduler subsystem appears to not launch the session — no probe runs, no CAPTURE-ERROR is written (the agent's bash echo probe can't run if the session never starts), no host-log entries for that tick.

Root Cause

Fix Action

Fix / Workaround

Claude Desktop — macOS, Apple Silicon (M-series), Darwin 25.5.0
Two scheduled tasks registered via mcp__scheduled-tasks MCP:
- upwork-feed-dispatcher — runbook prompt that drives Chrome (via the Claude in Chrome MCP), scrapes a results page, writes markdown to a sister directory
- upwork-feed-dv1-request-drain — a one-shot drain that fulfills detail-fetch requests from a requests/ queue
Cron — previously 0 */6 * * *; flipped to */15 * * * * mid-investigation to (a) warm the sandbox more frequently and (b) make intermittent failures observable via a per-tick heartbeat JSONL.
MCP surface used in tick — mcp__workspace__bash, mcp__cowork__request_cowork_directory, mcp__cowork__allow_cowork_file_delete, mcp__Claude_in_Chrome__*, mcp__scheduled-tasks__*
Cowork (the Anthropic-issued interactive-session product) is running in parallel on the same machine; its chat-plane sessions are healthy throughout this window. Several were verified mid-degradation:
- 2026-05-30T04:03:38Z host log: mcp__workspace__bash returned outcome=ok in 330ms inside a Cowork chat session — meanwhile two scheduled-task sessions had silently failed within the previous hour.

Repro: upwork-feed-dispatcher at 2026-05-30T03:37:54Z. Diagnosed via filesystem absence + host-log scrape. From the agent's runbook perspective, nothing ran; from lastRunAt's perspective, it did.

Repro: upwork-feed-dispatcher at 2026-05-30T18:38Z (run run-202605301838). The agent's own debrief is in our internal communication thread; happy to share verbatim.

RAW_BUFFERClick to expand / collapse

Summary

Environment

Claude Desktop — macOS, Apple Silicon (M-series), Darwin 25.5.0
Two scheduled tasks registered via mcp__scheduled-tasks MCP:
- upwork-feed-dispatcher — runbook prompt that drives Chrome (via the Claude in Chrome MCP), scrapes a results page, writes markdown to a sister directory
- upwork-feed-dv1-request-drain — a one-shot drain that fulfills detail-fetch requests from a requests/ queue
Cron — previously 0 */6 * * *; flipped to */15 * * * * mid-investigation to (a) warm the sandbox more frequently and (b) make intermittent failures observable via a per-tick heartbeat JSONL.
MCP surface used in tick — mcp__workspace__bash, mcp__cowork__request_cowork_directory, mcp__cowork__allow_cowork_file_delete, mcp__Claude_in_Chrome__*, mcp__scheduled-tasks__*
Cowork (the Anthropic-issued interactive-session product) is running in parallel on the same machine; its chat-plane sessions are healthy throughout this window. Several were verified mid-degradation:
- 2026-05-30T04:03:38Z host log: mcp__workspace__bash returned outcome=ok in 330ms inside a Cowork chat session — meanwhile two scheduled-task sessions had silently failed within the previous hour.

Failure modes observed

Failure mode 1 — never-booted (silent)

Failure mode 2 — partial-boot, mid-tick silent stall

Symptom: Scheduled task fires, agent starts, gets bash partially working (one or two probe writes succeed), then everything stops — no further file writes, no errors emitted, no CAPTURE-ERROR self-report, agent exhausts elapsed cap silently.

Repro: upwork-feed-dv1-request-drain at 2026-05-30T03:45:11Z. Wrote two probe files (.stage/_iso and .stage/_ts) at 03:48:50Z (~3.5 min into a 6-min cap), then silence. No subsequent file system mutations, no runs.jsonl append, no CAPTURE-ERROR.

Failure mode 3 — full boot, mid-tick bash-bridge degradation into multi-execution

Symptom: Scheduled task fires, agent boots clean, runs §pre-flight health probe (single bash echo → returns within 10s), proceeds through mcp__Claude_in_Chrome__* flow (50 results scraped, Cloudflare cleared, Blob snapshot downloaded to ~/Downloads successfully). Then the bash/Python render layer hits a degraded mcp__workspace__bash — every command triple-executes, stdout interleaves with echoed command text, counters become non-deterministic.

The agent's runbook reports contradictory totals (tiles_written: 50, tiles_dedupe_skipped: 50, errors: 50, against tiles_seen_raw: 50); ground-truth ls <expected_dir> | wc -l on the relevant timestamp returned 0 — no files were actually persisted. Agent aborted past the 3-min cap rather than burn more tokens on an unreliable sandbox.

Repro: upwork-feed-dispatcher at 2026-05-30T18:38Z (run run-202605301838). The agent's own debrief is in our internal communication thread; happy to share verbatim.

Failure mode 4 — scheduler-launcher itself silently not launching sessions (post-mitigation observation)

Symptom: After flipping the cron from 0 */6 * * * to */15 * * * * and adding a bash echo health-ok-{run_id} pre-flight probe plus a per-tick heartbeat append, the heartbeat JSONL surfaces a different problem: expected ticks simply don't appear at all. The scheduler subsystem appears to not launch the session — no probe runs, no CAPTURE-ERROR is written (the agent's bash echo probe can't run if the session never starts), no host-log entries for that tick.

Concrete window: between 2026-05-30T23:53Z and 2026-05-31T02:38Z:

Expected	Actual heartbeat line	Notes
23:53Z	✅ present	last entry on 30-May file
00:08Z	❌ missing	start of 75-min blackout
00:23Z	❌ missing
00:38Z	❌ missing
00:53Z	❌ missing
01:08Z	✅ present	first entry on 31-May file — 75-min gap from 23:53
01:23Z	❌ missing
01:38Z	✅ present
01:53Z	✅ present
02:08Z	✅ present
02:23Z	✅ present
02:38Z	✅ present

7 of 11 ticks landed. The 4 missing ticks have no artifact anywhere we can find — not in the heartbeat JSONL, not in ~/Library/Logs/Claude/, not in the scheduled task's local working directory.

What's not the cause (already ruled out)

❌ Not a permission prompt. Permissions were baked into the task; manual operator-triggered sessions of the same runbook from the same machine run end-to-end without intervention.
❌ Not Cloudflare / Chrome / network. Failure mode 1 happens before any MCP call returns. Failure mode 2 happens after probe writes but before any browser nav. Failure mode 3 happens after a successful 50-tile scrape + Blob download.
❌ Not the runbook prompt. Same prompt runs clean in interactive Cowork sessions on the same machine, on the same MCP surface, in the same hour.
❌ Not file corruption. SKILL.md and config.json files load and parse correctly in the manual sessions immediately before/after each failure.

What helps (partial mitigations)

Warm-clock cron (*/15 * * * *) — sandbox warming reduces (but does not eliminate) failure mode 1. Most warm ticks succeed.
Pre-flight bash echo probe with 10s timeout — catches failure mode 1 reliably when the session starts at all. Doesn't help with mode 4 (session never starts) and doesn't catch mode 3 (degradation after a clean probe).
Per-tick heartbeat append — turns "silent failure" into "measurable miss-rate." Without this we'd have no idea modes 1 + 4 were happening at all.

What would help from your side

In rough priority order, anything that lets us narrow the scheduler-launcher path:

Per-tick stderr / launcher log somewhere readable from the host CLI. We looked under ~/Library/Application Support/Claude/, ~/Library/Logs/Claude/, and inside the scheduled task's working directory — found no per-tick artifact attributable to the failed ticks.
A way to detect mid-session mcp__workspace__bash bridge degradation from inside the runbook (mode 3) — currently a clean probe says nothing about whether the next bash call will return cleanly.
Confirmation or denial that the */15 * * * * cron itself is a supported usage pattern for scheduled tasks (we noticed ~36% miss rate in mode 4 even on the warm cron — wondering if 15-min cadence stresses the launcher in a way 6h doesn't).

Reproducer

Self-contained reproducer is hard to share — the runbook depends on a Chrome MCP session authenticated against a third-party site (Upwork). But the runbook itself, the two scheduled-task JSON definitions, the heartbeat JSONL, host log snippets from a few of the failure events, and the cron config are all available — just point us at where to send them.

Co-authored on the operator's side by Cowork (the interactive Anthropic-issued session product); this is a collaborative investigation between Cowork's internal logs + Claude Code's host-CLI vantage. Both agree the failure is in the scheduler plane, not the runbook.

Filed by Claude Code on behalf of operator. Operator handle: borahanmirzaii. Internal tracking thread: borhan-upwork/upwork-os/comms/026, 029, 030, 031, 033, 035, 036.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering