claude-code - 💡(How to fix) Fix FleetView buckets agents by stale text-classifier verdict instead of live session status — actively-working agents shown under Completed

StepCodex · 2026-05-30T18:24:33Z

In the FleetView / background-agents list (the N working · M completed view), agents that are actively working are frequently shown under Completed. The mis-bucketing is not random: the bucket is driven by a stale, sticky text-classifier verdict persisted in each job's state.json (.state), which lags the agent's true activity. The harness already knows the real state precisely (the session process is busy), but the bucketing consults the classifier verdict instead, so a row that emitted a conclusion-looking message and then resumed work stays parked under Completed until it emits its next classifiable line.

This is the "actively-working shown as Completed" direction of the broader bucket-accuracy problem. Related: #59011 (classifier reads only assistant-message text for sentinels), #63094 (stuck running timer after return), #59518 (stuck working on plan-mode/question).

Environment

Claude Code v2.1.158
macOS (darwin), Opus 4.8
Multiple concurrent background (kind: bg) agent sessions

What I observed (reproducible from disk, no screenshot needed)

There are two independent signals on disk:

1. Bucket source, the classifier verdict (sticky): ~/.claude/jobs/<id>/state.json → .state ∈ working | blocked | done | failed, with a history in ~/.claude/jobs/<id>/timeline.jsonl. The sibling .detail field is a copy of the agent's last assistant message, confirming that .state is derived from message text, not process state. This value is only rewritten when the agent emits a new classifiable message.
2. Ground-truth liveness (live): a separate file, ~/.claude/sessions/<pid>.json → .status (busy/idle) and a real OS .pid, mirrored as .tempo (active/idle) inside state.json.

With 10 concurrent background agents, I cross-referenced the two. Findings:

Every one of the 10 jobs had .state = "done" persisted — including a job the UI itself displayed under Working. So .state is unreliable as a current-activity signal, and FleetView's bucketing does not consistently agree with it either.
4 of those 10 sessions were genuinely busy with live PIDs (.status == "busy", kill -0 <pid> succeeds), i.e. actively processing a turn — yet 3 of them were rendered under Completed. (The 4th was correctly under Working.)
The disagreement is internal to the data the harness already has: .state (text classifier) says done while .tempo / session .status (process) says active/busy.
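The liveness half of that cross-check (kill -0 <pid> succeeds) can be wrapped in a small helper. This is a minimal sketch; pid_alive is a hypothetical name, not part of Claude Code:

```bash
#!/usr/bin/env bash
# pid_alive PID: succeeds if a process with this PID currently exists.
# kill -0 delivers no signal; it only runs the kernel's existence/permission check.
# Caveat: it also fails (EPERM) for a live process owned by another user, which
# should not matter when auditing your own agent sessions.
pid_alive() {
  local pid=$1
  # Reject empty, zero, or non-numeric values (e.g. a missing .pid printed as "null").
  # "kill -0 0" would target the whole process group instead of one process.
  [[ $pid =~ ^[1-9][0-9]*$ ]] || return 1
  kill -0 "$pid" 2>/dev/null
}

# Usage: pid_alive "$pid" && echo alive || echo gone
```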

Illustrative shape (anonymized):

row                | bucket shown | state.json .state | session .status | pid alive | actually working?
-------------------|--------------|-------------------|-----------------|-----------|------------------
agent A            | Working      | done              | busy            | yes       | yes  (correct)
agent B            | Completed    | done              | busy            | yes       | YES  (mis-bucketed)
agent C            | Completed    | done              | busy            | yes       | YES  (mis-bucketed)
agent D            | Completed    | done              | busy            | yes       | YES  (mis-bucketed)
agent E..J         | Completed    | done              | idle            | -         | no   (correct)

Repro recipe (read-only audit)

# Goal: diff each job's classifier verdict against its real liveness.
# Step 1: list each job's verdict (.state) next to its mirrored liveness (.tempo).
for d in ~/.claude/jobs/*/; do
  sj="${d}state.json"; [ -f "$sj" ] || continue
  jq -r '"\(.name)\tstate=\(.state)\ttempo=\(.tempo)"' "$sj"
done
# Step 2: for each job's daemonShort, look up ~/.claude/sessions/*.json where
# .jobId matches, then compare .status (busy/idle) and PID liveness against .state.
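The recipe leaves the session lookup as a comment. A sketch that completes it follows, under assumptions inferred from this report rather than from any documented schema: state.json carries .name, .state, .tempo and .daemonShort; each sessions/*.json carries .jobId, .status and .pid; and a session belongs to a job when its .jobId equals the job's .daemonShort. The function name audit_jobs is hypothetical, and it requires jq.

```bash
#!/usr/bin/env bash
# Hypothetical completion of the read-only audit: join each job to its session.
# ASSUMED layout (inferred from this report, not a documented schema):
#   <claude_dir>/jobs/<id>/state.json  -> .name .state .tempo .daemonShort
#   <claude_dir>/sessions/*.json       -> .jobId .status .pid
#   a session belongs to a job when session .jobId == job .daemonShort
# Requires jq. Prints one tab-separated row per job.
audit_jobs() {
  local claude_dir=${1:-$HOME/.claude}
  local d sj name state tempo short sess status pid alive
  # jq filter: turn null/empty fields into "-" so tab-splitting stays aligned.
  local fields='map(if . == null or . == "" then "-" else tostring end) | @tsv'
  for d in "$claude_dir"/jobs/*/; do
    sj="${d}state.json"; [ -f "$sj" ] || continue
    IFS=$'\t' read -r name state tempo short < <(
      jq -r "[.name, .state, .tempo, .daemonShort] | $fields" "$sj")
    status=-; pid=-
    for sess in "$claude_dir"/sessions/*.json; do
      [ -f "$sess" ] || continue
      # jq prints "status<TAB>pid" only for the matching session; read fails otherwise.
      if IFS=$'\t' read -r status pid < <(
           jq -r --arg id "$short" "select(.jobId == \$id) | [.status, .pid] | $fields" "$sess"); then
        break
      fi
      status=-; pid=-
    done
    # Liveness: numeric PID that kill -0 can reach.
    alive=no
    [[ $pid =~ ^[1-9][0-9]*$ ]] && kill -0 "$pid" 2>/dev/null && alive=yes
    printf '%s\tstate=%s\ttempo=%s\tstatus=%s\tpid=%s\talive=%s\n' \
      "$name" "$state" "$tempo" "$status" "$pid" "$alive"
  done
}

# Usage: audit_jobs                  # defaults to ~/.claude
#        audit_jobs /path/to/a/copy/of/.claude
```

Pointing it at a copy of ~/.claude keeps the audit strictly read-only against the live directory.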

Any job where .state == "done" (→ Completed bucket) while session .status == "busy" / .tempo == "active" and the PID is alive is a false-Completed.
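That criterion can be written as a pure predicate over the four fields, which makes it easy to apply to audit output or to rows like the ones in the table. This is a sketch; the function name is hypothetical, and "status busy / tempo active" is read as either signal being sufficient, since .tempo mirrors the session status:

```bash
#!/usr/bin/env bash
# is_false_completed STATE STATUS TEMPO ALIVE  (hypothetical helper name)
# Succeeds when the classifier verdict is done (Completed bucket) while the
# session is live: status busy or tempo active, and the PID is alive.
# ALIVE is "yes" or "no", as in the table's "pid alive" column.
is_false_completed() {
  local state=$1 status=$2 tempo=$3 alive=$4
  [[ $state == done ]] || return 1
  [[ $status == busy || $tempo == active ]] || return 1
  [[ $alive == yes ]]
}

# Usage, with values shaped like agent B's row:
# is_false_completed done busy active yes && echo "false-Completed"
```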

Root cause hypothesis

The bucket is computed from a lagging indicator (last-message text sentinel) rather than the authoritative one (session process status the daemon already tracks). The sequence that produces the bug:

1. Agent finishes a turn with conclusion-looking text → classifier stamps .state = done → row moves to Completed.
2. Agent resumes work (a /loop tick, a queued/resumed task, or a user reply) → session goes busy again.
3. No new classifiable message has been emitted yet → .state stays frozen at done → row stays under Completed while actively working.
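The three steps can be reproduced in a toy model. This is not the harness code: the substring match in emit_message is a hypothetical stand-in for the real sentinel detection, and the variables only mirror the on-disk .state and .status fields. The point it shows is structural: only message emission writes the verdict, and status transitions never touch it.

```bash
#!/usr/bin/env bash
# Toy model of the three-step sequence. NOT the harness code.
STATE=working   # sticky classifier verdict, like state.json .state
STATUS=idle     # live process status, like the session file's .status

# Hypothetical classifier: rewrites STATE only when a message is emitted.
emit_message() {
  case $1 in
    *[Dd]one*|*[Cc]omplete*) STATE=done ;;
    *)                       STATE=working ;;
  esac
}
set_status() { STATUS=$1; }   # process transitions never touch STATE
bucket_from_state() { [[ $STATE == done ]] && echo Completed || echo Working; }

set_status busy
emit_message "All tasks complete."   # 1. conclusion-looking text: STATE=done
set_status idle
set_status busy                      # 2. agent resumes (e.g. a /loop tick)
# 3. no new message yet: STATE is still done while STATUS is busy
echo "status=$STATUS bucket=$(bucket_from_state)"   # prints: status=busy bucket=Completed
```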

Suggested fix

When choosing a row's bucket, treat live session activity as authoritative over the stale text verdict: if session.status == busy / tempo == active with a live PID, the row should render under Working regardless of the last-message sentinel. Equivalently, invalidate/refresh .state on the idle → busy transition instead of only on message emission. (This also fixes the inverse stale cases.)
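Both variants of the fix can be sketched as small functions. These are hypothetical: choose_bucket and on_status_change are not real Claude Code functions, and the fallback only maps working and done, passing other verdicts (blocked, failed) through unchanged, because this report does not say which bucket FleetView uses for them.

```bash
#!/usr/bin/env bash
# Variant 1: live session activity takes precedence over the stale text verdict.
# choose_bucket STATE STATUS TEMPO ALIVE  (hypothetical)
choose_bucket() {
  local state=$1 status=$2 tempo=$3 alive=$4
  if [[ ( $status == busy || $tempo == active ) && $alive == yes ]]; then
    echo Working            # busy/active with a live PID: ignore the sentinel
    return
  fi
  case $state in            # otherwise fall back to the classifier verdict
    working) echo Working ;;
    done)    echo Completed ;;
    *)       echo "$state" ;;   # blocked/failed: left to the existing UI logic
  esac
}

# Variant 2: reset the sticky verdict on the idle -> busy edge, so the next
# classifiable message decides it again. Prints the new .state value.
# on_status_change OLD_STATUS NEW_STATUS CURRENT_STATE  (hypothetical)
on_status_change() {
  local old=$1 new=$2 state=$3
  if [[ $old == idle && $new == busy ]]; then echo working; else echo "$state"; fi
}
```

Under variant 1, agents B to D in the table (status busy, PID alive) would render under Working, while E..J (status idle) stay under Completed.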
