[BUG] Model learns and replays synthetic "No response requested." sentinel as real output in long-lived sessions (claude-code)

What's Wrong?

The synthetic "No response requested." sentinel that the Claude Agent SDK writes into session transcripts is being learned in context and replayed by the real model on subsequent turns. Unlike #53260 (which documents the synthetic record itself), this issue documents the downstream effect: once a session contains a synthetic sentinel, claude-opus-4-7 starts emitting the identical string as a real, billed text output (9 output tokens) on unrelated later turns.

Evidence of model replay (not SDK synthesis)

Failing turn in our production session:

{
  "type": "assistant",
  "message": {
    "model": "claude-opus-4-7",
    "id": "msg_bdrk_hthmx55f5wsfwhqlof4p5547obdfi3dnvmeus7rwp57kmtoavqsa",
    "content": [{"type": "text", "text": "No response requested."}],
    "stop_reason": "end_turn",
    "usage": {
      "input_tokens": 6,
      "cache_creation_input_tokens": 269617,
      "cache_read_input_tokens": 63517,
      "output_tokens": 9
    }
  },
  "version": "2.1.119"
}

Key differences vs. #53260:

  • model is the real model id (claude-opus-4-7), not <synthetic>
  • id is a real Bedrock msg_bdrk_* id
  • stop_reason: "end_turn" (not "stop_sequence": "")
  • Billed: output_tokens: 9, total_cost_usd: 1.72

The model is not stuck: it is actively generating "No response requested." as a legitimate, complete response (9 output tokens, stop_reason "end_turn") because the same string already appears multiple times in the assistant turns of the session history.
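The differences listed above are enough to tell the two kinds of record apart mechanically. The following is a minimal sketch, assuming each transcript record is a parsed JSON object shaped like the failing turn shown earlier and that synthetic records also carry type "assistant"; the helper names first_text and classify are hypothetical and not part of the SDK.

```python
SENTINEL = "No response requested."


def first_text(message):
    """Return the text of the first content block, or the content itself if it is a string."""
    content = message.get("content")
    if isinstance(content, str):
        return content
    if isinstance(content, list) and content and isinstance(content[0], dict):
        return content[0].get("text")
    return None


def classify(record):
    """Return "synthetic", "replay", or None for one parsed transcript record."""
    message = record.get("message")
    if record.get("type") != "assistant" or not isinstance(message, dict):
        return None
    if first_text(message) != SENTINEL:
        return None
    # "<synthetic>" marks an SDK-injected record; any other model id means
    # the string was generated (and billed) by a real model call.
    if message.get("model") == "<synthetic>":
        return "synthetic"
    return "replay"


# Trimmed copy of the failing turn from this report.
failing_turn = {
    "type": "assistant",
    "message": {
        "model": "claude-opus-4-7",
        "content": [{"type": "text", "text": "No response requested."}],
        "stop_reason": "end_turn",
        "usage": {"output_tokens": 9},
    },
}
print(classify(failing_turn))  # replay
```

The check keys on the model field rather than on the message id prefix, since msg_bdrk_* ids are specific to Bedrock.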

Session pollution timeline (one user, one chat_session)

2026-05-06 12:39:34  SDK injects  <synthetic>   "No response requested."  (1st pollution)
2026-05-07 06:35:45  model emits  claude-opus-4-7  "No response requested."  9 out_tokens
2026-05-07 06:39:02  model emits  claude-opus-4-7  "No response requested."  9 out_tokens
2026-05-07 06:46:37  model emits  claude-opus-4-7  "No response requested."  9 out_tokens
2026-05-07 06:52:12  model emits  claude-opus-4-7  "No response requested."  9 out_tokens
2026-05-07 06:52:32  model emits  claude-opus-4-7  "No response requested."  9 out_tokens
2026-05-07 06:53:10  model emits  claude-opus-4-7  "No response requested."  9 out_tokens
2026-05-07 07:48:56  SDK injects  <synthetic>   "No response requested."  (2nd pollution)
2026-05-07 09:51:52  model emits  claude-opus-4-7  "No response requested."  9 out_tokens  ← failing user turn

Once the first synthetic record is in the history, the model increasingly returns the exact literal string whenever it encounters a short or ambiguous prompt, presumably because the repeated pattern has become the most likely continuation, token for token.
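A timeline like the one above can be rebuilt from a single session file. This is a rough sketch, assuming each JSONL record carries a top-level timestamp field (an assumption; that field is not shown in the excerpts in this report); the function names are hypothetical.

```python
import json

SENTINEL = "No response requested."


def sentinel_timeline(lines):
    """Yield (timestamp, model, output_tokens) for each sentinel record, in file order."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or partially written lines
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        if not isinstance(message, dict):
            continue
        content = message.get("content")
        if isinstance(content, list) and content and isinstance(content[0], dict):
            text = content[0].get("text")
        else:
            text = content
        if text != SENTINEL:
            continue
        usage = message.get("usage") or {}
        # "timestamp" is an assumed field name; it comes back as None if absent.
        yield record.get("timestamp"), message.get("model"), usage.get("output_tokens")


def replays_after_first_injection(timeline):
    """Count real-model sentinel outputs that occur after the first synthetic record."""
    seen_synthetic = False
    count = 0
    for _, model, _ in timeline:
        if model == "<synthetic>":
            seen_synthetic = True
        elif seen_synthetic:
            count += 1
    return count
```

Passing an open session file to sentinel_timeline and the result to replays_after_first_injection would, for the session shown above, count the seven model emits that follow the first injection.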

What Should Happen?

Two bugs need separate fixes:

  1. Stop writing the sentinel into history in the first place. The Messages API explicitly allows consecutive user turns (they are auto-merged per https://platform.claude.com/docs/en/api/messages). The synthetic assistant turn at cli.js:bX6 / splice(j+1, 0, fV({content: bX6})) is not required by the protocol — it only serves to make the local transcript look "alternating". Recommended alternative: omit the synthetic record entirely, or mark the interrupted turn with a non-content flag (e.g. aborted: true) that is stripped before the messages array is sent to the API. This matches the approach taken by opencode (MessageAbortedError) and the guidance in Anthropic's own API docs ("treat incomplete message exchanges as if they never occurred").

  2. Sanitize poisoned sessions on load. Even after fix 1 lands, existing session JSONLs on user machines already contain synthetic sentinels (and the real 9-token model replays they induced). A load-time sanitizer, similar to the emoji-truncation sanitizer added in 2.1.132, should strip both:

    • Any assistant record with model: "<synthetic>" and text equal to one of the SDK sentinels (bX6 = "No response requested.", etc.)
    • Any assistant record whose full content is exactly "No response requested." with output_tokens <= 9, which is the replay signature
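The two strip rules in fix 2 could look roughly like the sketch below. It is an illustration under stated assumptions, not the SDK's implementation: records are parsed JSONL objects shaped like the failing turn in this report, and the names SENTINELS, is_poisoned, and sanitize are hypothetical.

```python
# Only the sentinel quoted in this report; other SDK sentinels would be added here.
SENTINELS = {"No response requested."}
REPLAY_MAX_OUTPUT_TOKENS = 9  # replay signature described above


def _sole_text(message):
    """Return the text if the message content is exactly one text value, else None."""
    content = message.get("content")
    if isinstance(content, str):
        return content
    if isinstance(content, list) and len(content) == 1 and isinstance(content[0], dict):
        return content[0].get("text")
    return None


def is_poisoned(record):
    """True for synthetic sentinel records and for short real-model replays of them."""
    if record.get("type") != "assistant":
        return False
    message = record.get("message")
    if not isinstance(message, dict):
        return False
    text = _sole_text(message)
    if not isinstance(text, str) or text not in SENTINELS:
        return False
    if message.get("model") == "<synthetic>":
        return True  # rule 1: SDK-injected sentinel
    usage = message.get("usage") or {}
    tokens = usage.get("output_tokens")
    # rule 2: full content is the sentinel and the output is at most 9 tokens
    return isinstance(tokens, int) and tokens <= REPLAY_MAX_OUTPUT_TOKENS


def sanitize(records):
    """Return the records with poisoned assistant turns removed, order preserved."""
    return [r for r in records if not is_poisoned(r)]
```

Removing an assistant record leaves the neighbouring user turns adjacent, which the Messages API accepts as noted in fix 1. A real sanitizer would also need to handle any links between records in the transcript, which this sketch ignores.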

Steps to Reproduce

  1. Start a session, send a request
  2. While the model is generating (especially during long thinking turns), hit the stop button (a manual abort, which raises SIGINT; before the graceful-shutdown fix in 2.1.132 this leaves an orphan user record in the JSONL)
  3. Resume the same session and send another prompt; SDK injects the <synthetic> sentinel to close the orphan turn
  4. Continue using the session for 10+ turns on short/ambiguous prompts
  5. Observe: eventually the real model starts returning "No response requested." verbatim as a real text output (9 output tokens), with a real msg_bdrk_* id

Triggering conditions are essentially universal for long-lived sessions:

  • Any abort of an in-flight request (manual stop, kill -INT, process crash, container recycle)
  • Any resume after the abort
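To check whether a transcript is at risk before it is resumed, one option is to look for the orphan user records that step 2 describes. The sketch below treats a user record directly followed by another user record, with no assistant record in between, as a candidate orphan. It assumes records carry a type of "user" or "assistant" as in the excerpts in this report; the function name is hypothetical, and real transcripts may contain legitimate back-to-back user records, so hits are only candidates for manual review.

```python
def find_orphan_user_turns(records):
    """Return indexes of user records that are directly followed by another user record.

    Records whose type is neither "user" nor "assistant" are ignored when
    deciding what "directly followed" means.
    """
    turns = [
        (index, record.get("type"))
        for index, record in enumerate(records)
        if isinstance(record, dict) and record.get("type") in ("user", "assistant")
    ]
    return [
        index
        for (index, kind), (_, next_kind) in zip(turns, turns[1:])
        if kind == "user" and next_kind == "user"
    ]


records = [
    {"type": "user"},
    {"type": "assistant"},
    {"type": "user"},       # aborted before any assistant reply was written
    {"type": "user"},
    {"type": "assistant"},
]
print(find_orphan_user_turns(records))  # [2]
```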

Error Messages/Logs

See the JSON above. The session JSONL path pattern is: ~/.claude/projects/<project-id>/<sessionId>.jsonl

Diagnostic script (extends the one in #53260 to catch the replay signature):

import json
import os

SENTINEL = "No response requested."

# Walk every Claude Code session transcript and flag sentinel records.
for root, _, files in os.walk(os.path.expanduser("~/.claude/projects")):
    for name in files:
        if not name.endswith(".jsonl"):
            continue
        with open(os.path.join(root, name), "r", encoding="utf-8") as fh:
            for lineno, line in enumerate(fh, 1):
                try:
                    obj = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip blank or partially written lines
                if not isinstance(obj, dict):
                    continue
                msg = obj.get("message")
                if not isinstance(msg, dict):
                    continue
                content = msg.get("content")
                text = None
                if isinstance(content, list) and content and isinstance(content[0], dict):
                    text = content[0].get("text")
                elif isinstance(content, str):
                    text = content
                if text != SENTINEL:
                    continue
                # "<synthetic>" marks SDK-injected records; any other model id is a real replay.
                label = "SYNTHETIC" if msg.get("model") == "<synthetic>" else "MODEL REPLAY"
                usage = msg.get("usage") or {}  # usage may be missing or null
                print(f"{label}  {name}:{lineno}  tokens={usage.get('output_tokens')}")

Claude Model

Opus (specifically claude-opus-4-7 on Bedrock; behavior likely reproducible on other models once the sentinel is in history)

Is this a regression?

I don't know — the underlying bX6 injection logic has existed for multiple SDK versions. What is new in our data is large, long-lived sessions (1M-context Opus 4.7, ~330k tokens cached) where the sentinel has many turns to be learned. The problem is likely to become more common as users adopt 1M-context workflows.
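The "~330k tokens" figure can be checked against the usage block of the failing turn. A small worked example, assuming the total prompt size is the sum of the three input-side counters (the helper name is hypothetical):

```python
def total_input_tokens(usage):
    """Sum the uncached, cache-write, and cache-read input token counters."""
    return (
        usage.get("input_tokens", 0)
        + usage.get("cache_creation_input_tokens", 0)
        + usage.get("cache_read_input_tokens", 0)
    )


# Usage block copied from the failing turn in this report.
usage = {
    "input_tokens": 6,
    "cache_creation_input_tokens": 269617,
    "cache_read_input_tokens": 63517,
    "output_tokens": 9,
}

total = total_input_tokens(usage)
print(total)  # 333140
print(round(usage["cache_creation_input_tokens"] / total, 2))  # 0.81
```

Roughly four fifths of the prompt was written to the cache on this turn, and the turn returned nine output tokens.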

Claude Code Version

2.1.119 (we have also inspected 2.1.133 cli.js — the bX6 = "No response requested." definition and splice(j+1, 0, fV({content: bX6})) branch are both still present)

Platform

AWS Bedrock

Operating System

Other Linux (E2B sandbox containers; also reproducible with locally-hosted Claude Code sessions)

Terminal/Shell

Non-interactive/CI environment (invoked via @anthropic-ai/claude-agent-sdk in a sandbox runtime)

Additional Information

  • Related: #53260 (synthetic record itself), #44875 ("No Response Requested" loop), #37040 (isMeta auto-continuation), #44596 (long tool → synthetic rewind)
  • Langfuse observability note: because polluted turns replay cache_creation ≈ full context (~270k tokens in our case), Langfuse/observability backends may drop the input field as "<truncated due to size exceeding limit>", masking the pollution during triage
  • The 2.1.132 graceful-shutdown fix reduces new pollution but does not clean existing transcripts or prevent the model-replay behavior once pollution is present
