openclaw - ✅(Solved) Fix Anthropic thinking block 'signature' field lost during session persistence — causes API rejection [2 pull requests, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#43691Fetched 2026-04-08 00:17:07
View on GitHub
Comments
0
Participants
1
Timeline
3
Reactions
0
Participants
Timeline (top)
cross-referenced ×3

Error Message

  1. After ~15+ assistant turns with thinking, the session starts failing with the above error

Root Cause

Anthropic API returns thinking blocks with a signature field that must be passed back verbatim. During session persistence (writing to .jsonl), the signature field is stripped or not captured. Inspecting the session file confirms all thinking blocks have no signature:

type=thinking thinking_len=224 signature=(none)
type=thinking thinking_len=562 signature=(none)
...

This causes any subsequent API call using those messages to fail, since the thinking blocks are considered "modified".

Fix Action

Fix / Workaround

  • Once the session accumulates enough messages with thinking blocks, all further interactions in that session fail permanently
  • /new and /reset commands may not properly clear the corrupted session
  • The only workaround is manually deleting the session JSONL file

PR fix notes

PR #44009: fix(agents): exclude synthetic assistant transcript mirrors from Anthropic replay

Description (problem / solution / changelog)

Summary

Fixes Anthropic extended-thinking replay failures by deduplicating synthetic delivery-mirror assistant transcript turns only when they are true duplicates of an adjacent real assistant reply.

Although the issue initially appeared to be a persistence/signature-loss problem, the session JSONL round-trip already preserved thinking signatures correctly. The actual bug was in replay assembly: a synthetic OpenClaw delivery-mirror assistant turn could be replayed alongside the original assistant reply, which polluted replay history and caused Anthropic reasoning replay to reject it as modified thinking content.

What changed

  • replace target-based delivery-mirror removal with transcript-based deduplication
  • keep the mirror marker narrow:
    • role === "assistant"
    • provider === "openclaw"
    • model === "delivery-mirror"
  • only drop a mirror when it is a true adjacent duplicate:
    • it follows a real assistant turn
    • the previous assistant turn is not from openclaw
    • the normalized visible assistant text matches
  • preserve delivery-mirror turns when they are the only persisted assistant history
  • preserve non-mirror OpenClaw assistant transcript entries such as gateway-injected
  • add regression coverage for:
    • session persistence fidelity of thinking / redacted-thinking content
    • final Anthropic replay payload assembly
    • dropping duplicated mirror turns that shadow a real assistant reply
    • preserving mirror-only assistant history
    • preserving gateway-injected assistant entries
    • intentional abort handling in the payload-capture test helper

Why this fix is safe

This change is intentionally narrow.

It no longer relies on replay target heuristics to remove mirrored assistant history. Instead, it removes only transcript entries that are structurally identifiable as duplicate delivery-mirror shadows of an adjacent real assistant reply. Mirror-only assistant history remains intact.

Validation

  • pnpm test src/agents/pi-embedded-runner.thinking-signature-persistence.test.ts src/agents/pi-embedded-runner.anthropic-thinking-replay.test.ts
  • pnpm build
  • pnpm check

Closes #43691

Changed files

  • src/agents/pi-embedded-runner.anthropic-thinking-replay.test.ts (added, +351/-0)
  • src/agents/pi-embedded-runner.thinking-signature-persistence.test.ts (added, +294/-0)
  • src/agents/pi-embedded-runner/google.ts (modified, +69/-1)

PR #55: feat: foundation layer — trogon-nats, trogon-mcp, trogon-agent-core, acp-telemetry

Description (problem / solution / changelog)

What

Adds the shared infrastructure that the ACP-over-NATS stack sits on. No product behavior yet — this establishes the primitives that the bridge and runner crates build on.

Crates

trogon-nats

Wraps async-nats with connection management, auth, messaging utilities, and a test mock.

auth.rsNatsAuth enum with 5 variants: Credentials (file path), NKey, UserPassword, Token, None. NatsConfig reads from environment with explicit priority order: NATS_CREDS > NATS_NKEY > NATS_USER+ NATS_PASSWORD > NATS_TOKEN > no auth. Supports comma-separated NATS_URL for multi-server clusters.

connect.rsconnect(config, timeout) -> Result<Client, ConnectError>. Uses retry_on_initial_connect() so async_nats::connect() returns immediately and the handshake runs in the background. A oneshot channel catches the first meaningful event: Connected → Ok, authorization violation → Err (fail fast, no point retrying), unreachable server → return the client and let the reconnect loop continue. Exponential backoff capped at 30 s.

client.rs — four traits over async_nats::Client: SubscribeClient, RequestClient, PublishClient, FlushClient. All bridge and runner code is generic over these traits, which makes unit testing with mocks possible without spawning a real NATS server.

messaging.rsrequest / request_with_timeout: serialize request, inject W3C trace context headers, send NATS request, deserialize response. publish: same but fire-and-forget with optional flush. RetryPolicy (no-retries or standard 3×, exponential 50 ms base) and FlushPolicy wrap both operations. Builder pattern for PublishOptions.

mocks.rsAdvancedMockNatsClient: subscribe queues (each inject_messages() call pushes a new UnboundedReceiver that the next subscribe() call pops), request responses keyed by subject, fail_next_* controls for publish, flush, request. All tests in the bridge and runner use this instead of spawning NATS.


trogon-mcp

client.rsMcpClient: HTTP JSON-RPC 2.0 client for a single MCP server. Three methods: initialize() (handshake), list_tools() -> Vec<McpTool>, call_tool(name, args) -> String. Uses a global atomic counter for JSON-RPC ids. McpTool carries name, description, and inputSchema (raw serde_json::Value passed straight through to the Anthropic tool definition). On isError: true the call returns Err(text) so the agent loop can handle tool failures.


trogon-agent-core

tools/mod.rsToolDef (Anthropic tool definition: name, description, JSON Schema, optional cache_control for prompt caching on the last tool). ToolContext carries a reqwest::Client and proxy_url. dispatch_tool is a no-op stub — real dispatch happens in the runner via mcp_dispatch.

agent_loop.rsAgentLoop: the Anthropic messages API call-loop. Fields: http_client, proxy_url, anthropic_token, anthropic_base_url (optional override), anthropic_extra_headers, model, max_iterations, thinking_budget, tool_context, MCP tool defs + dispatch list, permission_checker (optional gate called before each tool execution).

run(history, system_prompt, event_tx) streams events back through a channel: TextDelta, ToolUse, ToolResult, ModeChanged, Usage. Loop: POST to messages API → if end_turn emit final text and return → if tool_use check permission gate, dispatch (MCP first, then built-in), emit ToolResult, append to history, loop. Stops at max_iterations.

Wire types: Message (role + Vec<ContentBlock>), ContentBlock (Text, Image, Thinking, ToolUse, ToolResult), ImageSource (Base64 or Url), ToolResult. PermissionChecker trait for async approval callbacks.


trogon-std

Testable abstractions over OS APIs so the crates above can be tested without touching the real filesystem or environment.

fs/ — traits: ReadFile, WriteFile, CreateDirAll, OpenAppendFile, ExistsFile. Two impls: SystemFs (real OS calls) and MemFs (in-memory HashMap<PathBuf, String> used in tests).

env/ReadEnv trait with SystemEnv and InMemoryEnv impls. InMemoryEnv wraps a HashMap behind a Mutex — tests set variables without touching std::env.

time/GetNow and GetElapsed traits. SystemClock uses std::time::Instant. MockClock lets tests advance time manually.

json.rsJsonSerialize trait + StdJsonSerialize impl. FailNextSerialize fails the Nth call (used to test error paths without mocking the entire serializer).

args.rsargs() helper that returns std::env::args() as a Vec<String> skipping the binary name.


acp-telemetry

OpenTelemetry setup for all ACP binaries.

lib.rsinit_logger(service_name, acp_prefix, env, fs): sets up a tracing subscriber with three layers: JSON to stderr, JSON to a log file (path: ACP_LOG_DIR env var or platform data dir), and OTel traces+metrics+logs if the OTel exporter initializes successfully. Falls back gracefully to stderr-only if OTel is unavailable. shutdown_otel() flushes and shuts down all three providers. meter(name) returns a named opentelemetry::metrics::Meter.

service_name.rsServiceName enum (AcpNatsWs, AcpNatsStdio, AcpRunner) used to name log files and OTel resources.

trace.rs / metric.rs / log.rs — each wraps an OnceLock for its SdkProvider, plus init_provider, force_flush, and shutdown. The signal.rs module provides a cross-platform wait_for_shutdown_signal that catches SIGTERM and SIGINT.


How to review

  • trogon-nats/src/connect.rs — the startup handshake and backoff logic
  • trogon-nats/src/messaging.rs — the retry/flush machinery used by the bridge
  • trogon-agent-core/src/agent_loop.rs — the Anthropic streaming loop
cargo test -p trogon-nats -p trogon-mcp -p trogon-agent-core -p trogon-std -p acp-telemetry

## Changed files

- `.github/workflows/ci-rust.yml` (modified, +2/-0)
- `rsworkspace/Cargo.lock` (modified, +1626/-102)
- `rsworkspace/crates/acp-telemetry/src/lib.rs` (modified, +38/-0)
- `rsworkspace/crates/trogon-agent-core/Cargo.toml` (added, +20/-0)
- `rsworkspace/crates/trogon-agent-core/build.rs` (added, +7/-0)
- `rsworkspace/crates/trogon-agent-core/src/agent_loop.rs` (added, +1178/-0)
- `rsworkspace/crates/trogon-agent-core/src/lib.rs` (added, +4/-0)
- `rsworkspace/crates/trogon-agent-core/src/tools/mod.rs` (added, +65/-0)
- `rsworkspace/crates/trogon-agent-core/tests/agent_loop_integration.rs` (added, +1059/-0)
- `rsworkspace/crates/trogon-mcp/Cargo.toml` (added, +17/-0)
- `rsworkspace/crates/trogon-mcp/src/client.rs` (added, +145/-0)
- `rsworkspace/crates/trogon-mcp/src/lib.rs` (added, +19/-0)
- `rsworkspace/crates/trogon-mcp/tests/mcp_client.rs` (added, +322/-0)
- `rsworkspace/crates/trogon-nats/Cargo.toml` (modified, +1/-0)
- `rsworkspace/crates/trogon-nats/src/auth.rs` (modified, +10/-0)
- `rsworkspace/crates/trogon-nats/src/connect.rs` (modified, +218/-28)
- `rsworkspace/crates/trogon-nats/tests/connect_integration.rs` (added, +194/-0)
- `rsworkspace/crates/trogon-nats/tests/messaging_integration.rs` (added, +152/-0)
- `rsworkspace/crates/trogon-std/src/fs/system.rs` (modified, +38/-0)

Code Example

LLM request rejected: messages.31.content.2: `thinking` or `redacted_thinking` blocks in the latest assistant message cannot be modified. These blocks must remain as they were in the original response.

---

type=thinking thinking_len=224 signature=(none)
type=thinking thinking_len=562 signature=(none)
...
RAW_BUFFERClick to expand / collapse

Problem

When using Anthropic models with extended thinking enabled, the signature field on thinking content blocks is not preserved when messages are serialized to the session JSONL file. When the conversation history is sent back to the Anthropic API in subsequent turns, the API rejects the request:

LLM request rejected: messages.31.content.2: `thinking` or `redacted_thinking` blocks in the latest assistant message cannot be modified. These blocks must remain as they were in the original response.

Root cause

Anthropic API returns thinking blocks with a signature field that must be passed back verbatim. During session persistence (writing to .jsonl), the signature field is stripped or not captured. Inspecting the session file confirms all thinking blocks have no signature:

type=thinking thinking_len=224 signature=(none)
type=thinking thinking_len=562 signature=(none)
...

This causes any subsequent API call using those messages to fail, since the thinking blocks are considered "modified".

Impact

  • Once the session accumulates enough messages with thinking blocks, all further interactions in that session fail permanently
  • /new and /reset commands may not properly clear the corrupted session
  • The only workaround is manually deleting the session JSONL file

Steps to reproduce

  1. Configure an agent with anthropic/claude-sonnet-4-6 (or any Anthropic model with thinking)
  2. Have a multi-turn conversation in a Feishu group (or any channel)
  3. After ~15+ assistant turns with thinking, the session starts failing with the above error

Expected behavior

The signature field (and any other fields) on thinking / redacted_thinking content blocks should be preserved exactly as received from the API when serializing to the session file.

Environment

  • OpenClaw version: 2026.3.8
  • Model: anthropic/claude-sonnet-4-6
  • Provider: anthropic
  • Compaction mode: safeguard

extent analysis

Fix Plan

To preserve the signature field on thinking content blocks, we need to modify the serialization process. Here are the steps:

  • Modify the serialize_message function to include the signature field:
def serialize_message(message):
    # ...
    if 'thinking' in message:
        thinking_block = message['thinking']
        serialized_message['thinking'] = {
            'thinking_len': thinking_block['thinking_len'],
            'signature': thinking_block['signature']  # Add this line
        }
    # ...
    return serialized_message
  • Update the write_session_to_jsonl function to use the modified serialize_message function:
def write_session_to_jsonl(session):
    # ...
    for message in session['messages']:
        serialized_message = serialize_message(message)
        # ...
        jsonl_file.write(json.dumps(serialized_message) + '\n')
    # ...
  • Ensure that the signature field is properly deserialized when reading from the session JSONL file:
def deserialize_message(serialized_message):
    # ...
    if 'thinking' in serialized_message:
        thinking_block = serialized_message['thinking']
        message['thinking'] = {
            'thinking_len': thinking_block['thinking_len'],
            'signature': thinking_block['signature']  # Add this line
        }
    # ...
    return message

Verification

To verify that the fix worked, you can:

  • Inspect the session JSONL file to ensure that the signature field is present for thinking blocks
  • Test a multi-turn conversation with thinking blocks and verify that the session does not fail with the "LLM request rejected" error

Extra Tips

  • Make sure to update the serialize_message and deserialize_message functions to handle both thinking and redacted_thinking blocks
  • Consider adding additional logging or debugging statements to ensure that the signature field is being properly preserved and passed back to the Anthropic API.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

The signature field (and any other fields) on thinking / redacted_thinking content blocks should be preserved exactly as received from the API when serializing to the session file.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING