claude-code - 💡(How to fix) Fix Agent Session Persistence — 118 sessions of crash test data [3 comments, 3 participants]

The Problem

Claude Code has no session persistence. Context fills, conversation dies, all state is lost. Autonomous agent loops are impossible without external orchestration.

I've run 118 autonomous Claude Code sessions building a startup. Not conversations — sessions with handoff protocols, context lifecycle management, multi-memory persistence, and autonomous cycling that runs overnight without human input.

Your product doesn't support any of this natively. So I built it. Here's what I learned.

What I Built (118 Sessions, 4 Iterations)

Context Guardian — PreToolUse hook that scores every tool call, tracks cumulative context consumption, escalates through reminder levels, then blocks ALL tools except the shutdown command. Prevents the agent from working past its cognitive cliff.
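The escalation logic described above can be sketched roughly as follows. This is a minimal illustration, not the author's hook: the per-tool costs, the threshold fractions, and the names `Guardian`, `TOOL_COST`, and `check` are all assumptions, and the hook wiring itself is omitted.

```python
from dataclasses import dataclass

# Hypothetical per-tool context costs; the issue does not publish its scoring table.
TOOL_COST = {"Read": 3.0, "Bash": 2.0, "Edit": 1.0}
DEFAULT_COST = 1.0

# Hypothetical escalation thresholds, as fractions of the session budget.
REMIND_AT = 0.5
WARN_AT = 0.75
BLOCK_AT = 0.9


@dataclass
class Guardian:
    budget: float                 # total context "points" this session may spend
    used: float = 0.0             # cumulative score so far
    shutdown_tool: str = "sleep"  # the one tool still allowed once blocked

    def level(self) -> str:
        """Map cumulative consumption to an escalation level."""
        ratio = self.used / self.budget
        if ratio >= BLOCK_AT:
            return "block"
        if ratio >= WARN_AT:
            return "warn"
        if ratio >= REMIND_AT:
            return "remind"
        return "ok"

    def check(self, tool_name: str) -> tuple[str, bool]:
        """Score one tool call; return (level after the call, call allowed)."""
        if self.level() == "block":
            # Past the critical threshold: only the shutdown command may run.
            return "block", tool_name == self.shutdown_tool
        self.used += TOOL_COST.get(tool_name, DEFAULT_COST)
        return self.level(), True
```

The call that crosses the critical threshold is still allowed; every later call is refused unless it is the shutdown command, which matches the "block ALL tools except shutdown" behavior described above.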

Sleep Cycle Protocol — Deterministic 3-phase shutdown: (1) persist state to multi-memory + write handoff contract, (2) write autonomous work queue, (3) external VBS bridge creates fresh conversation and restarts. v4.1 includes mutex locking for multi-session overlap prevention.
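The mutex idea in v4.1 can be illustrated with an atomically created lock file. The author's bridge is a VBS script; the sketch below is a Python stand-in for the same technique, and the function names and the choice to store the process ID in the file are assumptions. It does not handle stale locks left behind by a crashed process.

```python
import os


def acquire_lock(path: str) -> bool:
    """Try to create the lock file atomically; return False if it already exists."""
    try:
        # O_CREAT | O_EXCL fails if the file exists, so only one caller can win.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))  # record the owner for debugging
    return True


def release_lock(path: str) -> None:
    """Remove the lock file; a missing file is not an error."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
```

A bridge script would call `acquire_lock` before starting the restart sequence and exit immediately if it returns `False`, so overlapping launches cannot both drive the editor.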

Multi-Memory Architecture:

Structured notes (MCP server) — decisions, summaries, searchable by title/tag
Knowledge graph (MCP server) — entities + relations, cross-session references
Semantic search (custom BM25 + vector hybrid) — 2,135 chunks, 222 sources
Tiered query protocol: all 3 stores queried in parallel before any file read
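A rough sketch of the "query all three stores in parallel before any file read" rule is shown below. The store callables are placeholders for the MCP servers and the hybrid search, and the fallback policy in `needs_file_read` (read files only when every store comes back empty) is an assumption, not something the issue states.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A store is modeled as any callable that maps a query string to a list of hits.
Store = Callable[[str], list[str]]


def query_all(stores: dict[str, Store], query: str) -> dict[str, list[str]]:
    """Query every store concurrently and collect results by store name."""
    with ThreadPoolExecutor(max_workers=max(1, len(stores))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in stores.items()}
        return {name: fut.result() for name, fut in futures.items()}


def needs_file_read(results: dict[str, list[str]]) -> bool:
    """Assumed policy: fall back to reading files only if no store returned a hit."""
    return not any(results.values())
```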

Handoff Protocol: 6-field contract written at every session close. Next session picks up in 5 seconds. Zero ambiguity. Fields: completed, stopped-at, next-action, decisions, open-threads, files-modified.
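The six fields listed above map naturally onto a small record type. The sketch below assumes JSON as the on-disk format and guesses at the field types (lists for multi-valued fields, strings for the rest); the issue names the fields but not their shapes.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Handoff:
    """Six-field contract written at session close (field types are assumed)."""
    completed: list[str] = field(default_factory=list)
    stopped_at: str = ""
    next_action: str = ""
    decisions: list[str] = field(default_factory=list)
    open_threads: list[str] = field(default_factory=list)
    files_modified: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the contract for the next session to read."""
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "Handoff":
        """Rebuild the contract; unknown or missing keys raise TypeError."""
        return cls(**json.loads(text))
```

Because `from_json` rejects unexpected keys, a malformed contract fails loudly at session start instead of silently dropping state.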

Session Type Routing: 4 types (Sprint/Decision/Pipeline/Orientation) with different context budgets (60%/40%/80%/35%), different tool access rules, and different close conditions. Hook-enforced via temp file state.
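As an illustration, the routing table might be expressed like this. The budgets are the figures quoted above, paired with the session types in the order listed (an assumption about which number belongs to which type); `should_close` is a hypothetical helper, and tool access rules are left out because the issue does not spell them out.

```python
from enum import Enum


class SessionType(Enum):
    SPRINT = "sprint"
    DECISION = "decision"
    PIPELINE = "pipeline"
    ORIENTATION = "orientation"


# Fraction of the context window each session type may consume.
# Pairing follows the order given in the issue, which is an assumption.
CONTEXT_BUDGET = {
    SessionType.SPRINT: 0.60,
    SessionType.DECISION: 0.40,
    SessionType.PIPELINE: 0.80,
    SessionType.ORIENTATION: 0.35,
}


def should_close(session_type: SessionType, used_fraction: float) -> bool:
    """Return True once the session has spent its context budget."""
    return used_fraction >= CONTEXT_BUDGET[session_type]
```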

Autonomous Work Queue: JSON marker file read at session start. If present → auto-enter Sprint mode, execute queue items sequentially, sleep at threshold, cycle repeats. No human input required. Runs overnight.
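One cycle of the queue loop could look roughly like the sketch below. The marker schema (an `items` list of objects with `task` and `done` keys) and the `execute` and `at_threshold` callbacks are hypothetical. The marker is overwritten rather than deleted, in line with the approach the issue describes for avoiding permission prompts.

```python
import json
import os
from typing import Callable


def run_queue(
    marker_path: str,
    execute: Callable[[str], None],
    at_threshold: Callable[[], bool],
) -> int:
    """Run pending queue items in order; return how many finished this cycle."""
    if not os.path.exists(marker_path):
        return 0  # no marker: an ordinary, human-driven session

    with open(marker_path, encoding="utf-8") as f:
        queue = json.load(f)

    finished = 0
    for item in queue["items"]:
        if item.get("done"):
            continue  # completed in an earlier cycle
        if at_threshold():
            break  # stop here; remaining items wait for the next cycle
        execute(item["task"])
        item["done"] = True
        finished += 1

    # Overwrite the marker with updated progress instead of deleting it.
    with open(marker_path, "w", encoding="utf-8") as f:
        json.dump(queue, f, indent=2)
    return finished
```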

10 Real Failure Modes (of 40+)

#	What Broke	How I Fixed It
1	/clear kills webview DOM — external input dies	newConversation (Ctrl+Alt+N) preserves webview
2	Clipboard paste fails in VS Code webview	SendKeys types directly into intact webview
3	Multiple VBS scripts race-condition on overlap	Mutex lock file + taskkill pre-launch
4	Compound knowledge graph queries return empty	Decompose to parallel single-concept queries
5	Guardian can't force shutdown, only nags	Block ALL tools except /sleep at critical threshold
6	Guardian state persists across new conversations	Reset temp state file at session start
7	Deleting autonomous marker triggers permission prompt	Read-only — overwrite on next cycle, never delete
8	Single focus attempt misses panel after new conversation	3x focus attempts with 1.5s gaps
9	Retry attempts stomp on already-running session	Single attempt after generous wait
10	Session startup burns 30%+ context before real work	Session type routing with budget allocation
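The fix for failure mode 4 (decompose compound queries into parallel single-concept queries) might be sketched as follows. How the author splits a compound query is not stated; splitting on commas and the word "and" is an assumption made for illustration, as are the function names.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def decompose(query: str) -> list[str]:
    """Split a compound query into single concepts (assumed delimiters: ',' and 'and')."""
    parts = re.split(r"\s*(?:,|\band\b)\s*", query)
    return [p.strip() for p in parts if p.strip()]


def search_decomposed(search: Callable[[str], list[str]], query: str) -> list[str]:
    """Run one search per concept in parallel and merge hits, dropping duplicates."""
    concepts = decompose(query)
    with ThreadPoolExecutor() as pool:
        # map() returns results in the same order as the input concepts.
        per_concept = list(pool.map(search, concepts))
    merged: list[str] = []
    seen: set[str] = set()
    for hits in per_concept:
        for hit in hits:
            if hit not in seen:
                seen.add(hit)
                merged.append(hit)
    return merged
```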

What Native Implementation Needs

Session persistence API — save/restore structured state across context resets. Not just chat history — decisions, work queues, behavioral rules.
Context lifecycle hooks — native cognitive load estimation + graceful shutdown trigger. Not a token counter — a cognitive load estimator.
Multi-memory tiers — session (short-term), project (medium-term), user (long-term). Different query patterns for each.
Autonomous agent loops — execute → save state → reset → resume at API level. No VBS. No keyboard automation.
Session type contracts — declarative mode switching with tool access rules and context budgets.
Handoff protocol standard — formalize the minimum viable state contract between sessions.

About Me

I'm a BCG consultant building a startup entirely inside Claude Code. I'm not a developer tools person — I'm a power user who solved infrastructure problems because the native tools didn't exist yet. That's exactly the perspective your team needs.

Happy to walk through the full architecture, share the codebase, or advise on the native implementation. 118 sessions of crash test data is yours if you want it.

Contact: [email protected] | GitHub: @sakshamking

extent analysis

TL;DR

Implement a session persistence API to save and restore structured state across context resets, enabling autonomous agent loops and context lifecycle management.

Guidance

Introduce a native session persistence mechanism to store and retrieve decisions, work queues, and behavioral rules, allowing for seamless context switching and autonomous agent loops.
Develop context lifecycle hooks to estimate cognitive load and trigger graceful shutdown, preventing context loss and enabling efficient session management.
Design a multi-memory architecture with short-term, medium-term, and long-term storage tiers to accommodate different query patterns and session types.
Establish a standard handoff protocol to ensure consistent state contracts between sessions, enabling reliable session routing and context switching.
Consider implementing session type contracts to declaratively switch between modes with tool access rules and context budgets.

Example

The issue itself includes no code. A potential implementation could expose a session persistence API with methods for saving and restoring state, such as the hypothetical saveSessionState() and restoreSessionState().
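As a workaround-level illustration of such a save/restore pair, the sketch below persists state as JSON on disk. The function names are Python-style stand-ins for the hypothetical methods mentioned above, the file format is an assumption, and nothing here reflects an actual Claude Code API.

```python
import json
import os
import tempfile


def save_session_state(path: str, state: dict) -> None:
    """Write state to a temp file, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f, indent=2)
    # os.replace is atomic on the same filesystem, so a crash mid-write
    # cannot leave a half-written state file behind.
    os.replace(tmp_path, path)


def restore_session_state(path: str) -> dict:
    """Load saved state, or return an empty dict for a first session."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

The state dictionary could carry the decisions, work queue, and behavioral rules that the issue asks a native API to preserve across context resets.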

Notes

The implementation of these features would require significant changes to the existing architecture and may involve trade-offs between complexity, performance, and usability. It is essential to carefully consider the design and testing of these features to ensure they meet the needs of power users like the issue author.

Recommendation

Apply a workaround by implementing a custom session persistence mechanism using existing APIs and tools, such as the Context Guardian and Sleep Cycle Protocol described in the issue, until a native implementation is available. This approach would allow for some level of session persistence and autonomous agent loops, although it may not be as seamless or efficient as a native implementation.
