gemini-cli - ✅ (Solved) Improve auto-memory skill extraction with session scratchpads [1 pull request, 1 participant]

GitHub stats
google-gemini/gemini-cli#25895 (fetched 2026-04-24 06:13:27)
Comments: 0 · Participants: 1 · Timeline events: 7 · Reactions: 0
Timeline (top): labeled ×3, cross-referenced ×2, parent_issue_added ×1, unlabeled ×1

Fix status: Fixed

PR fix notes

PR #25873: feat(memory): persist auto-memory scratchpad for skill extraction

Description (problem / solution / changelog)

Summary

Persist an auto-memory memoryScratchpad into session metadata so skill extraction can use compact workflow hints without relying only on one-line session summaries.

In the 5-trial scratchpad stats eval, the scratchpad path reduced average extractor turns from 13.2 to 11.0 (-16.7%), reduced distractor reads from 3.8 to 2.4 (-36.8%), improved precision from 0.3467 to 0.46 (+32.7%), kept recall at 1.0, and reduced duration from 42812.6ms to 40956.2ms (-4.3%).
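Each relative change quoted above is (after - before) / before, computed from the reported averages. A quick Python check that uses only the numbers stated in the PR (the helper name pct_change is illustrative):

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed as a percentage."""
    return (after - before) / before * 100.0


# Averages reported for the 5-trial scratchpad stats eval: (summary-only, scratchpad).
reported = {
    "extractor turns": (13.2, 11.0),
    "distractor reads": (3.8, 2.4),
    "precision": (0.3467, 0.46),
    "recall": (1.0, 1.0),
    "duration (ms)": (42812.6, 40956.2),
}

for metric, (before, after) in reported.items():
    print(f"{metric}: {pct_change(before, after):+.1f}%")
```

Running it reproduces the PR's figures: -16.7% turns, -36.8% distractor reads, +32.7% precision, +0.0% recall, and -4.3% duration.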

Details

  • Persist memoryScratchpad through chat recording, session summary refreshes, and memory service session loading.
  • Backfill scratchpads for sessions that already have summaries without regenerating the summaries.
  • Expose scratchpad-derived workflow hints in the session index used by skill extraction.
  • Keep the recurrence gate strict: workflow hints only route transcript reads and are not standalone evidence for creating a skill.
  • Reuse the shared loadConversationRecord() session parser instead of maintaining a second JSONL parser in summary utilities.
  • Record extraction run metadata (turnCount, durationMs, and terminateReason) so scratchpad impact can be measured.
  • Add eval coverage for scratchpad persistence, scratchpad-vs-summary-only skill extraction behavior, and retrieval quality stats.
  • Sync the prompt-contract unit test with the new workflow-hint wording so preflight stays green.
  • Set GEMINI_CLI_TRUST_WORKSPACE=true for chained and nightly eval workflows so headless eval runs stay aligned with the workspace trust enforcement added in #25814.
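The split between routing and evidence described in these bullets can be sketched as follows. This is a minimal illustration, not gemini-cli's code: SessionEntry, index_line, can_create_skill, and the min_recurrences threshold of 2 are all hypothetical. The real index format and gate logic live in the TypeScript sources listed under Changed files.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class SessionEntry:
    """Hypothetical session-index record; the real gemini-cli format may differ."""
    session_id: str
    summary: str
    workflow_hint: Optional[str] = None  # derived from memoryScratchpad when present


def index_line(entry: SessionEntry) -> str:
    """Render one session-index line for the extraction agent to scan."""
    line = f"{entry.session_id}: {entry.summary}"
    if entry.workflow_hint:
        line += f" [workflow: {entry.workflow_hint}]"
    return line


def can_create_skill(
    candidate_sessions: List[str],
    transcripts_read: Set[str],
    min_recurrences: int = 2,  # hypothetical threshold
) -> bool:
    """Strict recurrence gate.

    Hints only decide which transcripts get read; a candidate session counts
    as evidence only if its transcript was actually read.
    """
    evidenced = [s for s in candidate_sessions if s in transcripts_read]
    return len(evidenced) >= min_recurrences


entries = [
    SessionEntry("s1", "Fixed flaky test", "read_file > replace > npm test (passed)"),
    SessionEntry("s2", "Asked about docs"),
]
for entry in entries:
    print(index_line(entry))
```

A hint can make an index line more attractive to read, but can_create_skill ignores hints entirely and counts only sessions whose transcripts were read.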

Related Issues

Closes #25895.

How to Validate

  1. Run npm run preflight. Expected result: all workspace checks pass.
  2. Run npm exec -- vitest run packages/core/src/services/sessionSummaryUtils.test.ts packages/core/src/services/memoryService.test.ts packages/core/src/agents/skill-extraction-agent.test.ts. Expected result: focused service and prompt-contract tests pass.
  3. Run npm exec -- tsc -p packages/core/tsconfig.json --noEmit. Expected result: core typecheck passes.
  4. Run RUN_EVALS=1 npm exec -- vitest run --config evals/vitest.config.ts -t "Session summary persists memory scratchpad for memory-saving sessions". Expected result: the eval passes and verifies memoryScratchpad is written into the resumed session log.
  5. Run RUN_EVALS=1 npm exec -- vitest run --config evals/vitest.config.ts -t "memory scratchpad improves repeated-workflow recall versus summary-only index". Expected result: the eval passes and scratchpad-enabled retrieval matches or beats the summary-only baseline for the repeated workflow fixture.
  6. Run RUN_EVALS=1 RUN_SCRATCHPAD_STATS=1 SCRATCHPAD_STATS_TRIALS=5 npm exec -- vitest run --config evals/vitest.config.ts -t "reports memory scratchpad retrieval statistics". Expected result: the eval passes and writes evals/logs/skill_extraction_scratchpad_stats.json.
  7. Run GEMINI_MODEL=gemini-3-pro-preview GEMINI_CLI_TRUST_WORKSPACE=true npm exec -- vitest run --config evals/vitest.config.ts evals/save_memory.eval.ts -t "Agent remembers user's favorite color". Expected result: the trusted eval subprocess can call save_memory successfully.

Pre-Merge Checklist

  • Updated relevant documentation and README (if needed)
  • Added/updated tests (if needed)
  • Noted breaking changes (if any)
  • Validated on required platforms/methods:
    • MacOS
      • npm run
      • npx
      • Docker
      • Podman
      • Seatbelt
    • Windows
      • npm run
      • npx
      • Docker
    • Linux
      • npm run
      • npx
      • Docker

Changed files

  • .github/workflows/chained_e2e.yml (modified, +1/-0)
  • .github/workflows/evals-nightly.yml (modified, +1/-0)
  • evals/save_memory.eval.ts (modified, +163/-0)
  • evals/skill_extraction.eval.ts (modified, +647/-7)
  • packages/core/src/agents/local-executor.ts (modified, +8/-0)
  • packages/core/src/agents/skill-extraction-agent.test.ts (modified, +4/-2)
  • packages/core/src/agents/skill-extraction-agent.ts (modified, +5/-4)
  • packages/core/src/agents/types.ts (modified, +2/-0)
  • packages/core/src/services/chatRecordingService.ts (modified, +26/-0)
  • packages/core/src/services/chatRecordingTypes.ts (modified, +15/-0)
  • packages/core/src/services/memoryService.test.ts (modified, +249/-12)
  • packages/core/src/services/memoryService.ts (modified, +104/-83)
  • packages/core/src/services/sessionScratchpadUtils.ts (added, +122/-0)
  • packages/core/src/services/sessionSummaryUtils.test.ts (modified, +371/-16)
  • packages/core/src/services/sessionSummaryUtils.ts (modified, +314/-32)
Issue #25895

Problem

Auto-memory skill extraction relies too heavily on compact session summaries when deciding which prior sessions are worth reading. Those summaries are useful, but they can lose the workflow details that matter for recurring skill detection: tool sequence, touched files, validation outcome, and whether the session was actually part of the repeated workflow.

That makes extraction more likely to read distractor sessions or miss relevant recurrence evidence.

Expected Outcome

Persist lightweight workflow metadata with session records so skill extraction can route to the right transcripts more reliably, while still requiring transcript reads before creating a skill.

Proposed Fix

  • Store a memoryScratchpad in session metadata with workflow summary, tool sequence, touched paths, and validation status.
  • Backfill scratchpads without regenerating existing summaries.
  • Include scratchpad-derived workflow hints in the session index used by skill extraction.
  • Keep the recurrence gate strict: scratchpad hints route transcript reads, but do not count as standalone skill evidence.
  • Add eval coverage comparing scratchpad-enabled extraction against summary-only retrieval and collect extraction quality stats.
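The sketch below shows one way to derive the four proposed scratchpad fields from a recorded session and to backfill them without touching an existing summary. Everything in it is an assumption for illustration: the tool-call record shape (name, args, ok), the WRITE_TOOLS set, the validation command markers, and the camelCase field names are not taken from the PR.

```python
from typing import Any, Dict, List

# Hypothetical: tools treated as file writes, and commands treated as validation.
WRITE_TOOLS = {"write_file", "replace"}
VALIDATION_MARKERS = ("npm test", "npm run preflight", "vitest", "tsc")


def build_scratchpad(tool_calls: List[Dict[str, Any]], summary: str) -> Dict[str, Any]:
    """Derive compact workflow metadata from recorded tool calls."""
    tool_sequence: List[str] = []
    touched_paths: List[str] = []
    validation_status = "not_run"
    for call in tool_calls:
        name = call["name"]
        args = call.get("args", {})
        if not tool_sequence or tool_sequence[-1] != name:
            tool_sequence.append(name)  # collapse consecutive repeats of one tool
        path = args.get("file_path")
        if name in WRITE_TOOLS and path and path not in touched_paths:
            touched_paths.append(path)  # "touched" here means written, in first-seen order
        command = args.get("command", "")
        if any(marker in command for marker in VALIDATION_MARKERS):
            # The last validation-like command decides the status.
            validation_status = "passed" if call.get("ok") else "failed"
    return {
        "workflowSummary": summary,
        "toolSequence": tool_sequence,
        "touchedPaths": touched_paths,
        "validationStatus": validation_status,
    }


def backfill(session: Dict[str, Any]) -> bool:
    """Add a scratchpad to a summarized session that lacks one; never rewrite the summary."""
    if "summary" not in session or "memoryScratchpad" in session:
        return False
    session["memoryScratchpad"] = build_scratchpad(session.get("toolCalls", []), session["summary"])
    return True
```

Because backfill returns early when a scratchpad already exists, re-running it over the same sessions is a no-op.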

Acceptance Criteria

  • Session summary refreshes persist memoryScratchpad for memory-saving sessions.
  • Skill extraction can use scratchpad workflow hints to reduce irrelevant transcript reads.
  • Existing summary loading continues to reuse the shared session log parser.
  • Behavioral evals cover scratchpad persistence and scratchpad-vs-summary-only retrieval.
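The eval reports precision, recall, and distractor reads for transcript reads, but the text does not spell out its formulas. The sketch below assumes the standard definitions: the transcripts the extractor read are the predictions and the repeated-workflow sessions are the ground truth. The session IDs are hypothetical fixtures, not eval data.

```python
from typing import Set, Tuple


def read_metrics(read: Set[str], relevant: Set[str]) -> Tuple[float, float, int]:
    """Return (precision, recall, distractor_reads) for one extraction run.

    precision: share of read transcripts that belong to the repeated workflow.
    recall: share of repeated-workflow transcripts that were read.
    distractor_reads: transcripts read that were not relevant.
    """
    hits = len(read & relevant)
    precision = hits / len(read) if read else 0.0  # assumption: no reads -> 0.0
    recall = hits / len(relevant) if relevant else 1.0  # assumption: nothing to find -> 1.0
    return precision, recall, len(read - relevant)


relevant = {"s1", "s2"}  # hypothetical repeated-workflow sessions
summary_only = {"s1", "s2", "d1", "d2", "d3"}  # hypothetical reads without hints
with_scratchpad = {"s1", "s2", "d1"}  # hypothetical reads with hints

print(read_metrics(summary_only, relevant))  # (0.4, 1.0, 3)
print(read_metrics(with_scratchpad, relevant))  # precision 2/3, recall 1.0, 1 distractor
```

Under these definitions, skipping distractors raises precision while recall stays at 1.0 as long as every relevant transcript is still read. The PR reports the same pattern.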

Analysis

TL;DR

Store workflow metadata in a memoryScratchpad in session metadata so skill extraction can choose relevant transcripts more accurately than one-line summaries allow.

Guidance

  • Store a memoryScratchpad in session metadata containing the workflow summary, tool sequence, touched paths, and validation status.
  • Backfill scratchpads for sessions that already have summaries, without regenerating those summaries, so historical sessions also benefit.
  • Add scratchpad-derived workflow hints to the session index used by skill extraction to reduce irrelevant transcript reads.
  • Keep the recurrence gate strict: hints decide which transcripts to read, but never count as standalone skill evidence.

Example

The issue contains no code, but the proposed fix implies a memoryScratchpad field in session metadata. A minimal illustration follows; the inner field names are hypothetical, since the PR's actual schema is not shown here:

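session_metadata = {
    # ... existing fields, such as the one-line session summary ...
    # Illustrative shape only: these inner field names are hypothetical,
    # not the schema shipped in PR #25873.
    'memoryScratchpad': {
        'workflowSummary': 'Fixed lint errors, then re-ran preflight',  # short prose summary
        'toolSequence': ['read_file', 'replace', 'run_shell_command'],  # tools in call order
        'touchedPaths': ['packages/core/src/services/memoryService.ts'],  # files the session changed
        'validationStatus': 'passed',  # outcome of the last validation step
    },
}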

Notes

The proposed fix reuses the existing shared session log parser for loading summaries, and relies on behavioral evals to compare scratchpad-enabled extraction against summary-only retrieval.

Recommendation

Apply the fix from PR #25873, which stores memoryScratchpad data in session metadata. It improves how skill extraction routes transcript reads while keeping existing summaries intact and reusing the shared session log parser.
