openclaw - ✅(Solved) Fix [RFC] Context Provenance: Add source/volatility metadata to injected context segments [1 pull requests, 3 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#54373Fetched 2026-04-08 01:28:25
View on GitHub
Comments
3
Participants
2
Timeline
10
Reactions
0
Author
Participants
Timeline (top)
commented ×3cross-referenced ×2mentioned ×2subscribed ×2

PR fix notes

PR #54830: feat: context provenance metadata for injected bootstrap segments

Description (problem / solution / changelog)

Summary

  • Add optional provenance metadata (source, injectedAt, volatile) to EmbeddedContextFile type
  • Bootstrap context files (SOUL.md, AGENTS.md, TOOLS.md, etc.) are now tagged with provenance when assembled by buildBootstrapContextFiles()
  • System prompt injects <!-- ctx:provenance ... --> HTML comment before each tagged context segment
  • New "Context Provenance" section in system prompt instructs the agent to re-read volatile content before relying on it
  • Backward-compatible: provenance is optional, existing code works unchanged

Closes #54373 (Phase 1 of the RFC)

Changed files

  • src/agents/pi-embedded-helpers/types.ts — extended EmbeddedContextFile with optional provenance
  • src/agents/pi-embedded-helpers/bootstrap.ts — attach provenance to each bootstrap file
  • src/agents/system-prompt.ts — render provenance tags + add guidance section
  • src/agents/system-prompt.test.ts — tests for provenance tag rendering
  • src/agents/pi-embedded-helpers.buildbootstrapcontextfiles.test.ts — tests for provenance attachment

Test plan

  • Existing tests pass (pnpm test)
  • New tests verify provenance tags are rendered in system prompt
  • New tests verify provenance metadata attached to bootstrap context files
  • Manual verification: start OpenClaw, inspect system prompt for <!-- ctx:provenance ... --> tags
  • Token overhead < 1% of bootstrap budget

🤖 Generated with Claude Code

Changed files

  • src/agents/pi-embedded-helpers.buildbootstrapcontextfiles.test.ts (modified, +21/-0)
  • src/agents/pi-embedded-helpers/bootstrap.ts (modified, +10/-0)
  • src/agents/pi-embedded-helpers/types.ts (modified, +18/-1)
  • src/agents/system-prompt.test.ts (modified, +59/-0)
  • src/agents/system-prompt.ts (modified, +25/-0)

Code Example

[source: SOUL.md, injected_at: session_start, volatile: true]
## Knowledge Principles
...

[source: memory_search, query: "auth flow", retrieved_at: turn_3, volatile: false]
The auth flow uses JWT tokens stored in...

[source: read("retriever.py"), turn: 12, volatile: false]
def retrieve(...): ...
RAW_BUFFERClick to expand / collapse

Problem

When PiEmbeddedRunner assembles the system prompt, all injected content (SOUL.md, AGENTS.md, memory search results, skill metadata) enters the context window as flat text. The agent cannot distinguish:

  • Content injected at session start vs. content freshly read in the current turn
  • Volatile files (frequently changing) vs. stable configuration
  • Stale memory search results vs. authoritative tool outputs

This leads to the agent treating potentially outdated information as current, causing incorrect decisions — especially in long-running sessions where injected project docs may have changed on disk.

Prior Art

This is a recognized problem in the LLM agent space, but no framework has implemented an in-context solution:

  • MemOS (MemCubes) — metadata-rich memory containers with provenance/versioning, but metadata stays external to the context window (arXiv:2507.03724)
  • OpenAI Agents SDK — memory notes carry last_update_date + recency-wins conflict resolution, but limited to memory, not general context
  • "Architectures for Building Agentic AI" — prescribes TTL policies, content hashes, typed slots per source (arXiv:2512.09458)
  • Chroma's "Context Rot" research — demonstrates universal accuracy degradation with growing context across 18 frontier models

None of these implement inline provenance metadata within the token stream that the agent can reason over.

Proposed Solution

Add structured metadata tags to each context segment during prompt assembly in PiEmbeddedRunner:

[source: SOUL.md, injected_at: session_start, volatile: true]
## Knowledge Principles
...

[source: memory_search, query: "auth flow", retrieved_at: turn_3, volatile: false]
The auth flow uses JWT tokens stored in...

[source: read("retriever.py"), turn: 12, volatile: false]
def retrieve(...): ...

Agent-side behavior: System prompt includes a rule: if volatile=true and injected_at=session_start, auto-trigger a re-read before relying on that content.

Implementation Scope (incremental)

  1. Phase 1 — Prompt assembly tagging: PiEmbeddedRunner annotates each injected segment with source, injected_at, volatile
  2. Phase 2 — Memory module: memory_search / memory_get results carry last_modified timestamps
  3. Phase 3 — Agent instruction: Add provenance-aware reasoning rules to the base system prompt
  4. Phase 4 — Orchestration-level auto-refresh: Framework detects stale volatile segments and refreshes before inference (no longer relying on model compliance)

Why OpenClaw is the right place for this

  • PiEmbeddedRunner already separates context sources (SOUL.md, AGENTS.md, skills, memory) — the injection points are clear
  • Metadata-only skill injection shows the team already thinks about context efficiency
  • Context compaction + isolation mechanisms exist — provenance tagging is a natural extension
  • No existing framework has done this — OpenClaw could set the standard

Open Questions

  • Should volatility be configured per-file (in openclaw.json) or inferred from file type/change frequency?
  • How should this interact with context compaction — preserve tags through compaction, or re-tag after?
  • Token overhead of metadata tags — worth measuring impact on context budget

Happy to contribute a PR for Phase 1 if the approach looks reasonable.

extent analysis

Fix Plan

To address the issue of the agent treating outdated information as current, we will implement inline provenance metadata within the token stream. The solution involves the following steps:

  • Phase 1 — Prompt assembly tagging: Modify PiEmbeddedRunner to annotate each injected segment with source, injected_at, and volatile metadata tags.
  • Phase 2 — Memory module: Update memory_search and memory_get results to carry last_modified timestamps.
  • Phase 3 — Agent instruction: Add provenance-aware reasoning rules to the base system prompt.
  • Phase 4 — Orchestration-level auto-refresh: Implement framework-level detection of stale volatile segments and refresh before inference.

Example Code

To implement Phase 1, you can modify the PiEmbeddedRunner to include metadata tags in the prompt assembly:

def assemble_prompt(self, context_segments):
    tagged_segments = []
    for segment in context_segments:
        # Determine source and volatility of the segment
        source = self.determine_source(segment)
        volatile = self.is_volatile(segment)
        injected_at = self.get_injection_time(segment)
        
        # Create metadata tag
        metadata_tag = f"[source: {source}, injected_at: {injected_at}, volatile: {volatile}]"
        
        # Append metadata tag to the segment
        tagged_segment = f"{metadata_tag}\n{segment}"
        tagged_segments.append(tagged_segment)
    
    return "\n".join(tagged_segments)

Verification

To verify that the fix worked, you can test the PiEmbeddedRunner with different scenarios, such as:

  • Injecting new content at session start and verifying that the agent correctly identifies it as fresh.
  • Updating volatile files and verifying that the agent detects the changes and refreshes the content.
  • Checking that the agent correctly distinguishes between stale memory search results and authoritative tool outputs.

Extra Tips

  • Consider configuring volatility per-file in openclaw.json to provide more fine-grained control.
  • Measure the token overhead of metadata tags to ensure it does not significantly impact the context budget.
  • Preserve metadata tags through context compaction to maintain provenance information.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING