openclaw - ✅(Solved) Fix [RFC] Context Provenance: Add source/volatility metadata to injected context segments [1 pull requests, 3 comments, 2 participants]

jack91620 · 2026-03-25T08:54:18Z



GitHub stats

openclaw/openclaw#54373 • Fetched 2026-04-08 01:28:25


Author

jack91620

Participants

jack91620

sm86

Timeline (top)

commented ×3, cross-referenced ×2, mentioned ×2, subscribed ×2

PR fix notes

PR #54830: feat: context provenance metadata for injected bootstrap segments

Repository: openclaw/openclaw
Author: jack91620
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/54830

Description (problem / solution / changelog)

Summary

Add optional provenance metadata (source, injectedAt, volatile) to EmbeddedContextFile type
Bootstrap context files (SOUL.md, AGENTS.md, TOOLS.md, etc.) are now tagged with provenance when assembled by buildBootstrapContextFiles()
System prompt injects an HTML comment carrying the provenance metadata before each tagged context segment
New "Context Provenance" section in system prompt instructs the agent to re-read volatile content before relying on it
Backward-compatible: provenance is optional, existing code works unchanged

Closes #54373 (Phase 1 of the RFC)
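
The shape described in the summary above can be sketched as follows. This is a minimal Python illustration, not the PR's TypeScript: the `path` and `content` fields, the `render_segment` helper, and the exact text of the HTML comment are all assumptions made for the example, since the source only names the `provenance` fields (`source`, `injectedAt`, `volatile`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    source: str        # where the content came from, e.g. "SOUL.md"
    injected_at: str   # when it entered the context, e.g. "session_start"
    volatile: bool     # True if the source may change during the session


@dataclass
class EmbeddedContextFile:
    # "path" and "content" are assumed field names for this sketch.
    path: str
    content: str
    provenance: Optional[Provenance] = None  # optional, so older callers still work


def render_segment(file: EmbeddedContextFile) -> str:
    """Render a file for the prompt, adding a provenance comment if present."""
    if file.provenance is None:
        return file.content  # untagged files render exactly as before
    p = file.provenance
    # The comment layout below is hypothetical; the PR's real format may differ.
    tag = (
        f"<!-- provenance: source={p.source} "
        f"injectedAt={p.injected_at} volatile={str(p.volatile).lower()} -->"
    )
    return f"{tag}\n{file.content}"
```

Keeping `provenance` optional with a `None` default is what makes the change backward-compatible in this sketch: a file built without it produces the same prompt text as before.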

Changed files

src/agents/pi-embedded-helpers/types.ts — extended EmbeddedContextFile with optional provenance
src/agents/pi-embedded-helpers/bootstrap.ts — attach provenance to each bootstrap file
src/agents/system-prompt.ts — render provenance tags + add guidance section
src/agents/system-prompt.test.ts — tests for provenance tag rendering
src/agents/pi-embedded-helpers.buildbootstrapcontextfiles.test.ts — tests for provenance attachment

Test plan

Existing tests pass (pnpm test)
New tests verify provenance tags are rendered in system prompt
New tests verify provenance metadata attached to bootstrap context files
Manual verification: start OpenClaw, inspect system prompt for the provenance comment tags
Token overhead < 1% of bootstrap budget
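
One rough way to check the token-overhead item in the test plan is sketched below. The 4-characters-per-token estimate and the 20,000-token bootstrap budget are illustrative assumptions, not OpenClaw values; a real measurement would use the model's own tokenizer and the configured budget.

```python
def approx_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token (assumption)."""
    return max(1, len(text) // 4)


def tag_overhead_ratio(tags: list[str], bootstrap_budget_tokens: int) -> float:
    """Fraction of the bootstrap budget consumed by provenance tags."""
    tag_tokens = sum(approx_tokens(tag) for tag in tags)
    return tag_tokens / bootstrap_budget_tokens


tags = [
    "[source: SOUL.md, injected_at: session_start, volatile: true]",
    "[source: AGENTS.md, injected_at: session_start, volatile: false]",
    "[source: TOOLS.md, injected_at: session_start, volatile: false]",
]
# 20_000 is a hypothetical budget chosen only for the example.
ratio = tag_overhead_ratio(tags, bootstrap_budget_tokens=20_000)
print(f"tag overhead: {ratio:.4%}")
```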

🤖 Generated with Claude Code

Changed files

src/agents/pi-embedded-helpers.buildbootstrapcontextfiles.test.ts (modified, +21/-0)
src/agents/pi-embedded-helpers/bootstrap.ts (modified, +10/-0)
src/agents/pi-embedded-helpers/types.ts (modified, +18/-1)
src/agents/system-prompt.test.ts (modified, +59/-0)
src/agents/system-prompt.ts (modified, +25/-0)


Problem

When PiEmbeddedRunner assembles the system prompt, all injected content (SOUL.md, AGENTS.md, memory search results, skill metadata) enters the context window as flat text. The agent cannot distinguish:

Content injected at session start vs. content freshly read in the current turn
Volatile files (frequently changing) vs. stable configuration
Stale memory search results vs. authoritative tool outputs

This leads to the agent treating potentially outdated information as current, causing incorrect decisions — especially in long-running sessions where injected project docs may have changed on disk.

Prior Art

This is a recognized problem in the LLM agent space, but no framework has implemented an in-context solution:

MemOS (MemCubes) — metadata-rich memory containers with provenance/versioning, but metadata stays external to the context window (arXiv:2507.03724)
OpenAI Agents SDK — memory notes carry last_update_date + recency-wins conflict resolution, but limited to memory, not general context
"Architectures for Building Agentic AI" — prescribes TTL policies, content hashes, typed slots per source (arXiv:2512.09458)
Chroma's "Context Rot" research — demonstrates universal accuracy degradation with growing context across 18 frontier models

None of these implement inline provenance metadata within the token stream that the agent can reason over.

Proposed Solution

Add structured metadata tags to each context segment during prompt assembly in PiEmbeddedRunner:

[source: SOUL.md, injected_at: session_start, volatile: true]
## Knowledge Principles
...

[source: memory_search, query: "auth flow", retrieved_at: turn_3, volatile: false]
The auth flow uses JWT tokens stored in...

[source: read("retriever.py"), turn: 12, volatile: false]
def retrieve(...): ...

Agent-side behavior: the system prompt includes a rule that if volatile=true and injected_at=session_start, the agent re-reads the source before relying on that content.
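
To make the tag format and the re-read rule concrete, here is a small Python sketch that parses a tag line and applies the rule. The `parse_tag` and `needs_reread` helpers are hypothetical names for illustration. The parser assumes fields are separated by ", " and that values never contain that separator or a closing bracket, which holds for the three examples above but is not guaranteed in general.

```python
import re

# Matches a whole line of the form "[key: value, key: value]".
TAG_RE = re.compile(r"^\[(?P<body>[^\]]+)\]$")


def parse_tag(line: str) -> dict[str, str]:
    """Parse '[key: value, key: value]' into a dict of strings."""
    match = TAG_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a provenance tag: {line!r}")
    fields = {}
    for part in match.group("body").split(", "):
        key, _, value = part.partition(": ")
        fields[key.strip()] = value.strip()
    return fields


def needs_reread(fields: dict[str, str]) -> bool:
    """Apply the rule: volatile content injected at session start is re-read."""
    return (
        fields.get("volatile") == "true"
        and fields.get("injected_at") == "session_start"
    )


soul = parse_tag("[source: SOUL.md, injected_at: session_start, volatile: true]")
print(needs_reread(soul))
```

In the RFC the model itself reads the tags, so code like this would only matter for tooling or for the later orchestration-level phase.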

Implementation Scope (incremental)

Phase 1 — Prompt assembly tagging: PiEmbeddedRunner annotates each injected segment with source, injected_at, volatile
Phase 2 — Memory module: memory_search / memory_get results carry last_modified timestamps
Phase 3 — Agent instruction: Add provenance-aware reasoning rules to the base system prompt
Phase 4 — Orchestration-level auto-refresh: Framework detects stale volatile segments and refreshes before inference (no longer relying on model compliance)
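
Phase 4 could look roughly like the sketch below, which compares a hash of the injected text with the file's current content and re-reads on mismatch. `TrackedSegment` and `refresh_stale` are hypothetical names, and hashing the content is one assumed staleness check; the RFC does not specify how staleness is detected.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


def content_hash(text: str) -> str:
    """SHA-256 hex digest of the text, used to detect changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class TrackedSegment:
    path: Path
    text: str                           # content as injected into the prompt
    volatile: bool
    injected_at: str = "session_start"


def refresh_stale(segments: list[TrackedSegment], turn: int) -> list[Path]:
    """Re-read volatile segments whose file changed; return the refreshed paths."""
    refreshed = []
    for seg in segments:
        if not seg.volatile or not seg.path.exists():
            continue  # stable or missing files are left untouched
        current = seg.path.read_text(encoding="utf-8")
        if content_hash(current) != content_hash(seg.text):
            seg.text = current
            seg.injected_at = f"turn_{turn}"  # record when it was refreshed
            refreshed.append(seg.path)
    return refreshed
```

Running this before each inference call would remove the dependence on the model following the re-read rule, at the cost of one file read per volatile segment per turn.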

Why OpenClaw is the right place for this

PiEmbeddedRunner already separates context sources (SOUL.md, AGENTS.md, skills, memory) — the injection points are clear
Metadata-only skill injection shows the team already thinks about context efficiency
Context compaction + isolation mechanisms exist — provenance tagging is a natural extension
No existing framework has done this — OpenClaw could set the standard

Open Questions

Should volatility be configured per-file (in openclaw.json) or inferred from file type/change frequency?
How should this interact with context compaction — preserve tags through compaction, or re-tag after?
Token overhead of metadata tags — worth measuring impact on context budget
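
The first open question has a middle path: explicit per-file settings that fall back to an inferred default. The sketch below is only an illustration of that option. The `contextProvenance` key is not a real openclaw.json setting, and the extension-based fallback is an assumed heuristic.

```python
import json
from pathlib import PurePosixPath

# Hypothetical config fragment; "contextProvenance" is an invented key.
CONFIG_JSON = """
{
  "contextProvenance": {
    "volatile": {"SOUL.md": true, "AGENTS.md": false}
  }
}
"""

# Fallback guess by extension when a file has no explicit entry (assumption).
VOLATILE_EXTENSIONS = {".md", ".txt"}


def is_volatile(path: str, config: dict) -> bool:
    """Explicit per-file setting wins; otherwise infer from the file extension."""
    overrides = config.get("contextProvenance", {}).get("volatile", {})
    name = PurePosixPath(path).name
    if name in overrides:
        return bool(overrides[name])
    return PurePosixPath(path).suffix.lower() in VOLATILE_EXTENSIONS


config = json.loads(CONFIG_JSON)
print(is_volatile("SOUL.md", config), is_volatile("AGENTS.md", config))
```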

Happy to contribute a PR for Phase 1 if the approach looks reasonable.

extent analysis

Fix Plan

To address the issue of the agent treating outdated information as current, we will implement inline provenance metadata within the token stream. The solution involves the following steps:

Phase 1 — Prompt assembly tagging: Modify PiEmbeddedRunner to annotate each injected segment with source, injected_at, and volatile metadata tags.
Phase 2 — Memory module: Update memory_search and memory_get results to carry last_modified timestamps.
Phase 3 — Agent instruction: Add provenance-aware reasoning rules to the base system prompt.
Phase 4 — Orchestration-level auto-refresh: Implement framework-level detection of stale volatile segments and refresh before inference.
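
For Phase 2, a memory result that carries its own `last_modified` timestamp might be modelled as below. `MemoryResult`, `format_result`, and `newest_first` are hypothetical names, and the tag layout simply extends the RFC's bracket format with a `last_modified` field as an assumption. Timestamps are expected to be timezone-aware.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MemoryResult:
    text: str
    last_modified: datetime  # expected to be timezone-aware


def format_result(result: MemoryResult, query: str) -> str:
    """Prefix a memory result with a tag that includes its last_modified time."""
    stamp = result.last_modified.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    tag = f'[source: memory_search, query: "{query}", last_modified: {stamp}]'
    return f"{tag}\n{result.text}"


def newest_first(results: list[MemoryResult]) -> list[MemoryResult]:
    """Order results so the most recently modified entry comes first."""
    return sorted(results, key=lambda r: r.last_modified, reverse=True)
```

Sorting newest-first is one simple way to surface recency; whether the agent or the framework resolves conflicts between old and new entries is left open by the RFC.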

Example Code

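To implement Phase 1, the prompt assembly step can prefix each injected segment with a metadata tag. The Python sketch below is illustrative only: OpenClaw itself is TypeScript, and the Segment type and its field names are assumptions rather than part of PiEmbeddedRunner.

from dataclasses import dataclass


@dataclass
class Segment:
    """One injected context segment (hypothetical shape, for illustration)."""
    source: str        # e.g. "SOUL.md" or "memory_search"
    injected_at: str   # e.g. "session_start" or "turn_3"
    volatile: bool
    text: str


def assemble_prompt(context_segments: list[Segment]) -> str:
    """Prefix each segment with a provenance tag and join the results."""
    tagged_segments = []
    for segment in context_segments:
        # Lowercase the boolean so the tag reads "volatile: true", as in the RFC
        volatile = "true" if segment.volatile else "false"
        metadata_tag = (
            f"[source: {segment.source}, "
            f"injected_at: {segment.injected_at}, volatile: {volatile}]"
        )
        # Prepend the tag to the segment text
        tagged_segments.append(f"{metadata_tag}\n{segment.text}")
    # Separate segments with a blank line, as in the RFC example
    return "\n\n".join(tagged_segments)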

Verification

To verify that the fix worked, you can test the PiEmbeddedRunner with different scenarios, such as:

Injecting new content at session start and verifying that the agent correctly identifies it as fresh.
Updating volatile files and verifying that the agent detects the changes and refreshes the content.
Checking that the agent correctly distinguishes between stale memory search results and authoritative tool outputs.

Extra Tips

Consider configuring volatility per-file in openclaw.json to provide more fine-grained control.
Measure the token overhead of metadata tags to ensure it does not significantly impact the context budget.
Preserve metadata tags through context compaction to maintain provenance information.
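
The last tip can be illustrated with a toy compaction pass that shortens each segment body but keeps its tag line intact. Truncation here is only a stand-in for whatever summarisation the real compaction performs, and the sketch assumes segments are separated by blank lines and contain none internally.

```python
def compact(prompt: str, max_body_chars: int) -> str:
    """Truncate each segment body but keep its leading provenance tag line."""
    compacted = []
    for segment in prompt.split("\n\n"):
        first, _, body = segment.partition("\n")
        if first.startswith("[source:") and first.endswith("]"):
            # Tagged segment: the tag survives, only the body is shortened
            compacted.append(f"{first}\n{body[:max_body_chars]}")
        else:
            # Untagged segment: shorten the whole thing
            compacted.append(segment[:max_body_chars])
    return "\n\n".join(compacted)
```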
