openclaw - 💡(How to fix) Fix: filter metadata field names out of dreaming theme extraction [1 participant]

openclaw/openclaw#70063 (fetched 2026-04-23 07:29:43)
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0

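OpenClaw dreaming theme filtering improvement - feature request

Problem description

When the dreaming feature extracts themes, it treats session metadata field names (such as `user`, `assistant`, and `sender`) as "themes", which produces meaningless statistics: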

Theme: `user` kept surfacing across 415 memories.
Theme: `assistant` kept surfacing across 522 memories.

These are session metadata field names, not themes drawn from the task content.

Root cause

Code location

  • Concept tag extraction: short-term-promotion-Cd3cMDbx.js:336, deriveConceptTags()
  • Stop-word list: short-term-promotion-Cd3cMDbx.js:12, CONCEPT_STOP_WORDS

How the problem arises

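Session corpus (session-corpus/*.txt)
  → contains metadata: "User: Sender (untrusted metadata): ..."
  → deriveConceptTags() extracts concept tags
  → "user", "assistant", "sender" are not in the stop-word list
  → so they are recorded as conceptTags
  → buildRemReflections() counts their frequency
  → meaningless theme statistics are emitted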
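The chain can be reproduced in isolation. The sketch below is hypothetical: `naiveTokens` stands in for the real collectors (collectGlossaryMatches, collectCompoundTokens, collectSegmentTokens), whose exact behavior the issue does not show, and `STOP_WORDS` is a small subset that, like the current list, lacks metadata terms. It only illustrates why a metadata-prefixed corpus line yields field names as candidate tags.

```javascript
// Hypothetical stand-in for the real token collectors: split on anything
// that is not a letter, digit, or underscore, then lowercase.
function naiveTokens(text) {
  return text
    .split(/[^\p{L}\p{N}_]+/u)
    .filter((t) => t.length > 0)
    .map((t) => t.toLowerCase());
}

// A few existing stop words; note that no metadata terms are present.
const STOP_WORDS = new Set(["about", "after", "agent", "also"]);

function candidateTags(text) {
  return naiveTokens(text).filter((t) => !STOP_WORDS.has(t));
}

// Metadata prefix as quoted in the issue; the text after it is a made-up sample.
const line = "User: Sender (untrusted metadata): please configure Graphify";
const tags = candidateTags(line);

// "user", "sender", "untrusted" and "metadata" all survive next to the real topic.
console.log(tags);
```

Once every session line carries the same prefix, these field names appear in almost every memory, which is how they end up at the top of the theme counts.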

Proposed solutions

Option 1: Extend the stop-word list

Add metadata-related terms to CONCEPT_STOP_WORDS:

const CONCEPT_STOP_WORDS = new Set(Object.values({
  shared: [
    // existing stop words...
    "about", "after", "agent", "also", /* ... */

    // new: metadata-related terms
    "user",
    "assistant",
    "sender",
    "receiver",
    "metadata",
    "untrusted",
    "trusted",
    "context",
    "message",
    "timestamp",
    "session",
    "conversation",
    "label",
    "id",
    "json",
    "token",
    "uuid",
    "openclaw",
    "runtime",
    "internal",
    "system"
  ],
  // ...
}).flat()); // flatten the per-group arrays so the Set holds individual words
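A detail worth checking when applying this option: `Object.values()` returns one array per group, so the groups must be flattened (for example with `.flat()`) before they are passed to `new Set()`. Otherwise the set's members are the arrays themselves and `.has("user")` is false. The group names and word lists in this sketch are illustrative, not the module's full list.

```javascript
// Illustrative stop-word groups; the real list in short-term-promotion-*.js is longer.
const STOP_WORD_GROUPS = {
  shared: ["about", "after", "agent", "also"],
  metadata: ["user", "assistant", "sender", "untrusted", "metadata"],
};

// Without flattening, the Set holds the two group arrays, not the words.
const unflattened = new Set(Object.values(STOP_WORD_GROUPS));

// With flattening, every word becomes its own Set member.
const CONCEPT_STOP_WORDS = new Set(Object.values(STOP_WORD_GROUPS).flat());

// Drop tokens whose lowercase form is a stop word.
function dropStopWords(tokens) {
  return tokens.filter((t) => !CONCEPT_STOP_WORDS.has(t.toLowerCase()));
}

console.log(unflattened.has("user"));                        // false
console.log(CONCEPT_STOP_WORDS.has("user"));                 // true
console.log(dropStopWords(["User", "Sender", "Graphify"]));  // [ 'Graphify' ]
```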

Option 2: Filter metadata field-name patterns (recommended)

Add metadata-pattern filtering inside deriveConceptTags():

import path from "node:path"; // used by path.basename below

// New: tokens matching any of these patterns are session metadata, not themes.
// Defined once at module scope so the regexes are not rebuilt on every call.
const METADATA_PATTERNS = [
  // whole-token field names, case-insensitive
  /^(user|assistant|sender|receiver|metadata|untrusted|trusted|context|message|timestamp|session|conversation|label|json|token|uuid|openclaw|runtime|internal|system)$/i,
  /^(ou_|om_|cli_)[a-z0-9]+$/i,  // Feishu ID pattern
  /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}/i,  // ISO timestamp
];

function isMetadataToken(token) {
  return METADATA_PATTERNS.some((p) => p.test(token));
}

function deriveConceptTags(params) {
  const source = `${path.basename(params.path)} ${params.snippet}`;
  const limit = Number.isFinite(params.limit) ? Math.max(0, Math.floor(params.limit)) : 8;
  if (limit === 0) return [];

  const tags = [];
  // collectGlossaryMatches, collectCompoundTokens, collectSegmentTokens and
  // pushNormalizedTag are the module's existing helpers.
  for (const rawToken of [
    ...collectGlossaryMatches(source),
    ...collectCompoundTokens(source),
    ...collectSegmentTokens(source),
  ]) {
    // New: skip metadata tokens before they reach the tag list.
    if (isMetadataToken(rawToken)) continue;

    pushNormalizedTag(tags, rawToken, limit);
    if (tags.length >= limit) break;
  }
  return tags;
}
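The proposed patterns are anchored with `^...$`, so the field-name pattern removes a token only when the whole token is a field name, and substrings such as `username` pass through. The self-contained check below copies the three patterns as proposed; the sample tokens, including the Feishu-style ID value, are illustrative.

```javascript
// The three patterns as proposed in Option 2.
const METADATA_PATTERNS = [
  /^(user|assistant|sender|receiver|metadata|untrusted|trusted|context|message|timestamp|session|conversation|label|json|token|uuid|openclaw|runtime|internal|system)$/i,
  /^(ou_|om_|cli_)[a-z0-9]+$/i,            // Feishu ID pattern
  /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}/i, // ISO timestamp prefix
];

function isMetadataToken(token) {
  return METADATA_PATTERNS.some((p) => p.test(token));
}

// [token, expected result] pairs.
const samples = [
  ["user", true],                 // exact field name
  ["Assistant", true],            // case-insensitive match
  ["username", false],            // anchored: substrings are not filtered
  ["ou_7f3a9c", true],            // Feishu-style ID (illustrative value)
  ["2026-04-22T10:15:00Z", true], // ISO timestamp
  ["Graphify", false],            // a real topic survives
];
for (const [token, expected] of samples) {
  console.log(token, isMetadataToken(token) === expected ? "ok" : "MISMATCH");
}
```

None of the patterns uses the `g` flag, so calling `test()` repeatedly on the same regex is stateless.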

Expected result

Before

Theme: `user` kept surfacing across 415 memories.
Theme: `assistant` kept surfacing across 522 memories.

After

Theme: `Graphify` kept surfacing across 12 memories.
Theme: `协议嵌入` kept surfacing across 8 memories.
Theme: `模型配置` kept surfacing across 6 memories.
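The "kept surfacing across N memories" figures count how many memories carry a tag. buildRemReflections() itself is not shown in the issue, so the sketch below is a hypothetical reimplementation: the `summarizeThemes` name, the input shape (one tag array per memory), the top-3 cutoff and the sort order are all assumptions. It reproduces the output format and shows how excluding metadata tags before counting changes the result.

```javascript
// Hypothetical: each memory is represented by its list of concept tags.
// A tag is counted at most once per memory; the most frequent tags are formatted.
function summarizeThemes(memories, isExcluded = () => false, top = 3) {
  const counts = new Map();
  for (const tags of memories) {
    for (const tag of new Set(tags)) {
      if (isExcluded(tag)) continue;
      counts.set(tag, (counts.get(tag) ?? 0) + 1);
    }
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, top)
    .map(([tag, n]) => `Theme: \`${tag}\` kept surfacing across ${n} memories.`);
}

// Illustrative data, not taken from the issue.
const memories = [
  ["user", "assistant", "Graphify"],
  ["user", "assistant", "Graphify", "模型配置"],
  ["assistant", "协议嵌入"],
];
const isMetadata = (tag) => /^(user|assistant|sender)$/i.test(tag);

console.log(summarizeThemes(memories).join("\n"));             // led by `assistant`
console.log(summarizeThemes(memories, isMetadata).join("\n")); // led by `Graphify`
```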

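Scope of impact

  • Affects only the dreaming theme-extraction logic
  • Does not affect other memory features
  • Backward compatible; no breaking changes

Priority

Medium: degrades the quality of dreaming output but does not affect core functionality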


Submitted: 2026-04-22 · Submitter: R先生 (Mr. R) · Environment: OpenClaw 2026.4.15, Windows 10

extent analysis

TL;DR

To fix the issue of metadata field names being incorrectly identified as themes, update the CONCEPT_STOP_WORDS set or add metadata pattern filtering in the deriveConceptTags() function.

Guidance

  • Update the CONCEPT_STOP_WORDS set to include metadata-related words, such as "user", "assistant", and "sender", to prevent them from being extracted as concept tags.
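  • Add metadata pattern filtering in the deriveConceptTags() function to skip tokens that exactly match a metadata field name such as "user" or "assistant" (the patterns are anchored, so a token like "username" is not affected), as well as Feishu-style IDs and ISO timestamps.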
  • Verify the fix by checking the output of the theme extraction feature to ensure that metadata field names are no longer being incorrectly identified as themes.
  • Consider testing the updated code with a variety of input data to ensure that the fix does not introduce any new issues.
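One input variation worth testing: because the proposed patterns are anchored, a token that still carries punctuation (for example `User:`) would not match. The issue does not show whether the collectors strip punctuation, so the sketch below assumes they might not and trims leading and trailing non-letter, non-digit characters before matching. `trimToken` and `isFieldName` are hypothetical helpers, and only a subset of the field names is reproduced.

```javascript
// A subset of the proposed field-name pattern, for brevity.
const FIELD_NAME = /^(user|assistant|sender|receiver|metadata|untrusted|trusted)$/i;

// Hypothetical helper: strip leading/trailing characters that are not letters or digits.
// Inner punctuation (e.g. the hyphen in "user-facing") is kept.
function trimToken(token) {
  return token.replace(/^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$/gu, "");
}

function isFieldName(token, { trim = true } = {}) {
  return FIELD_NAME.test(trim ? trimToken(token) : token);
}

const cases = ["user", "User:", "(untrusted", "metadata):", "user-facing", "Graphify"];
for (const token of cases) {
  console.log(JSON.stringify(token), "raw:", isFieldName(token, { trim: false }), "trimmed:", isFieldName(token));
}
```

Trimming only the edges keeps compound tokens such as `user-facing` intact, so genuine topics that merely contain a field name are not filtered.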

Example

const METADATA_PATTERNS = [
  /^(user|assistant|sender|receiver|metadata|untrusted|trusted|context|message|timestamp|session|conversation|label|json|token|uuid|openclaw|runtime|internal|system)$/i,
  /^(ou_|om_|cli_)[a-z0-9]+$/i,  // Feishu ID pattern
  /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}/i,  // ISO timestamp
];

function isMetadataToken(token) {
  return METADATA_PATTERNS.some((p) => p.test(token));
}

Notes

The provided solutions assume that the issue is caused by the metadata field names being incorrectly extracted as concept tags. If the issue is more complex, additional debugging and testing may be necessary.

Recommendation

Apply the metadata pattern filtering in deriveConceptTags(): it is the more targeted fix, since it also catches Feishu IDs and ISO timestamps, and it leaves the shared CONCEPT_STOP_WORDS set unchanged.
