April 22, 2026 • Version: v0.12.0

Memory v2 Enhancement Guide: Associative Traversal, Salience Weighting, and Access-Based Forgetting

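An architecture guide to extending OpenClaw's Memory v2 with entity co-occurrence traversal, salience-weighted retention, and access-based decay, in order to improve retrieval precision in long-running agent deployments.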

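🔍 Symptoms

Current Memory v2 Retrieval Limitations

Agents that run for long periods (days to weeks) show declining contextual coherence under the existing retrieval mechanism. The following symptoms have been observed in production:

Symptom 1: Shallow Lexical Retrieval

When the agent queries for conceptually related information across time, it retrieves only surface-level matches: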

$ openclaw memory recall "app performance improvements"
---
RETRIEVED FACTS (3):
- W(s=0.3) @config: Updated heartbeat interval from 5m to 30m.
- W(s=0.3) @config: Increased worker pool size to 4.
- W(s=0.3) @api: Added rate limiting middleware.

EXPECTED: Connection to Week 2 debugging session about slow database queries
ACTUAL: Generic config changes only

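The agent cannot traverse the implicit chain "performance" → "slow endpoint" → "database query" → "Sarah's expertise."

Symptom 2: Equal Weighting of Unequal Memories

Every stored fact competes equally for the context budget, regardless of its importance: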

$ openclaw memory recall "any recent updates"
---
RETRIEVED (k=10, context budget: 4KB):

1. W(s=0.3) @config: Updated heartbeat interval from 5m to 30m.
2. W(s=0.3) @config: Increased worker pool size to 4.
3. W(s=0.3) @config: Set log level to INFO.
4. W(s=0.3) @config: Disabled telemetry opt-in.
5. B(s=0.3) @Sarah @project: Sarah announced she's leaving next month.
6. B(s=0.3) @user @identity: User prefers morning standups.
...

CRITICAL GAP: No salience differentiation. Sarah's departure competes equally with log level changes.

Symptom 3: Unbounded Index Growth Without Decay

After 30+ days of continuous operation:

$ sqlite3 ~/.openclaw/memory.db "SELECT COUNT(*) FROM facts;"
487

$ sqlite3 ~/.openclaw/memory.db "SELECT COUNT(*) FROM facts WHERE last_accessed > datetime('now', '-7 days');"
12

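Only 2.5% of facts (12 of 487) were accessed in the past week, yet all 487 compete in retrieval scoring. The reflect job must process an ever-growing set with no prioritization signal.

Symptom 4: Hub Node Pollution (see the CLS-M benchmark)

Entities that appear in many facts absorb retrieval activation: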

$ sqlite3 ~/.openclaw/memory.db "SELECT entity, COUNT(*) as cnt FROM fact_entities GROUP BY entity ORDER BY cnt DESC LIMIT 5;"
entity|cnt
@Peter|203
@config|112
@api|89
@system|78
@heartbeat|57

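Traversing directly through @Peter (203 facts) dilutes the signal of specific, relevant connections.

🧠 Root Cause Analysis

Architectural Gaps in the Current Memory v2 Design

The current retrieval system lacks three mechanisms that are essential for maintaining precision in long-running deployments:

Gap 1: Single-Hop Entity Retrieval

The existing entity-aware retrieval model returns knowledge directly tagged with the query entity, but does not recursively traverse co-occurring entities: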

-- Current query (single-hop)
SELECT f.content, f.salience 
FROM facts f
JOIN fact_entities fe ON f.id = fe.fact_id
JOIN entities e ON fe.entity_id = e.id
WHERE e.name = 'performance';

-- Returns only: facts explicitly tagged @performance
-- Misses: facts about @database that co-occur with @performance across the corpus

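This is architecturally correct for exact entity queries ("tell me about X"), but insufficient for exploratory queries, where the agent needs to discover implicit connections.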
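To make the gap concrete, here is a minimal sketch against an in-memory SQLite database. The three-table layout (facts, entities, fact_entities) is assumed from the queries in this guide, and the rows are hypothetical. The single-hop query mirrors the current one; the expanded query first collects every entity that shares a fact with the seed, then returns facts tagged with any of them.

```python
import sqlite3

# Hypothetical toy corpus: fact 2 links @performance and @database,
# fact 3 links @database and @Sarah.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE facts (id INTEGER PRIMARY KEY, content TEXT);
CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE fact_entities (fact_id INTEGER, entity_id INTEGER);
INSERT INTO entities VALUES (1, 'performance'), (2, 'database'), (3, 'Sarah');
INSERT INTO facts VALUES
  (1, 'App feels slow after the last deploy'),
  (2, 'Slow endpoint traced to an unindexed database query'),
  (3, 'Sarah tuned the database indexes');
INSERT INTO fact_entities VALUES (1, 1), (2, 1), (2, 2), (3, 2), (3, 3);
""")


def single_hop(name):
    # Same shape as the current query: only facts tagged with the entity itself.
    rows = conn.execute("""
        SELECT f.id FROM facts f
        JOIN fact_entities fe ON f.id = fe.fact_id
        JOIN entities e ON fe.entity_id = e.id
        WHERE e.name = ?""", (name,))
    return {r[0] for r in rows}


def expand_via_cooccurrence(name):
    # Entities sharing at least one fact with the seed (the seed included),
    # then every fact tagged with any of those entities.
    rows = conn.execute("""
        WITH seed AS (SELECT id FROM entities WHERE name = ?),
        neighbors AS (
            SELECT DISTINCT fe2.entity_id AS id
            FROM fact_entities fe1
            JOIN fact_entities fe2 ON fe1.fact_id = fe2.fact_id
            WHERE fe1.entity_id IN (SELECT id FROM seed)
        )
        SELECT DISTINCT fe.fact_id FROM fact_entities fe
        WHERE fe.entity_id IN (SELECT id FROM neighbors)""", (name,))
    return {r[0] for r in rows}


print(sorted(single_hop("performance")))               # → [1, 2]
print(sorted(expand_via_cooccurrence("performance")))  # → [1, 2, 3]
```

Fact 3 (Sarah tuning the database) only surfaces through the shared @database tag, which is exactly the connection a single-hop query cannot make.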

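Gap 2: No Salience Tracking at Retain Time

The core insight of the Letta control loop is that the agent that has the experience must decide what to retain. However, because the retain call has no salience parameter, this decision is binary (retain or discard) rather than graded: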

# Current (binary)
openclaw memory retain "Sarah is leaving the company next month"

# Missing salience metadata that would distinguish:
# A config file tweak (s=0.2)
# A critical team change (s=0.95)

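Without salience, the reflect job cannot separate signal from noise: it has to infer importance from access recency or access frequency, both of which are poor proxies for actual importance.

Gap 3: No Access-Based Decay Mechanism

The current design treats all historical knowledge as equally retrievable, regardless of how it has been used: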

-- No temporal or access-based scoring
SELECT content FROM facts 
ORDER BY created_at DESC  -- Only recency, not relevance
LIMIT 10;

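This creates three cascading problems:

  1. Precision degradation: as the index grows, the ratio of relevant to irrelevant knowledge falls
  2. Reflect job inefficiency: the reflect processor must evaluate an ever-larger corpus with no prioritization
  3. Hub noise amplification: high-degree entities that appear in 100+ facts dominate traversal, with nothing to dampen them

Root Cause Evidence from the CLS-M Prototype

The CLS-M prototype (132 nodes, 802 edges) empirically confirmed these gaps:

  • Recall was acceptable (65%) but precision was poor (35%), meaning 65% of retrieved content was noise
  • Hub nodes destroyed precision: the heartbeat node had 57 edges and absorbed activation that should have flowed to specific nodes
  • Time-based decay failed: a 3-month-old fact that is accessed weekly should stay prominent; age alone is not a relevance signal

The fix is not to build a separate knowledge graph, but to extend the existing SQLite index with:

  1. Entity co-occurrence tracking with inverse document frequency (IDF) weighting
  2. Salience as a first-class parameter of the retain operation
  3. Access-based decay (reset on retrieval) instead of purely age-based decay

🛠️ Step-by-Step Fix

Phase 1: Schema Extensions to the SQLite Index

Add salience and access-tracking columns to the existing schema: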

-- Migration: add_salience_and_access_tracking.sql

-- 1. Add salience column (0.0 to 1.0, default 0.5)
ALTER TABLE facts ADD COLUMN salience REAL DEFAULT 0.5;

-- 2. Add access tracking columns
ALTER TABLE facts ADD COLUMN last_accessed_at DATETIME DEFAULT NULL;
ALTER TABLE facts ADD COLUMN access_count INTEGER DEFAULT 0;

-- 3. Create index for access-based queries
CREATE INDEX idx_facts_last_accessed ON facts(last_accessed_at);
CREATE INDEX idx_facts_salience ON facts(salience);

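-- 4. Precompute entity frequencies for IDF weighting.
--    LOG() is base-10 in SQLite and needs the built-in math functions
--    (SQLite 3.35+ compiled with SQLITE_ENABLE_MATH_FUNCTIONS).
--    An entity with no facts gets LOG(1) = 0; SQLite returns NULL for
--    division by zero, so its idf_weight is NULL.
CREATE TABLE entity_stats AS
SELECT 
    e.id,
    e.name,
    COUNT(fe.fact_id) as fact_count,
    1.0 / LOG(COUNT(fe.fact_id) + 1) as idf_weight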
FROM entities e
LEFT JOIN fact_entities fe ON e.id = fe.entity_id
GROUP BY e.id;

CREATE INDEX idx_entity_stats_fact_count ON entity_stats(fact_count);

Phase 2: Entity Co-occurrence Table

Build the co-occurrence matrix from the existing fact index:

-- Migration: build_entity_cooccurrence.sql

-- 1. Create co-occurrence table
CREATE TABLE entity_cooccurrence (
    entity_id_1 INTEGER NOT NULL,
    entity_id_2 INTEGER NOT NULL,
    cooccur_count INTEGER DEFAULT 1,
    cooccur_weight REAL DEFAULT 0.0,
    PRIMARY KEY (entity_id_1, entity_id_2),
    FOREIGN KEY (entity_id_1) REFERENCES entities(id),
    FOREIGN KEY (entity_id_2) REFERENCES entities(id)
);

-- 2. Populate from existing fact_entities (facts with 2+ entities)
INSERT INTO entity_cooccurrence (entity_id_1, entity_id_2, cooccur_count)
SELECT 
    fe1.entity_id,
    fe2.entity_id,
    COUNT(DISTINCT fe1.fact_id)
FROM fact_entities fe1
JOIN fact_entities fe2 ON fe1.fact_id = fe2.fact_id
WHERE fe1.entity_id < fe2.entity_id  -- Avoid duplicates
GROUP BY fe1.entity_id, fe2.entity_id;

-- 3. Compute IDF-weighted co-occurrence (correlated subqueries keyed on entity_stats.id)
UPDATE entity_cooccurrence SET cooccur_weight =
    CAST(cooccur_count AS REAL)
    * (SELECT idf_weight FROM entity_stats
       WHERE entity_stats.id = entity_cooccurrence.entity_id_1)
    * (SELECT idf_weight FROM entity_stats
       WHERE entity_stats.id = entity_cooccurrence.entity_id_2);

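-- 4. Create indexes for fast co-occurrence lookups. Each pair is stored once
--    (entity_id_1 < entity_id_2), so an entity's neighbors can sit on either
--    side; index both columns.
CREATE INDEX idx_cooccur_lookup ON entity_cooccurrence(entity_id_1, cooccur_weight DESC);
CREATE INDEX idx_cooccur_lookup_rev ON entity_cooccurrence(entity_id_2, cooccur_weight DESC);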
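A minimal sketch that runs the entity_stats, population, and weighting steps on a three-entity toy corpus in an in-memory database. The weighting step uses correlated subqueries keyed on entity_stats.id. LOG is registered from Python as base-10 (matching SQLite's built-in LOG()) in case the local SQLite build lacks the math functions; the table contents are hypothetical.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Application-defined functions override built-ins, so this works whether or
# not the SQLite build ships its own LOG(). Base-10, like the built-in.
conn.create_function("LOG", 1, math.log10)

conn.executescript("""
CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_entities (fact_id INTEGER, entity_id INTEGER);
INSERT INTO entities VALUES (1, 'performance'), (2, 'database'), (3, 'Sarah');
-- fact 10: performance+database, fact 11: database+Sarah, fact 12: performance+database
INSERT INTO fact_entities VALUES (10, 1), (10, 2), (11, 2), (11, 3), (12, 1), (12, 2);

CREATE TABLE entity_stats AS
SELECT e.id, e.name, COUNT(fe.fact_id) AS fact_count,
       1.0 / LOG(COUNT(fe.fact_id) + 1) AS idf_weight
FROM entities e LEFT JOIN fact_entities fe ON e.id = fe.entity_id
GROUP BY e.id;

CREATE TABLE entity_cooccurrence (
    entity_id_1 INTEGER NOT NULL, entity_id_2 INTEGER NOT NULL,
    cooccur_count INTEGER DEFAULT 1, cooccur_weight REAL DEFAULT 0.0,
    PRIMARY KEY (entity_id_1, entity_id_2));

INSERT INTO entity_cooccurrence (entity_id_1, entity_id_2, cooccur_count)
SELECT fe1.entity_id, fe2.entity_id, COUNT(DISTINCT fe1.fact_id)
FROM fact_entities fe1 JOIN fact_entities fe2 ON fe1.fact_id = fe2.fact_id
WHERE fe1.entity_id < fe2.entity_id
GROUP BY fe1.entity_id, fe2.entity_id;

UPDATE entity_cooccurrence SET cooccur_weight =
    CAST(cooccur_count AS REAL)
    * (SELECT idf_weight FROM entity_stats WHERE entity_stats.id = entity_cooccurrence.entity_id_1)
    * (SELECT idf_weight FROM entity_stats WHERE entity_stats.id = entity_cooccurrence.entity_id_2);
""")

pairs = conn.execute(
    "SELECT entity_id_1, entity_id_2, cooccur_count, cooccur_weight "
    "FROM entity_cooccurrence ORDER BY entity_id_1, entity_id_2").fetchall()
for row in pairs:
    print(row)
```

Only two pairs exist: (performance, database) with count 2 and (database, Sarah) with count 1. Note that the weights are not bounded by 1, since they multiply a raw count by two IDF factors.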

Phase 3: CLI Command Updates

Extend the retain command with a salience parameter:

# Before
openclaw memory retain "Sarah is leaving the company next month"

# After (with salience)
openclaw memory retain "Sarah is leaving the company next month" \
  --type B \
  --entity Sarah \
  --entity project \
  --salience 0.95

Extend the recall command with salience filtering and associative traversal:

# Before
openclaw memory recall "performance improvements"

# After (with enhanced options)
openclaw memory recall "performance improvements" \
  --k 10 \
  --min-salience 0.3 \
  --associative-depth 2 \
  --activation-decay 0.5
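One way to keep bad values out of the index is to validate the new flags at parse time. The sketch below mirrors the flags above with Python's argparse; it is a hypothetical stand-in, not OpenClaw's actual CLI code. The defaults are assumptions: 0.5 salience from the schema default, decay 0.5 from the traversal default, a depth cap of 3 from MAX_DEPTH, and depth 0 (no traversal) plus min-salience 0.0 chosen purely for illustration.

```python
import argparse


def unit_interval(text: str) -> float:
    """argparse type: accept only floats in [0.0, 1.0] (NaN is rejected too)."""
    value = float(text)
    if not 0.0 <= value <= 1.0:
        raise argparse.ArgumentTypeError(f"{value} is not in [0.0, 1.0]")
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="openclaw memory")
    sub = parser.add_subparsers(dest="command", required=True)

    retain = sub.add_parser("retain")
    retain.add_argument("text")
    retain.add_argument("--type")
    retain.add_argument("--entity", action="append")  # repeatable flag
    retain.add_argument("--salience", type=unit_interval, default=0.5)

    recall = sub.add_parser("recall")
    recall.add_argument("query")
    recall.add_argument("--k", type=int, default=10)
    recall.add_argument("--min-salience", type=unit_interval, default=0.0)
    # 0 = no associative traversal; capped at 3 hops
    recall.add_argument("--associative-depth", type=int, choices=range(0, 4), default=0)
    recall.add_argument("--activation-decay", type=unit_interval, default=0.5)
    return parser


args = build_parser().parse_args([
    "retain", "Sarah is leaving the company next month",
    "--type", "B", "--entity", "Sarah", "--entity", "project",
    "--salience", "0.95",
])
print(args.entity, args.salience)  # → ['Sarah', 'project'] 0.95
```

An out-of-range value such as `--salience 1.5` makes argparse print a usage error and exit with status 2, so nothing is written.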

Phase 4: Associative Traversal Algorithm

Implement traversal with depth limiting and activation decay:

# query(sql, params) is the existing DB helper returning a list of row tuples.
def associative_traverse(seed_entities: list[str], depth: int = 2, decay: float = 0.5) -> dict:
    """
    Traverse entity co-occurrence graph with depth limiting and activation decay.
    
    Returns:
        dict: {entity_name: accumulated_activation_score}
    """
    activation = {}
    visited = set()
    
    # Initialize seed entities with full activation
    for entity_name in seed_entities:
        activation[entity_name] = 1.0
        visited.add(entity_name)
    
    current_entities = list(seed_entities)
    current_activation = 1.0
    
    for hop in range(depth):
        next_activation = current_activation * decay
        hop_gain = {}  # activation reached in this hop, summed over all parents
        
        for entity_name in current_entities:
            # Each pair is stored once (entity_id_1 < entity_id_2), so match the
            # current entity on either side and return the other entity.
            cooccurring = query("""
                SELECT e.name, c.cooccur_weight, es.idf_weight
                FROM entities src
                JOIN entity_cooccurrence c
                  ON src.id IN (c.entity_id_1, c.entity_id_2)
                JOIN entities e
                  ON e.id = CASE WHEN c.entity_id_1 = src.id
                                 THEN c.entity_id_2 ELSE c.entity_id_1 END
                JOIN entity_stats es ON e.id = es.id
                WHERE src.name = ?
                ORDER BY c.cooccur_weight * es.idf_weight DESC
                LIMIT 10
            """, (entity_name,))
            
            for coentity_name, cooccur_weight, idf_weight in cooccurring:
                if coentity_name in visited:
                    continue  # seeds and earlier hops keep their activation
                # NULL weights (entities with no facts) contribute nothing
                contribution = next_activation * (cooccur_weight or 0.0) * (idf_weight or 0.0)
                hop_gain[coentity_name] = hop_gain.get(coentity_name, 0.0) + contribution
        
        if not hop_gain:
            break  # no unvisited neighbors left
        activation.update(hop_gain)
        visited.update(hop_gain)
        current_entities = list(hop_gain)
        current_activation = next_activation
    
    return activation

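Phase 5: Access-Based Decay Implementation

Apply exponential decay with a 7-day half-life to retrieval scores, reset whenever a fact is retrieved: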

import math
from datetime import datetime
from typing import Optional


def compute_retrieval_score(fact: dict, query_entities: list[str],
                            now: Optional[datetime] = None) -> float:
    """
    Compute composite retrieval score including salience and access-based decay.
    
    Components:
    - Base match score (lexical/semantic/associative)
    - Salience weight (from retain call)
    - Access decay (exponential, 7-day half-life, reset on retrieval)
    """
    if now is None:
        now = datetime.utcnow()  # naive UTC, matching on_fact_retrieved()
    
    # compute_base_match_score() is the existing lexical/semantic/associative matcher
    base_score = compute_base_match_score(fact, query_entities)
    salience_score = fact.get('salience', 0.5)
    
    # Access-based decay (exponential, halves every 7 days since last access)
    last_accessed = fact.get('last_accessed_at')
    if isinstance(last_accessed, str):
        # sqlite3 returns DATETIME columns as text by default
        last_accessed = datetime.fromisoformat(last_accessed)
    if last_accessed:
        days_since_access = (now - last_accessed).days
        access_decay = 0.5 ** (days_since_access / 7.0)
    else:
        access_decay = 0.25  # Never-accessed facts start quieter
    
    # Boost for frequent access (logarithmic, so heavy access cannot dominate)
    access_count = fact.get('access_count', 0)
    access_boost = 1.0 + (0.1 * math.log1p(access_count))
    
    composite_score = (
        base_score * 0.4 +
        salience_score * 0.35 +
        access_decay * access_boost * 0.25
    )
    
    return composite_score

def on_fact_retrieved(fact_id: int) -> None:
    """Update access tracking when a fact is retrieved."""
    # Store naive UTC as 'YYYY-MM-DD HH:MM:SS' so it compares correctly with
    # SQLite's datetime('now', ...) strings. execute() is the existing DB helper.
    execute("""
        UPDATE facts 
        SET last_accessed_at = ?,
            access_count = access_count + 1
        WHERE id = ?
    """, (datetime.utcnow().isoformat(sep=' ', timespec='seconds'), fact_id))

Phase 6: Reflect Loop Integration

Update the reflect job to prioritize recently accessed, high-salience knowledge:

# In reflect job processor
# query(sql, params), consolidate_memories() and update_core_memory() are the
# existing helpers.
def reflect_on_memories(agent_id: str, core_memory_max_tokens: int = 2048) -> None:
    # Query recently accessed facts weighted by salience and access frequency.
    # SQLite has no LOG1P(); LN(1 + x) is equivalent (it needs SQLite's
    # built-in math functions, available since 3.35).
    recent_facts = query("""
        SELECT f.*, 
               COALESCE(f.salience, 0.5) * 
               (1.0 + 0.1 * LN(1 + COALESCE(f.access_count, 0))) as priority_score
        FROM facts f
        WHERE f.agent_id = ?
        AND (
            f.last_accessed_at > datetime('now', '-30 days')
            OR f.salience > 0.8
        )
        ORDER BY priority_score DESC, f.last_accessed_at DESC
        LIMIT 100
    """, (agent_id,))
    
    # Existing reflect logic operates on the priority-filtered set
    consolidated = consolidate_memories(recent_facts)
    update_core_memory(consolidated, max_tokens=core_memory_max_tokens)
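A quick way to sanity-check the priority query is to run it against an in-memory database. This sketch assumes a minimal facts table with only the columns the query touches, and the rows are hypothetical. It uses LN(1 + x) because SQLite has no LOG1P(), and registers LN from Python only if the local SQLite build lacks the math functions.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
try:
    conn.execute("SELECT LN(1.0)")
except sqlite3.OperationalError:
    # SQLite built without SQLITE_ENABLE_MATH_FUNCTIONS: provide LN from Python.
    conn.create_function("LN", 1, math.log)

conn.executescript("""
CREATE TABLE facts (
    id INTEGER PRIMARY KEY, agent_id TEXT, content TEXT,
    salience REAL DEFAULT 0.5, access_count INTEGER DEFAULT 0,
    last_accessed_at DATETIME DEFAULT NULL);
INSERT INTO facts (id, agent_id, content, salience, access_count, last_accessed_at) VALUES
    (1, 'a1', 'Sarah is leaving next month', 0.95, 18, datetime('now', '-1 days')),
    (2, 'a1', 'Set log level to INFO',       0.30, 40, datetime('now', '-2 days')),
    (3, 'a1', 'Q1 deadline is March 15',     0.90,  0, NULL),
    (4, 'a1', 'Stale config tweak',          0.30,  5, datetime('now', '-60 days')),
    (5, 'a2', 'Another agent''s fact',       0.99, 50, datetime('now'));
""")

rows = conn.execute("""
    SELECT f.id,
           COALESCE(f.salience, 0.5) *
           (1.0 + 0.1 * LN(1 + COALESCE(f.access_count, 0))) AS priority_score
    FROM facts f
    WHERE f.agent_id = ?
      AND (f.last_accessed_at > datetime('now', '-30 days') OR f.salience > 0.8)
    ORDER BY priority_score DESC, f.last_accessed_at DESC
    LIMIT 100
""", ("a1",)).fetchall()

for fact_id, score in rows:
    print(fact_id, round(score, 3))
```

Fact 4 drops out (stale and low salience), fact 3 stays despite never being accessed because its salience exceeds 0.8, and fact 5 belongs to another agent. Fact 2's 40 accesses do not lift it above the high-salience facts.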

🧪 Verification

Verification Test Suite

Run the following commands to verify each enhancement:

Test 1: Schema Migration

$ sqlite3 ~/.openclaw/memory.db ".schema facts"
--- Expected output ---
CREATE TABLE facts (
    ...
    salience REAL DEFAULT 0.5,
    last_accessed_at DATETIME,
    access_count INTEGER DEFAULT 0
);

$ sqlite3 ~/.openclaw/memory.db "SELECT COUNT(*) FROM entity_cooccurrence;"
--- Expected output ---
0 before population; > 0 after population (> 100 on a densely populated index)

Test 2: Salience-Aware Retain and Recall

# Retain with salience
$ openclaw memory retain "Sarah is leaving the company next month" \
  --type B \
  --entity Sarah \
  --entity project \
  --salience 0.95

--- Expected output ---
✓ Retained: B(s=0.95) @Sarah @project: Sarah is leaving...

# Verify in database
$ sqlite3 ~/.openclaw/memory.db \
  "SELECT content, salience FROM facts WHERE content LIKE '%Sarah%';"
--- Expected output ---
Sarah is leaving the company next month|0.95

Test 3: Access Tracking

# Query a fact (simulated)
$ openclaw memory recall "heartbeat configuration"

# Verify access tracking updated
$ sqlite3 ~/.openclaw/memory.db \
  "SELECT content, last_accessed_at, access_count FROM facts ORDER BY access_count DESC LIMIT 3;"
--- Expected output ---
Updated heartbeat interval from 5m to 30m.|2025-01-15 10:30:00|5
Increased worker pool size to 4.|2025-01-15 09:15:00|3
Rate limiting middleware added.|2025-01-14 14:22:00|1

Test 4: Associative Traversal Query

# Query with associative depth
$ openclaw memory recall "app performance" \
  --associative-depth 2 \
  --min-salience 0.3

--- Expected output ---
RETRIEVED (associative, depth=2):

Direct matches:
- W(s=0.3) @config: Updated heartbeat interval from 5m to 30m.

2-hop connections:
- B(s=0.95) @Sarah @project: Sarah is leaving... (via @database → @slow-endpoint)
- W(s=0.3) @api: Rate limiting middleware added. (via @slow-endpoint)

# Verify traversal path in debug mode
$ openclaw memory recall "app performance" --associative-depth 2 --debug
--- Expected output ---
Traversal: performance → {database, slow-endpoint, api} 
           → database → {Sarah, PostgreSQL, indexing}
           → Final activation: {Sarah: 0.42, indexing: 0.31, ...}

Test 5: Composite Score Verification

$ python3 -c "
import datetime
import math
from openclaw.memory.scoring import compute_retrieval_score

now = datetime.datetime.utcnow()
test_fact = {
    'content': 'Sarah is leaving next month',
    'salience': 0.95,
    'last_accessed_at': now - datetime.timedelta(days=2),
    'access_count': 5
}

score = compute_retrieval_score(test_fact, query_entities=['personnel'], now=now)
print(f'Composite score: {score:.3f}')
print(f'  - Salience contribution: {0.95 * 0.35:.4f}')
print(f'  - Access term (2 days, 5 accesses): {0.5 ** (2/7) * (1 + 0.1 * math.log1p(5)) * 0.25:.4f}')
"
--- Expected output (composite shown for a base match score of 0; a non-zero base adds 0.4 × base) ---
Composite score: 0.574
  - Salience contribution: 0.3325
  - Access term (2 days, 5 accesses): 0.2418

Test 6: Reflect Job Prioritization

# Run reflect with debug output
$ openclaw memory reflect --agent-id test-agent --debug

--- Expected output ---
Processing 47 facts (filtered from 487 total by priority)
Top priority facts:
1. B(s=0.95) @Sarah @project: Sarah is leaving... (priority: 1.23)
2. B(s=0.9) @user @identity: User prefers morning standups... (priority: 1.19)
3. W(s=0.8) @Peter @deadline: Q1 deadline is March 15... (priority: 1.08)

Core memory updated: 1,847 tokens (was 2,103)

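⚠️ Common Pitfalls

Implementation Pitfalls and Environment-Specific Notes

Pitfall 1: Hub Nodes Dominate Without IDF Weighting

**Symptom:** Associative traversal returns nearly identical results regardless of the query: high-degree entities (Peter, config, system) dominate every path.

**Cause:** Raw co-occurrence counts with no inverse entity-frequency weighting.

**Fix:** Apply the entity_stats.idf_weight = 1 / log(entity_fact_count + 1) weight in every co-occurrence query: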

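-- Wrong (hub dominance): ranks neighbors by raw shared-fact count
SELECT e.name, COUNT(*) AS shared_facts
FROM entities e
JOIN fact_entities fe ON e.id = fe.entity_id
WHERE fe.fact_id IN (
    SELECT fact_id FROM fact_entities WHERE entity_id = :seed_id
)
AND e.id <> :seed_id  -- exclude the seed entity itself
GROUP BY e.id
ORDER BY COUNT(*) DESC;

-- Correct (IDF-weighted): discounts neighbors that appear in many facts
SELECT e.name, es.idf_weight * COUNT(*) AS weighted_score
FROM entities e
JOIN entity_stats es ON e.id = es.id
JOIN fact_entities fe ON e.id = fe.entity_id
WHERE fe.fact_id IN (
    SELECT fact_id FROM fact_entities WHERE entity_id = :seed_id
)
AND e.id <> :seed_id
GROUP BY e.id
ORDER BY weighted_score DESC;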
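The effect is easy to see with toy numbers. In this hypothetical sketch, @Peter shares the most facts with the seed but appears in 203 facts overall (the count from Symptom 4), while @database and @slow-endpoint are rare. The weight has the same shape as entity_stats.idf_weight, using log10 to match SQLite's LOG().

```python
import math

# Hypothetical neighbors of a seed entity:
#   name -> (facts shared with the seed, total facts mentioning the entity)
neighbors = {"@Peter": (4, 203), "@database": (3, 6), "@slow-endpoint": (2, 3)}


def idf_weight(total_facts: int) -> float:
    # 1 / log10(fact_count + 1), matching entity_stats.idf_weight
    return 1.0 / math.log10(total_facts + 1)


raw = sorted(neighbors, key=lambda n: neighbors[n][0], reverse=True)
weighted = sorted(neighbors,
                  key=lambda n: neighbors[n][0] * idf_weight(neighbors[n][1]),
                  reverse=True)
print(raw)       # → ['@Peter', '@database', '@slow-endpoint']
print(weighted)  # → ['@database', '@slow-endpoint', '@Peter']
```

With raw counts the hub wins; with the weight applied, @Peter (about 1.73) falls below @database (about 3.55) and @slow-endpoint (about 3.32).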

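Pitfall 2: Confusing Time-Based and Access-Based Decay

**Symptom:** Old but frequently accessed knowledge scores low, while fresh but never-accessed knowledge scores high.

**Cause:** Decaying on the fact's age since creation instead of on time since last access, with no access-frequency boost.

**Rule:** Access-based decay (reset on retrieval) beats time-based decay. A 3-month-old fact that is accessed weekly should rank above a 1-day-old fact that has never been accessed: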

import math

# Wrong: pure age decay (halves every 30 days since creation)
score = salience * (0.5 ** (age_in_days / 30))

# Correct: access-based decay with a frequency boost
access_decay = 0.5 ** (days_since_last_access / 7)     # Halves every 7 days since last access
access_boost = 1.0 + (0.1 * math.log1p(access_count))  # Logarithmic, so heavy access cannot dominate
score = salience * access_decay * access_boost
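A worked comparison of the two formulas on the rule's own scenario. The facts are hypothetical (equal salience; "weekly" modeled as 12 accesses with the last one 3 days ago), and a never-accessed fact uses the 0.25 starting decay from compute_retrieval_score.

```python
import math


def age_score(salience: float, age_in_days: float) -> float:
    # Wrong: decays on time since creation, ignoring how often the fact is used.
    return salience * (0.5 ** (age_in_days / 30))


def access_score(salience: float, days_since_last_access, access_count: int) -> float:
    # Correct: decays on time since last access (None = never accessed, which
    # uses the 0.25 starting value from compute_retrieval_score) plus a log boost.
    if days_since_last_access is None:
        decay = 0.25
    else:
        decay = 0.5 ** (days_since_last_access / 7)
    boost = 1.0 + 0.1 * math.log1p(access_count)
    return salience * decay * boost


# Hypothetical facts with equal salience:
#   old_weekly: created 90 days ago, accessed weekly (12 times, last 3 days ago)
#   new_unused: created 1 day ago, never accessed
old_age, new_age = age_score(0.5, 90), age_score(0.5, 1)
old_acc, new_acc = access_score(0.5, 3, 12), access_score(0.5, None, 0)
print(f"age-based:    old_weekly={old_age:.3f}  new_unused={new_age:.3f}")
print(f"access-based: old_weekly={old_acc:.3f}  new_unused={new_acc:.3f}")
```

Under pure age decay the unused day-old fact wins (about 0.49 vs 0.06); under access-based decay the weekly-used fact wins (about 0.47 vs 0.125), which is the ordering the rule asks for.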

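Pitfall 3: Associative Depth Too Deep

**Symptom:** Retrieval latency exceeds 500 ms, and the output contains seemingly random knowledge.

**Cause:** Depth > 3 with no activation cutoff floods the traversal.

**Fix:** Enforce a depth limit and a minimum activation threshold: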

MAX_DEPTH = 3
MIN_ACTIVATION = 0.05
INITIAL_ACTIVATION = 1.0
DECAY_PER_HOP = 0.5

# Traversal stops when:
# - Depth limit reached, OR
# - No entities exceed MIN_ACTIVATION threshold
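With these constants it is worth checking which stop condition actually fires. This sketch computes only the per-hop activation ceiling (pure decay, ignoring edge weights, which can only lower it further); hop_activations is a hypothetical helper.

```python
MAX_DEPTH = 3
MIN_ACTIVATION = 0.05
INITIAL_ACTIVATION = 1.0
DECAY_PER_HOP = 0.5


def hop_activations(max_depth=MAX_DEPTH, decay=DECAY_PER_HOP,
                    floor=MIN_ACTIVATION, start=INITIAL_ACTIVATION):
    """Per-hop activation ceiling, stopping at the depth limit or the floor."""
    levels = []
    activation = start
    for hop in range(1, max_depth + 1):
        activation *= decay
        if activation < floor:
            break  # nothing at this hop can exceed MIN_ACTIVATION
        levels.append((hop, activation))
    return levels


print(hop_activations())              # → [(1, 0.5), (2, 0.25), (3, 0.125)]
print(hop_activations(max_depth=10))  # → [(1, 0.5), (2, 0.25), (3, 0.125), (4, 0.0625)]
```

With a 0.5 decay the depth limit stops traversal at hop 3, and the floor alone would stop it after hop 4; the floor does most of its pruning when edge weights push individual entities below the ceiling.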

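Pitfall 4: Salience Estimation Fails at Retain Time

**Symptom:** All knowledge receives similar salience scores (0.4-0.6), and differentiation is lost.

**Cause:** LLM estimates are overly conservative and default to the middle of the range.

**Fix:** Use prompt-based salience estimation with explicit anchors: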

SYSTEM_PROMPT = """
Estimate salience (0.0-1.0) for this memory:
- 0.9-1.0: Identity-defining, relationship-changing, career-affecting
- 0.7-0.9: Important project decisions, team changes, deadlines
- 0.4-0.7: Routine work, configurations, bug fixes
- 0.1-0.4: Minor preferences, temp states, easily reconstructed

Memory: {fact_content}

Respond ONLY with a number between 0.0 and 1.0.
"""

Always allow a human override, via the --salience CLI flag or by editing the file directly.
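Even with "Respond ONLY with a number", a model may wrap the number in text or step outside the range, so the reply needs defensive parsing. This is a minimal sketch under stated assumptions: parse_salience and looks_compressed are hypothetical helpers, the fallback is the 0.5 column default, and the 80% threshold for flagging the 0.4-0.6 clustering symptom is an illustrative choice.

```python
import re
from typing import Optional

DEFAULT_SALIENCE = 0.5  # matches the facts.salience column default
# First signed decimal or integer in the reply, e.g. "0.9" in "Salience: 0.9"
_FIRST_NUMBER = re.compile(r"[-+]?\d*\.\d+|[-+]?\d+")


def parse_salience(llm_reply: str, override: Optional[float] = None) -> float:
    """Turn a model reply into a salience in [0.0, 1.0].

    A human override (e.g. the --salience flag) always wins; a reply with no
    number falls back to the column default instead of raising.
    """
    if override is not None:
        value = override
    else:
        match = _FIRST_NUMBER.search(llm_reply)
        if match is None:
            return DEFAULT_SALIENCE
        value = float(match.group())
    return min(1.0, max(0.0, value))  # clamp out-of-range estimates


def looks_compressed(scores: list[float], low: float = 0.4, high: float = 0.6,
                     share: float = 0.8) -> bool:
    """Flag the symptom above: most recent estimates sit in the low-high band."""
    if not scores:
        return False
    inside = sum(low <= s <= high for s in scores)
    return inside / len(scores) >= share
```

Running looks_compressed over a recent window of estimates gives an early warning that the anchors in the prompt are not being followed.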

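Pitfall 5: Docker/Container Environment Permissions

**Symptom:** sqlite3: unable to open database file when running in Docker

**Cause:** The SQLite database is mounted into the container with the wrong permissions or at the wrong path.

**Fix:** Make sure the volume mount preserves the directory structure. SQLite creates its journal (or -wal/-shm) files next to the database, so the container needs write access to the whole directory, not just the database file: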

# Wrong
docker run -v /host/memory:/container/memory image

# Correct (bind mount the parent directory)
docker run -v /host/.openclaw:/root/.openclaw image

# Verify permissions
docker exec container ls -la /root/.openclaw/memory.db
# Should show: -rw-r--r-- 1 root root ...
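A hypothetical pre-flight check along the same lines, meant to run inside the container (for example via docker exec) before the agent starts. It only inspects the path and permissions; check_db_location is an illustrative helper, not part of OpenClaw.

```python
import os
from pathlib import Path


def check_db_location(db_path: str) -> list[str]:
    """Return problems that would stop SQLite from opening db_path."""
    path = Path(db_path).expanduser()
    directory = path.parent
    if not directory.is_dir():
        return [f"directory {directory} does not exist (check the -v mount target)"]
    problems = []
    # SQLite writes its journal (or -wal/-shm) files beside the database,
    # so the directory itself must be writable, not just the file.
    if not os.access(directory, os.W_OK):
        problems.append(f"directory {directory} is not writable")
    if path.exists() and not os.access(path, os.R_OK | os.W_OK):
        problems.append(f"{path} is not readable and writable")
    return problems


if __name__ == "__main__":
    for problem in check_db_location("~/.openclaw/memory.db"):
        print("problem:", problem)
```

An empty result means the mount target exists and is writable; a "does not exist" message usually means the host directory was mounted at a different path than the one the agent reads.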

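Pitfall 6: Raspberry Pi 5 Resource Limits

**Symptom:** Associative traversal causes memory pressure on ARM devices.

**Cause:** The Python dict used for activation tracking, combined with recursive queries, exceeds available RAM.

**Fix:** Cap the traversal scope and iterate with cursors: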

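import heapq

# Limit activation dict size
MAX_ACTIVATION_ENTITIES = 50
MIN_ACTIVATION = 0.05  # same cutoff as Pitfall 3

# Use a generator for memory efficiency: only the current frontier is held.
# fetch_cooccurring(entity, limit) is the assumed helper that iterates a
# sqlite3 cursor and yields co-occurring entity names.
def associative_traverse_stream(seed, depth, decay):
    frontier = {seed: 1.0}
    visited = {seed}
    
    for _ in range(depth):
        next_frontier = {}
        for entity, activation in frontier.items():
            if activation < MIN_ACTIVATION:
                continue
            for coentity in fetch_cooccurring(entity, limit=5):
                if coentity not in visited:
                    # Sum contributions from every parent in this hop
                    next_frontier[coentity] = next_frontier.get(coentity, 0.0) + \
                        activation * decay
        # Keep only the strongest entities so the dict never exceeds the cap
        frontier = dict(heapq.nlargest(MAX_ACTIVATION_ENTITIES,
                                       next_frontier.items(),
                                       key=lambda item: item[1]))
        if not frontier:
            break
        visited.update(next_frontier)
        yield from frontier.items()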

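🔗 Related Errors

Related Issues and Historical References

Related Design Documents

  • Workspace Memory v2 Research Doc: the baseline architecture this guide extends. Key sections: "Entity-Aware Retrieval," "Incremental Indexing," "Reflect Loop"
  • Hindsight × Letta Integration: typed facts with confidence scores provide the foundation for salience weighting
  • CLS-M Prototype Analysis: empirical validation (132 nodes, 802 edges, F1=44%) showing that naive spreading activation struggles with precision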

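Common Error Codes in the Memory System

| Error code | Description | Related issue |
|---|---|---|
| E2BIG | Assembled context exceeds the token budget; the reflect job cannot compress it | Salience weighting, access decay |
| ENOENTITY | Entity query returns nothing, but semantic search finds results | Entity extraction gaps, FTS fallback |
| EDUPFACTS | Near-duplicate knowledge accumulates without consolidation | Reflect loop limitations |
| EHUBNODES | Retrieval is dominated by high-frequency entities (Peter, system, config) | Missing IDF weighting |
| ECOLDSTART | A new deployment lacks enough knowledge density for associative traversal | Entity co-occurrence density threshold |
| EDECAYTOOFAST | Time-based decay erases useful old memories too early | Access-based vs. time-based decay |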

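Historical Context of CLS-M

The failure modes identified by the CLS-M prototype inform these recommendations:

  • F1=44% on a 45-query benchmark: precision (35%), not recall (65%), was the bottleneck
  • Hub noise kills: the heartbeat node had 57 edges and absorbed 15% of total activation on every query
  • Delegation failure: sub-agent memory extraction failed consistently; the agent that has the experience must own retention
  • Spread too thin: activation spread across 800+ edges diluted the signal below useful thresholds

These findings support an incremental approach: start with FTS5, add embeddings, and add entity co-occurrence only once the index is dense enough.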

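OpenClaw Version Compatibility

| Version | Required features | Migration path |
|---|---|---|
| v0.11.x | Basic fact storage, FTS5 | Apply the Phase 1-2 migrations |
| v0.12.0 | Entity extraction, salience field | Apply Phases 1-6 incrementally |
| v0.13.0 (planned) | Associative traversal, access tracking | Full implementation |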

Basis and Sources

This troubleshooting guide was automatically synthesized by the FixClaw pipeline from community discussions.