Memory v2 Enhancement Guide: Associative Traversal, Salience Weighting, and Access-Based Forgetting
Architectural guide for extending OpenClaw's Memory v2 with entity co-occurrence traversal, salience-weighted retention, and access-based decay to improve retrieval precision in long-running agent deployments.
🔍 Symptoms
Current Memory v2 Retrieval Limitations
Agents running for extended periods (days to weeks) exhibit degraded contextual coherence when using existing retrieval mechanisms. The following symptoms manifest in production deployments:
Symptom 1: Shallow Lexical Retrieval
When querying for conceptually-related information across time, the agent retrieves only surface-level matches:
$ openclaw memory recall "app performance improvements"
---
RETRIEVED FACTS (3):
- W(s=0.3) @config: Updated heartbeat interval from 5m to 30m.
- W(s=0.3) @config: Increased worker pool size to 4.
- W(s=0.3) @api: Added rate limiting middleware.
EXPECTED: Connection to Week 2 debugging session about slow database queries
ACTUAL: Generic config changes only
The agent cannot traverse the implicit chain: “performance” → “slow endpoint” → “database query” → “Sarah’s expertise.”
Symptom 2: Equal Weighting of Disparate Memories
All stored facts compete equally for context budget regardless of significance:
$ openclaw memory recall "any recent updates"
---
RETRIEVED (k=10, context budget: 4KB):
1. W(s=0.3) @config: Updated heartbeat interval from 5m to 30m.
2. W(s=0.3) @config: Increased worker pool size to 4.
3. W(s=0.3) @config: Set log level to INFO.
4. W(s=0.3) @config: Disabled telemetry opt-in.
5. B(s=0.3) @Sarah @project: Sarah announced she's leaving next month.
6. B(s=0.3) @user @identity: User prefers morning standups.
...
CRITICAL GAP: No salience differentiation. Sarah's departure competes equally with log level changes.
Symptom 3: Unbounded Index Growth Without Decay
After 30+ days of continuous operation:
$ sqlite3 ~/.openclaw/memory.db "SELECT COUNT(*) FROM facts;"
487
$ sqlite3 ~/.openclaw/memory.db "SELECT COUNT(*) FROM facts WHERE last_accessed > datetime('now', '-7 days');"
12
Only 2.5% of facts were accessed in the past week, yet all 487 compete in retrieval scoring. The reflect job must process an ever-growing set with no prioritization signal.
Symptom 4: Hub Node Pollution (Reference from CLS-M Benchmark)
Entities appearing across many facts absorb retrieval activation:
$ sqlite3 ~/.openclaw/memory.db "SELECT entity, COUNT(*) as cnt FROM fact_entities GROUP BY entity ORDER BY cnt DESC LIMIT 5;"
entity|cnt
@Peter|203
@config|112
@api|89
@system|78
@heartbeat|57
Direct entity traversal through @Peter (203 facts) dilutes signal for specific, relevant connections.
🧠 Root Cause
Architectural Gaps in Current Memory v2 Design
The current retrieval system lacks three critical mechanisms that are essential for maintaining precision in long-running deployments:
Gap 1: Single-Hop Entity Retrieval
The existing entity-aware retrieval model returns facts directly tagged with the query entity but does not recursively traverse co-occurring entities:
-- Current query (single-hop)
SELECT f.content, f.salience
FROM facts f
JOIN fact_entities fe ON f.id = fe.fact_id
JOIN entities e ON fe.entity_id = e.id
WHERE e.name = 'performance';
-- Returns only: facts explicitly tagged @performance
-- Misses: facts about @database that co-occur with @performance across the corpus
This is architecturally correct for exact entity lookup (“tell me about X”) but insufficient for exploratory queries where the agent discovers implicit connections.
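To see what multi-hop retrieval would add, here is a standalone sketch on a toy corpus: the three-fact chain from the symptom above, using a simplified one-table schema (illustrative only, not OpenClaw's actual layout):

```python
import sqlite3

# Toy corpus: fact 1 links performance to slow-endpoint, fact 2 links
# slow-endpoint to database, fact 3 links database to Sarah.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE fact_entities (fact_id INTEGER, entity TEXT);
INSERT INTO fact_entities VALUES
  (1, 'performance'), (1, 'slow-endpoint'),
  (2, 'slow-endpoint'), (2, 'database'),
  (3, 'database'), (3, 'Sarah');
""")

# Single-hop: only facts explicitly tagged with the query entity.
single = {r[0] for r in db.execute(
    "SELECT DISTINCT fact_id FROM fact_entities WHERE entity = ?",
    ('performance',))}

# Two-hop: follow entities that co-occur with the seed, then their facts.
two_hop = {r[0] for r in db.execute("""
    SELECT DISTINCT fe3.fact_id
    FROM fact_entities fe1                               -- facts tagged with seed
    JOIN fact_entities fe2 ON fe1.fact_id = fe2.fact_id
                          AND fe2.entity != fe1.entity   -- co-occurring entity
    JOIN fact_entities fe3 ON fe3.entity = fe2.entity    -- that entity's facts
    WHERE fe1.entity = ?
""", ('performance',))}

print(sorted(single))   # [1]    -- only the directly tagged fact
print(sorted(two_hop))  # [1, 2] -- one hop further reaches the @database fact
```

Reaching fact 3 (the @Sarah connection) needs one more hop, which is exactly what the depth-limited traversal in the fix below provides.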
Gap 2: Absence of Salience Tracking at Retain Time
The Letta control loop’s fundamental insight is that the agent that has the experience must decide what to retain. However, without a salience parameter on retain calls, this decision is binary (keep/discard) rather than graduated:
-- Current (binary)
openclaw memory retain "Sarah is leaving the company next month"
-- Missing salience metadata that would distinguish:
-- A config file tweak (s=0.2)
-- A critical team change (s=0.95)
Without salience, the reflect job cannot distinguish signal from noise; it must proxy importance via recency or access frequency, which are poor proxies for actual significance.
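A minimal sketch of what graduated salience buys the reflect job: retention becomes a ranking problem against a context budget rather than a keep/discard decision (toy data; the budget of two facts is illustrative):

```python
# Toy facts as (content, salience) pairs; salience values mirror the
# examples above.
facts = [
    ("Updated heartbeat interval from 5m to 30m", 0.2),
    ("Sarah is leaving the company next month", 0.95),
    ("Q1 deadline is March 15", 0.8),
]

# With graduated salience, the highest-value facts fill the budget first
# instead of competing equally with config noise.
budget = 2
kept = sorted(facts, key=lambda f: f[1], reverse=True)[:budget]
print([content for content, _ in kept])
```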
Gap 3: No Access-Based Decay Mechanism
The current design treats all historical facts as equally retrievable regardless of engagement patterns:
-- No temporal or access-based scoring
SELECT content FROM facts
ORDER BY created_at DESC -- Only recency, not relevance
LIMIT 10;
This creates three cascading problems:
- Precision degradation: As the index grows, the ratio of relevant-to-irrelevant facts decreases
- Reflect job inefficiency: The reflection processor must evaluate an ever-larger corpus with no prioritization
- Hub noise amplification: High-degree entities (appearing in 100+ facts) dominate traversal without decay
Root Cause Analysis from CLS-M Prototype
The CLS-M prototype (132 nodes, 802 edges) validated these gaps empirically:
- Recall was acceptable (65%) but precision was poor (35%), meaning 65% of retrieved content was noise
- Hub nodes destroyed precision: the heartbeat node had 57 edges, absorbing activation that should have gone to specific nodes
- Time-based decay failed: a fact from 3 months ago that is accessed weekly should remain prominent; age alone is not a relevance signal
The fix is not to build a separate knowledge graph but to extend the existing SQLite index with:
- Entity co-occurrence tracking via inverse document frequency (IDF) weighting
- Salience as a first-class parameter on retain operations
- Access-based decay that resets on retrieval (not pure age-based decay)
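The IDF idea in the first bullet can be seen directly on the hub counts from the symptom section. A standalone sketch (the +1 smoothing term is an assumption here, to keep the weight finite for rare entities):

```python
import math

def idf_weight(fact_count: int) -> float:
    # Entities tagged on many facts get a small weight; rare ones a large one.
    # log1p(n) == ln(n + 1), so the weight stays finite even for 0-fact entities.
    return 1.0 / (1.0 + math.log1p(fact_count))

print(round(idf_weight(203), 3))  # 0.158 -> a hub like @Peter contributes little
print(round(idf_weight(3), 3))    # 0.419 -> a specific entity dominates a path
```

This is the damping that keeps @Peter's 203 facts from absorbing every traversal.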
🛠️ Step-by-Step Fix
Phase 1: Schema Extensions for SQLite Index
Add salience and access tracking columns to the existing schema:
-- Migration: add_salience_and_access_tracking.sql
-- 1. Add salience column (0.0 to 1.0, default 0.5)
ALTER TABLE facts ADD COLUMN salience REAL DEFAULT 0.5;
-- 2. Add access tracking columns
ALTER TABLE facts ADD COLUMN last_accessed_at DATETIME DEFAULT NULL;
ALTER TABLE facts ADD COLUMN access_count INTEGER DEFAULT 0;
-- 3. Create index for access-based queries
CREATE INDEX idx_facts_last_accessed ON facts(last_accessed_at);
CREATE INDEX idx_facts_salience ON facts(salience);
-- 4. Precompute entity frequencies for IDF weighting
--    (ln() requires the SQLite 3.35+ built-in math functions;
--     the +1 smoothing terms keep the weight finite for rare entities)
CREATE TABLE entity_stats AS
SELECT
e.id,
e.name,
COUNT(fe.fact_id) as fact_count,
1.0 / (1.0 + LN(COUNT(fe.fact_id) + 1)) as idf_weight
FROM entities e
LEFT JOIN fact_entities fe ON e.id = fe.entity_id
GROUP BY e.id;
CREATE INDEX idx_entity_stats_fact_count ON entity_stats(fact_count);
Phase 2: Entity Co-Occurrence Table
Build co-occurrence matrix from existing fact index:
-- Migration: build_entity_cooccurrence.sql
-- 1. Create co-occurrence table
CREATE TABLE entity_cooccurrence (
entity_id_1 INTEGER NOT NULL,
entity_id_2 INTEGER NOT NULL,
cooccur_count INTEGER DEFAULT 1,
cooccur_weight REAL DEFAULT 0.0,
PRIMARY KEY (entity_id_1, entity_id_2),
FOREIGN KEY (entity_id_1) REFERENCES entities(id),
FOREIGN KEY (entity_id_2) REFERENCES entities(id)
);
-- 2. Populate from existing fact_entities (facts with 2+ entities)
INSERT INTO entity_cooccurrence (entity_id_1, entity_id_2, cooccur_count)
SELECT
fe1.entity_id,
fe2.entity_id,
COUNT(DISTINCT fe1.fact_id)
FROM fact_entities fe1
JOIN fact_entities fe2 ON fe1.fact_id = fe2.fact_id
WHERE fe1.entity_id < fe2.entity_id -- Avoid duplicates
GROUP BY fe1.entity_id, fe2.entity_id;
-- 3. Compute weighted co-occurrence using IDF
UPDATE entity_cooccurrence SET cooccur_weight =
CAST(cooccur_count AS REAL)
* (SELECT idf_weight FROM entity_stats WHERE entity_stats.id = entity_id_1)
* (SELECT idf_weight FROM entity_stats WHERE entity_stats.id = entity_id_2);
-- 4. Create index for fast co-occurrence lookups
CREATE INDEX idx_cooccur_lookup ON entity_cooccurrence(entity_id_1, cooccur_weight DESC);
Phase 3: CLI Command Updates
Extend the retain command with salience parameter:
# Before
openclaw memory retain "Sarah is leaving the company next month"
# After (with salience)
openclaw memory retain "Sarah is leaving the company next month" \
--type B \
--entity Sarah \
--entity project \
--salience 0.95
Extend the recall command with salience filter and associative traversal:
# Before
openclaw memory recall "performance improvements"
# After (with enhanced options)
openclaw memory recall "performance improvements" \
--k 10 \
--min-salience 0.3 \
--associative-depth 2 \
--activation-decay 0.5
Phase 4: Associative Traversal Algorithm
Implement depth-limited traversal with activation decay:
def associative_traverse(seed_entities: list[str], depth: int = 2, decay: float = 0.5) -> dict:
    """
    Traverse entity co-occurrence graph with depth limiting and activation decay.
    Returns:
        dict: {entity_name: accumulated_activation_score}
    """
    activation = {}
    visited = set()
    # Initialize seed entities with full activation
    for entity_name in seed_entities:
        activation[entity_name] = 1.0
        visited.add(entity_name)
    current_entities = seed_entities
    current_activation = 1.0
    for hop in range(depth):
        next_entities = []
        next_activation = current_activation * decay
        for entity_name in current_entities:
            # Query co-occurring entities with IDF weighting; visited names
            # are excluded via generated placeholders. (If the co-occurrence
            # table stores each pair once with entity_id_1 < entity_id_2,
            # a fully symmetric lookup also needs the mirrored join.)
            placeholders = ",".join("?" for _ in visited)
            cooccurring = query(f"""
                SELECT e.name, c.cooccur_weight, es.idf_weight
                FROM entity_cooccurrence c
                JOIN entities e ON c.entity_id_2 = e.id
                JOIN entity_stats es ON e.id = es.id
                WHERE c.entity_id_1 = (
                    SELECT id FROM entities WHERE name = ?
                )
                  AND e.name NOT IN ({placeholders})
                ORDER BY c.cooccur_weight * es.idf_weight DESC
                LIMIT 10
            """, (entity_name, *visited))
            for coentity_name, cooccur_weight, idf_weight in cooccurring:
                if coentity_name not in visited:
                    contribution = next_activation * cooccur_weight * idf_weight
                    activation[coentity_name] = activation.get(coentity_name, 0) + contribution
                    next_entities.append(coentity_name)
                    visited.add(coentity_name)
        current_entities = next_entities
        current_activation = next_activation
    return activation
Phase 5: Access-Based Decay Implementation
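The decay rule implemented in this phase halves a fact's access score every 7 days since its last retrieval. On sample ages it behaves as follows (standalone sketch):

```python
def access_decay(days_since_access: float, half_life_days: float = 7.0) -> float:
    # Halves every half_life_days since the fact was last retrieved
    return 0.5 ** (days_since_access / half_life_days)

print(round(access_decay(0), 3))   # 1.0    just accessed
print(round(access_decay(7), 3))   # 0.5    one half-life old
print(round(access_decay(28), 3))  # 0.062  four half-lives: mostly faded
```

Because the clock resets on every retrieval, a frequently-accessed old fact keeps returning to 1.0, which is exactly the property time-based decay lacks.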
Implement access-based decay on the retrieval score (the score halves every 7 days since last access, i.e. exponential half-life decay):
import math
from datetime import datetime

def compute_retrieval_score(fact: dict, query_entities: list[str],
                            now: datetime = None) -> float:
    """
    Compute composite retrieval score including salience and access-based decay.
    Components:
    - Base match score (lexical/semantic/associative)
    - Salience weight (from retain call)
    - Access decay (halves every 7 days since last access, reset on retrieval)
    """
    if now is None:
        now = datetime.utcnow()
    base_score = compute_base_match_score(fact, query_entities)
    salience_score = fact.get('salience', 0.5)
    # Access-based decay: halves every 7 days since last access
    last_accessed = fact.get('last_accessed_at')
    if last_accessed:
        days_since_access = (now - last_accessed).days
        access_decay = 0.5 ** (days_since_access / 7.0)
    else:
        access_decay = 0.25  # Never-accessed facts start quieter
    # Boost for frequent access (logarithmic to prevent hub dominance)
    access_count = fact.get('access_count', 0)
    access_boost = 1.0 + (0.1 * math.log1p(access_count))
    composite_score = (
        base_score * 0.4 +
        salience_score * 0.35 +
        access_decay * access_boost * 0.25
    )
    return composite_score

def on_fact_retrieved(fact_id: int) -> None:
    """Update access tracking when a fact is retrieved."""
    execute("""
        UPDATE facts
        SET last_accessed_at = ?,
            access_count = access_count + 1
        WHERE id = ?
    """, (datetime.utcnow(), fact_id))
Phase 6: Reflect Loop Integration
Update the reflect job to prioritize recently-accessed facts:
# In reflect job processor
def reflect_on_memories(agent_id: str, core_memory_max_tokens: int = 2048) -> None:
    # Query recently-accessed facts weighted by salience
    # (SQLite has no LOG1P; LN(1 + x) is the equivalent)
    recent_facts = query("""
        SELECT f.*,
               COALESCE(f.salience, 0.5) *
               (1.0 + 0.1 * LN(1 + COALESCE(f.access_count, 0))) as priority_score
        FROM facts f
        WHERE f.agent_id = ?
          AND (
              f.last_accessed_at > datetime('now', '-30 days')
              OR f.salience > 0.8
          )
        ORDER BY priority_score DESC, f.last_accessed_at DESC
        LIMIT 100
    """, (agent_id,))
    # Existing reflect logic operates on the priority-filtered set
    consolidated = consolidate_memories(recent_facts)
    update_core_memory(consolidated, max_tokens=core_memory_max_tokens)
🧪 Verification
Verification Test Suite
Execute the following commands to validate each enhancement:
Test 1: Schema Migration
$ sqlite3 ~/.openclaw/memory.db ".schema facts"
--- Expected output ---
CREATE TABLE facts (
...
salience REAL DEFAULT 0.5,
last_accessed_at DATETIME,
access_count INTEGER DEFAULT 0
);
$ sqlite3 ~/.openclaw/memory.db "SELECT COUNT(*) FROM entity_cooccurrence;"
--- Expected output ---
0 (before the population migration) or > 100 (after population, on a well-populated index)
Test 2: Salience-Aware Retain and Recall
# Retain with salience
$ openclaw memory retain "Sarah is leaving the company next month" \
--type B \
--entity Sarah \
--entity project \
--salience 0.95
--- Expected output ---
✓ Retained: B(s=0.95) @Sarah @project: Sarah is leaving...
# Verify in database
$ sqlite3 ~/.openclaw/memory.db \
"SELECT content, salience FROM facts WHERE content LIKE '%Sarah%';"
--- Expected output ---
Sarah is leaving the company next month|0.95
Test 3: Access Tracking
# Query a fact (simulated)
$ openclaw memory recall "heartbeat configuration"
# Verify access tracking updated
$ sqlite3 ~/.openclaw/memory.db \
"SELECT content, last_accessed_at, access_count FROM facts ORDER BY access_count DESC LIMIT 3;"
--- Expected output ---
Updated heartbeat interval from 5m to 30m.|2025-01-15 10:30:00|5
Increased worker pool size to 4.|2025-01-15 09:15:00|3
Rate limiting middleware added.|2025-01-14 14:22:00|1
Test 4: Associative Traversal Query
# Query with associative depth
$ openclaw memory recall "app performance" \
--associative-depth 2 \
--min-salience 0.3
--- Expected output ---
RETRIEVED (associative, depth=2):
Direct matches:
- W(s=0.2) @config: Updated heartbeat interval from 5m to 30m.
2-hop connections:
- B(s=0.95) @Sarah @project: Sarah is leaving... (via @database → @slow-endpoint)
- W(s=0.3) @api: Rate limiting middleware added. (via @slow-endpoint)
# Verify traversal path in debug mode
$ openclaw memory recall "app performance" --associative-depth 2 --debug
--- Expected output ---
Traversal: performance → {database, slow-endpoint, api}
→ database → {Sarah, PostgreSQL, indexing}
→ Final activation: {Sarah: 0.42, indexing: 0.31, ...}
Test 5: Composite Scoring Validation
$ python3 -c "
import math, datetime
from openclaw.memory.scoring import compute_retrieval_score
test_fact = {
    'content': 'Sarah is leaving next month',
    'salience': 0.95,
    'last_accessed_at': datetime.datetime.now() - datetime.timedelta(days=2),
    'access_count': 5
}
score = compute_retrieval_score(test_fact, query_entities=['personnel'])
print(f'Composite score: {score:.3f}')
print(f' - Salience contribution: {0.95 * 0.35:.3f}')
print(f' - Access decay (2 days, 5 accesses): {0.5 ** (2/7) * (1 + 0.1 * math.log1p(5)) * 0.25:.3f}')
"
--- Expected output (base match ≈ 0 for 'personnel') ---
Composite score: 0.575
 - Salience contribution: 0.333
 - Access decay (2 days, 5 accesses): 0.242
Test 6: Reflect Job Prioritization
# Run reflect with debug output
$ openclaw memory reflect --agent-id test-agent --debug
--- Expected output ---
Processing 47 facts (filtered from 487 total by priority)
Top priority facts:
1. B(s=0.95) @Sarah @project: Sarah is leaving... (priority: 1.23)
2. B(s=0.9) @user @identity: User prefers morning standups... (priority: 1.19)
3. W(s=0.8) @Peter @deadline: Q1 deadline is March 15... (priority: 1.08)
Core memory updated: 1,847 tokens (was 2,103)
⚠️ Common Pitfalls
Implementation Traps and Environment-Specific Considerations
Pitfall 1: Hub Node Dominance Without IDF Weighting
Symptom: Associative traversal returns nearly identical results regardless of queryβhigh-degree entities (Peter, config, system) dominate all paths.
Cause: Raw co-occurrence counts without inverse entity frequency weighting.
Fix: Ensure the precomputed entity_stats.idf_weight (the inverse-log weight of each entity's fact count) is applied in all co-occurrence queries:
-- Wrong (hub dominance): raw counts, and also invalid without GROUP BY
SELECT e.name FROM entities e
JOIN fact_entities fe ON e.id = fe.entity_id
WHERE fe.fact_id IN (
SELECT fact_id FROM fact_entities WHERE entity_id = ?
)
GROUP BY e.name
ORDER BY COUNT(*) DESC;
-- Correct (IDF-weighted)
SELECT e.name FROM entities e
JOIN entity_stats es ON e.id = es.id
JOIN fact_entities fe ON e.id = fe.entity_id
WHERE fe.fact_id IN (
SELECT fact_id FROM fact_entities WHERE entity_id = ?
)
GROUP BY e.name, es.idf_weight
ORDER BY es.idf_weight * COUNT(*) DESC;
Pitfall 2: Confusing Time-Based and Access-Based Decay
Symptom: Old but frequently-accessed facts receive low scores; fresh but never-accessed facts receive high scores.
Cause: Using last_accessed_at age alone instead of access-based decay with boost.
Rule: Access-based decay (reset on retrieval) outperforms time-based decay. A 3-month-old fact accessed weekly should outrank a 1-day-old fact never accessed:
# Wrong: Pure age decay
score = salience * (0.5 ** (age_in_days / 30))
# Correct: Access-based decay with boost
access_decay = 0.5 ** (days_since_last_access / 7) # Halves every 7 days
access_boost = 1.0 + (0.1 * log1p(access_count)) # Logarithmic, prevents hub dominance
score = salience * access_decay * access_boost
Pitfall 3: Associative Depth Too Deep
Symptom: Retrieval latency exceeds 500ms; output contains seemingly random facts.
Cause: Depth > 3 without activation cutoff floods the traversal.
Fix: Implement both depth limit AND minimum activation threshold:
MAX_DEPTH = 3
MIN_ACTIVATION = 0.05
INITIAL_ACTIVATION = 1.0
DECAY_PER_HOP = 0.5
# Traversal stops when:
# - Depth limit reached, OR
# - No entities exceed MIN_ACTIVATION threshold
Pitfall 4: Salience Estimation Failure at Retain Time
Symptom: All facts receive similar salience scores (0.4-0.6); differentiation is lost.
Cause: LLM estimation is too conservative; defaults to middle values.
Fix: Implement prompt-based salience estimation with explicit anchors:
SYSTEM_PROMPT = """
Estimate salience (0.0-1.0) for this memory:
- 0.9-1.0: Identity-defining, relationship-changing, career-affecting
- 0.7-0.9: Important project decisions, team changes, deadlines
- 0.4-0.7: Routine work, configurations, bug fixes
- 0.1-0.4: Minor preferences, temp states, easily reconstructed
Memory: {fact_content}
Respond ONLY with a number between 0.0 and 1.0.
"""Always allow human override via --salience CLI flag or direct file editing.
Pitfall 5: Docker/Container Environment Permissions
Symptom: sqlite3: unable to open database file when running in Docker.
Cause: SQLite database mounted at volume with incorrect permissions or path.
Fix: Ensure volume mount preserves directory structure:
# Wrong
docker run -v /host/memory:/container/memory image
# Correct (bind mount the parent directory)
docker run -v /host/.openclaw:/root/.openclaw image
# Verify permissions
docker exec container ls -la /root/.openclaw/memory.db
# Should show: -rw-r--r-- 1 root root ...
Pitfall 6: Raspberry Pi 5 Resource Constraints
Symptom: Associative traversal causes memory pressure on ARM device.
Cause: Python dictionaries for activation tracking + recursive queries exceed available RAM.
Fix: Limit traversal scope and use cursor-based iteration:
# Limit activation dict size
MAX_ACTIVATION_ENTITIES = 50

# Use a generator for memory efficiency (MIN_ACTIVATION from Pitfall 3);
# each hop's frontier is pruned to the strongest entities before yielding
def associative_traverse_stream(seed, depth, decay):
    frontier = {seed: 1.0}
    visited = {seed}
    for _ in range(depth):
        next_frontier = {}
        for entity, activation in frontier.items():
            if activation < MIN_ACTIVATION:
                continue
            for coentity in fetch_cooccurring(entity, limit=5):
                if coentity not in visited:
                    next_frontier[coentity] = next_frontier.get(coentity, 0) + \
                        activation * decay
                    visited.add(coentity)
        # Keep only the strongest entities to bound memory usage
        frontier = dict(sorted(next_frontier.items(),
                               key=lambda kv: kv[1],
                               reverse=True)[:MAX_ACTIVATION_ENTITIES])
        yield from frontier.items()
🔗 Related Errors
Contextually Connected Issues and Historical Reference
Related Design Documents
- Workspace Memory v2 Research Doc: the baseline architecture this guide extends. Key sections: "Entity-Aware Retrieval," "Incremental Indexing," "Reflect Loop"
- Hindsight × Letta Integration: typed facts with confidence-bearing opinions provide the substrate for salience weighting
- CLS-M Prototype Analysis: empirical validation (132 nodes, 802 edges, F1=44%) demonstrating precision challenges with naive spreading activation
Common Error Codes in Memory Systems
| Error Code | Description | Related To |
|---|---|---|
| E2BIG | Context assembled exceeds token budget; reflect job cannot compress | Salience weighting, access decay |
| ENOENTITY | Entity lookup returns empty but semantic search finds results | Entity extraction gap, FTS fallback |
| EDUPFACTS | Near-duplicate facts accumulated without consolidation | Reflect loop limitations |
| EHUBNODES | Retrieval dominated by high-frequency entities (Peter, system, config) | IDF weighting absence |
| ECOLDSTART | New deployment has insufficient fact density for associative traversal | Entity co-occurrence density threshold |
| EDECAYTOOFAST | Time-based decay erases useful old memories prematurely | Access-based vs. time-based decay |
Historical Context from CLS-M
The CLS-M prototype identified failure modes that informed these recommendations:
- F1=44% on 45-query benchmark: precision (35%) was the bottleneck, not recall (65%)
- Hub noise killed precision: the heartbeat node with 57 edges absorbed 15% of total activation on every query
- Delegation failure: sub-agent memory extraction failed consistently; the experiencing agent must own retention
- Spread too thin: Activation across 800+ edges diluted signal below useful thresholds
These findings validate the incremental approach: start with FTS5, add embeddings, then entity co-occurrence only after sufficient index density is reached.
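That staging can be enforced with a small guard before enabling traversal; the thresholds below are illustrative assumptions, not tuned values:

```python
# Cold-start guard: enable associative traversal only once the index is
# dense enough to carry signal (assumed thresholds, not tuned values)
MIN_FACTS = 100
MIN_EDGES_PER_ENTITY = 2.0

def traversal_ready(fact_count: int, entity_count: int, edge_count: int) -> bool:
    if fact_count < MIN_FACTS or entity_count == 0:
        return False
    return edge_count / entity_count >= MIN_EDGES_PER_ENTITY

print(traversal_ready(487, 132, 802))  # True: a CLS-M-sized graph qualifies
print(traversal_ready(50, 20, 80))     # False: too few facts retained yet
```

Until the guard passes, retrieval can fall back to FTS5 plus embeddings, avoiding the ECOLDSTART failure mode in the table above.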
OpenClaw Version Compatibility
| Version | Required Features | Migration Path |
|---|---|---|
| v0.11.x | Basic fact storage, FTS5 | Apply Phase 1-2 migrations |
| v0.12.0 | Entity extraction, salience fields | Apply Phase 1-6 incrementally |
| v0.13.0 (planned) | Associative traversal, access tracking | Full implementation |