# Active Memory Timeout: Blocking Recall Pass Exceeds Configured Timeout
Active Memory consistently times out before returning recalled summaries across multiple models and providers, with timeout triggering at ~8084ms despite an 8000ms configuration.
## 🔍 Symptoms
Active Memory fails to complete its blocking recall pass within the configured timeout window. The following diagnostic output appears consistently:
```
🧩 Active Memory: timeout 8084ms message 0 chars
🔍 Active Memory Debug: none retrieved
```
Gateway logs confirm the timeout behavior:
```
active-memory ... done status=timeout elapsedMs=8092 summaryChars=0
embedded_run_failover_decision ... failoverReason=timeout profileFailureReason=timeout
```
### Diagnostic Manifestations
- **Timeout threshold**: The timeout triggers at ~8084-8092ms versus the configured 8000ms, indicating the check occurs only after the blocking call completes
- **Zero-byte response**: `summaryChars=0` confirms no summary was generated before the timeout
- **Model agnostic**: The issue reproduces across all tested models: `openai-codex/gpt-5.4`, `ollama/glm-5.1:cloud`, `ollama/llama3.1:8b`, `ollama/gemma4:e4b`
- **Backend agnostic**: Both idle and queued/busy run environments exhibit identical behavior
- **Provider chain**: Affects the `openclaw → ollama` routing configuration
### Environment Context
| Component | Value |
|---|---|
| OpenClaw Version | 2026.4.12 / 26.4.14 |
| Operating System | macOS |
| Install Method | npm global |
| Memory Backend | builtin/default (memory-core) |
| Embedding Model | qwen3-embedding (Ollama) |
| Chat Context | Discord DM (direct) |
## 🔧 Root Cause
### Primary Root Cause: Synchronous Embedding Retrieval Blocking
The timeout occurs because Active Memory’s blocking recall pass performs synchronous embedding lookups against the memory backend before invoking the LLM for summarization. When the embedding retrieval phase exceeds the configured timeout, the LLM call never executes.
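This two-phase flow can be sketched in TypeScript. All names here (`blockingRecall`, `embedRetrieve`, `summarize`) are illustrative assumptions, not actual OpenClaw internals:

```typescript
// Hypothetical sketch of a blocking recall pass. The key property:
// the deadline is only inspected *after* the awaited retrieval
// returns, so a slow retrieval consumes the entire budget and the
// summarization phase is never reached.
async function blockingRecall(
  query: string,
  embedRetrieve: (q: string) => Promise<string[]>,
  summarize: (hits: string[]) => Promise<string>,
  timeoutMs: number,
): Promise<string | null> {
  const start = Date.now();

  // Phase 1: awaited embedding retrieval. Nothing interrupts this
  // await, however long it takes.
  const hits = await embedRetrieve(query);

  // Deadline checked only after the blocking call returns, which is
  // why the observed elapsed time can overshoot timeoutMs
  // (e.g. 8084ms against an 8000ms budget).
  if (Date.now() - start >= timeoutMs) {
    return null; // corresponds to summaryChars=0
  }

  // Phase 2: LLM summarization -- unreachable when retrieval alone
  // exhausts the budget.
  return summarize(hits);
}
```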
### Failure Sequence Analysis
```
┌───────────────────────────────────────────────────────────────┐
│ Active Memory Blocking Recall Pass                            │
├───────────────────────────────────────────────────────────────┤
│ 1. Memory Query (synchronous)                                 │
│    ├── Embedding retrieval via qwen3-embedding                │
│    ├── Memory backend search                                  │
│    └── BLOCKS HERE (timeout: 8084ms)                          │
│                                                               │
│ 2. LLM Summarization (never reached)                          │
│    └── Summary generation                                     │
└───────────────────────────────────────────────────────────────┘
```
### Contributing Factors
1. **Embedded Run Timeout Propagation**: The `embedded_run_failover_decision` log entry indicates that the Active Memory timeout propagates to the embedded run context, triggering failover decision logging. This suggests tight coupling between the memory recall and response generation phases.
2. **Synchronous Memory Backend Query**: The memory search uses synchronous embedding retrieval. When `qwen3-embedding` experiences latency (network round-trips to Ollama, model loading, or vector search overhead), the cumulative delay exceeds the 8000ms timeout before the first token is generated.
3. **Timeout Check Timing**: The 84-92ms overshoot (8084ms actual vs. 8000ms configured) indicates that:
   - the timeout is checked only after each blocking operation completes,
   - the embedding retrieval itself takes ~8084ms, and
   - no early-exit mechanism interrupts an in-flight blocking call once the deadline passes.
4. **Query Mode Configuration**: The `queryMode: "message"` setting may trigger full-message embedding rather than lightweight semantic search, increasing retrieval latency.
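The post-hoc deadline check above can be contrasted with a deadline race, which caps the caller's wait at roughly the configured timeout even when the underlying call hangs. A minimal sketch; the `withDeadline` helper is hypothetical, not an OpenClaw API:

```typescript
// Race an operation against a timer so the caller observes ~timeoutMs
// rather than the full operation latency. Resolves to null on timeout.
function withDeadline<T>(op: Promise<T>, timeoutMs: number): Promise<T | null> {
  const deadline = new Promise<null>(resolve =>
    setTimeout(() => resolve(null), timeoutMs),
  );
  // Note: the losing promise keeps running in the background; true
  // cancellation would require AbortController support inside the
  // operation itself.
  return Promise.race([op, deadline]);
}
```

This only bounds the wait; to actually stop the embedding request, the retrieval code would need to accept an abort signal.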
### Architectural Inconsistency
The memory-core backend’s embedded memory retrieval does not implement:
- Async/non-blocking embedding retrieval
- Per-phase timeout budgets (separate budgets for retrieval vs. generation)
- Early termination on first substantial result
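A per-phase budget split of the kind listed above could look like the following sketch. The `splitBudget` helper and its default values are assumptions for illustration, not OpenClaw code:

```typescript
// Split one total timeout into separate retrieval and generation
// budgets, reserving a floor for the LLM call so a slow retrieval
// cannot starve generation entirely.
interface PhaseBudgets {
  retrievalMs: number;
  generationMs: number;
}

function splitBudget(
  totalMs: number,
  retrievalShare = 0.33,   // assumed default: ~1/3 for retrieval
  minGenerationMs = 5000,  // assumed floor reserved for the LLM call
): PhaseBudgets {
  const retrievalMs = Math.min(
    Math.floor(totalMs * retrievalShare),
    Math.max(0, totalMs - minGenerationMs),
  );
  return { retrievalMs, generationMs: totalMs - retrievalMs };
}
```

With a 30000ms total budget this yields a 9900ms retrieval budget and 20100ms for generation, matching the shape of the `retrievalTimeoutMs` / `generationTimeoutMs` split shown in Fix 2 below.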
## 🛠️ Step-by-Step Fix
### Fix 1: Increase Active Memory Timeout (Immediate Mitigation)
Increase the timeout to accommodate embedding retrieval latency:
**Before:**

```json
"active-memory": {
  "enabled": true,
  "config": {
    "timeoutMs": 8000,
    ...
  }
}
```

**After:**

```json
"active-memory": {
  "enabled": true,
  "config": {
    "timeoutMs": 30000,
    ...
  }
}
```

### Fix 2: Separate Retrieval and Generation Timeouts
Configure separate timeout budgets for embedding retrieval vs. LLM generation:
"active-memory": {
"enabled": true,
"config": {
"timeoutMs": 30000,
"retrievalTimeoutMs": 10000,
"generationTimeoutMs": 20000,
...
}
}Fix 3: Switch to Async/Lightweight Query Mode
Change `queryMode` from `"message"` to `"semantic"` or `"hybrid"`:

**Before:**

```json
"queryMode": "message"
```

**After:**

```json
"queryMode": "semantic"
```

### Fix 4: Use Faster Embedding Model
Replace `qwen3-embedding` with a faster local embedding model:
"active-memory": {
"config": {
"embeddingModel": "ollama/nomic-embed-text"
}
}Then restart Ollama to ensure the model is cached:
ollama pull nomic-embed-text
systemctl restart ollamaFix 5: Disable Active Memory Temporarily (For Production)
If the timeout is blocking production, disable the blocking recall:
"active-memory": {
"enabled": false
}Or restrict to non-blocking mode:
"active-memory": {
"config": {
"blockingMode": false,
"asyncRecall": true
}
}Fix 6: Memory Backend Health Check
Verify memory backend connectivity and indexing status:
```shell
# Check memory backend status
openclaw memory status

# Verify embedding model availability
ollama list | grep qwen3-embedding

# Test memory search latency manually
openclaw memory search "test query" --verbose
```

## 🧪 Verification
After applying the fixes above, verify the behavior step by step.
### Step 1: Enable Verbose and Trace Logging
```
/verbose on
/trace on
```

### Step 2: Trigger Active Memory Recall
Send a message that should trigger memory recall:
```
what was my last message to you
```

### Step 3: Verify Non-Timeout Behavior
Expected output (success):
```
🧩 Active Memory: completed 2341ms message 87 chars
🔍 Active Memory Debug: retrieved summary "User asked about..."
```
Expected gateway log:
```
active-memory ... done status=success elapsedMs=2345 summaryChars=87
```
### Step 4: Confirm No Failover Triggered
Verify absence of:
```
embedded_run_failover_decision ... failoverReason=timeout
```
### Step 5: Run Diagnostic Command
```shell
openclaw memory test --mode recall --verbose
```

Expected output:
```
Memory recall test: PASS
Retrieval latency: 1243ms
Generation latency: 892ms
Total: 2135ms (within timeout budget)
```
### Step 6: Verify Memory Backend Health
```shell
openclaw memory status
```

Expected output should show:

```
backend: memory-core
indexed: true
embedding: qwen3-embedding (or configured model)
status: healthy
```
## ⚠️ Common Pitfalls
- **Timeout Too Close to Embedding Latency**: Setting `timeoutMs` only slightly higher than typical embedding retrieval time leaves no budget for LLM generation. Ensure the total timeout accommodates both phases plus ~20% headroom.
- **Ollama Model Not Preloaded**: On first request, Ollama downloads and loads the embedding model, causing significant latency. Always pre-load models (note the plain Ollama tags, without the `ollama/` provider prefix):

  ```shell
  ollama pull gemma4:e4b
  ollama pull qwen3-embedding
  ```

- **Network Latency with Remote Ollama**: If Ollama runs on a remote host or in a Docker container, network latency compounds embedding retrieval time. Use a local Ollama instance or increase the timeout by 2-3x.
- **Memory Index Not Built**: If the memory backend has no indexed content, the recall pass may still execute embedding queries against empty results, causing unexpected latency. Verify index status before troubleshooting the timeout.
- **Blocking Mode Override**: Some configurations or plugins may force `blockingMode: true` regardless of user settings. Check with:

  ```shell
  openclaw config get active-memory.blockingMode
  ```

- **Discord DM Specifics**: The issue was observed in a Discord DM context. Other platforms (Slack, Teams, terminal) may show different timeout behavior due to gateway implementation differences.
- **Version Mismatch**: The issue was reported on `2026.4.12` but also mentions `26.4.14`. Ensure client and gateway run the same version, as timeout handling may have changed between releases.
- **Concurrent Memory Operations**: If multiple memory operations run concurrently (indexing, search, recall), they may contend for embedding model resources. Queue memory operations or use dedicated embedding instances.
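The queueing suggestion in the concurrency pitfall can be sketched as a small concurrency cap around embedding calls. This is illustrative only, not an OpenClaw API:

```typescript
// A minimal promise-based semaphore: at most `limit` tasks run at
// once; the rest queue and wake up in FIFO order. Wrapping every
// embedding call in `sem.run(...)` prevents indexing, search, and
// recall from contending for the embedding model simultaneously.
class Semaphore {
  private waiters: (() => void)[] = [];
  private active = 0;

  constructor(private readonly limit: number) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    // Wait for a free slot; re-check the count after every wake-up.
    while (this.active >= this.limit) {
      await new Promise<void>(resolve => this.waiters.push(resolve));
    }
    this.active++;
    try {
      return await task();
    } finally {
      this.active--;
      this.waiters.shift()?.(); // wake the next queued operation
    }
  }
}
```

With `new Semaphore(1)` embedding calls are fully serialized; a higher limit allows bounded parallelism on hosts with spare capacity.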
## 🔗 Related Errors
- `embedded_run_failover_decision`: Failover decision triggered by timeout propagation from the Active Memory blocking recall to the embedded run context.
- `memory.search.timeout`: Memory backend search operation exceeded its timeout threshold (may occur independently or as part of the Active Memory failure).
- `embedding.model.not_found`: The configured embedding model (`qwen3-embedding`) is not available in Ollama, causing synchronous fallback delays before the timeout.
- `active-memory.summary_chars=0`: Active Memory completed but generated zero characters, indicating the summarization phase was either skipped or failed silently.
- `gateway.active_memory.done.status=timeout`: Gateway-level status indicating the Active Memory operation completed with timeout status, distinct from `status=success` or `status=error`.
- `llm.context.window.exceeded`: May occur if the memory recall summary is included in subsequent requests and exceeds context limits; related to `maxSummaryChars` misconfiguration.
- **Historical: v2026.3.x Memory Backend Regressions**: Prior versions had known issues with memory recall latency due to backend initialization. Ensure a fresh index rebuild after upgrading.