# Active Memory Timeout: Blocking Recall Pass Exceeds Configured Timeout
Active Memory consistently times out before returning recalled summaries across multiple models and providers, with timeout triggering at ~8084ms despite an 8000ms configuration.
## 🔍 Symptoms
Active Memory fails to complete its blocking recall pass within the configured timeout window. The following diagnostic output appears consistently:
```
🧩 Active Memory: timeout 8084ms message 0 chars
🔍 Active Memory Debug: none retrieved
```
Gateway logs confirm the timeout behavior:
```
active-memory ... done status=timeout elapsedMs=8092 summaryChars=0
embedded_run_failover_decision ... failoverReason=timeout profileFailureReason=timeout
```
### Diagnostic Manifestations
- **Timeout threshold**: The timeout triggers at ~8084-8092ms versus the configured 8000ms, indicating the check occurs only after the blocking call completes
- **Zero-byte response**: `summaryChars=0` confirms no summary was generated before the timeout
- **Model agnostic**: The issue reproduces across all tested models: `openai-codex/gpt-5.4`, `ollama/glm-5.1:cloud`, `ollama/llama3.1:8b`, `ollama/gemma4:e4b`
- **Backend agnostic**: Both idle and queued/busy run environments exhibit identical behavior
- **Provider chain**: Affects the `openclaw → ollama` routing configuration
### Environment Context
| Component | Value |
|---|---|
| OpenClaw Version | 2026.4.12 / 26.4.14 |
| Operating System | macOS |
| Install Method | npm global |
| Memory Backend | builtin/default (memory-core) |
| Embedding Model | qwen3-embedding (Ollama) |
| Chat Context | Discord DM (direct) |
## 🔧 Root Cause
### Primary Root Cause: Synchronous Embedding Retrieval Blocking
The timeout occurs because Active Memory’s blocking recall pass performs synchronous embedding lookups against the memory backend before invoking the LLM for summarization. When the embedding retrieval phase exceeds the configured timeout, the LLM call never executes.
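This two-phase flow can be sketched in TypeScript. All names here (`blockingRecall`, `embedRetrieve`, `summarize`) are illustrative assumptions, not actual OpenClaw internals:

```typescript
// Hypothetical sketch of a blocking recall pass. The key property:
// the deadline is only inspected *after* the awaited retrieval
// returns, so a slow retrieval consumes the entire budget and the
// summarization phase is never reached.
async function blockingRecall(
  query: string,
  embedRetrieve: (q: string) => Promise<string[]>,
  summarize: (hits: string[]) => Promise<string>,
  timeoutMs: number,
): Promise<string | null> {
  const start = Date.now();

  // Phase 1: awaited embedding retrieval. Nothing interrupts this
  // await, however long it takes.
  const hits = await embedRetrieve(query);

  // Deadline checked only after the blocking call returns, which is
  // why the observed elapsed time can overshoot timeoutMs
  // (e.g. 8084ms against an 8000ms budget).
  if (Date.now() - start >= timeoutMs) {
    return null; // corresponds to summaryChars=0
  }

  // Phase 2: LLM summarization -- unreachable when retrieval alone
  // exhausts the budget.
  return summarize(hits);
}
```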
### Failure Sequence Analysis
```
┌───────────────────────────────────────────────────────────────┐
│ Active Memory Blocking Recall Pass                            │
├───────────────────────────────────────────────────────────────┤
│ 1. Memory Query (synchronous)                                 │
│    ├── Embedding retrieval via qwen3-embedding                │
│    ├── Memory backend search                                  │
│    └── BLOCKS HERE (timeout: 8084ms)                          │
│                                                               │
│ 2. LLM Summarization (never reached)                          │
│    └── Summary generation                                     │
└───────────────────────────────────────────────────────────────┘
```
### Contributing Factors
1. **Embedded Run Timeout Propagation**: The `embedded_run_failover_decision` log entry indicates that the Active Memory timeout propagates to the embedded run context, triggering failover decision logging. This suggests tight coupling between the memory recall and response generation phases.
2. **Synchronous Memory Backend Query**: The memory search uses synchronous embedding retrieval. When `qwen3-embedding` experiences latency (network round-trips to Ollama, model loading, or vector search overhead), the cumulative delay exceeds the 8000ms timeout before the first token is generated.
3. **Timeout Check Timing**: The 84-92ms overshoot (8084ms actual vs. 8000ms configured) indicates that:
   - the timeout is checked only after each blocking operation completes,
   - the embedding retrieval itself takes ~8084ms, and
   - no early-exit mechanism interrupts an in-flight blocking call once the deadline passes.
4. **Query Mode Configuration**: The `queryMode: "message"` setting may trigger full-message embedding rather than lightweight semantic search, increasing retrieval latency.
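The post-hoc deadline check above can be contrasted with a deadline race, which caps the caller's wait at roughly the configured timeout even when the underlying call hangs. A minimal sketch; the `withDeadline` helper is hypothetical, not an OpenClaw API:

```typescript
// Race an operation against a timer so the caller observes ~timeoutMs
// rather than the full operation latency. Resolves to null on timeout.
function withDeadline<T>(op: Promise<T>, timeoutMs: number): Promise<T | null> {
  const deadline = new Promise<null>(resolve =>
    setTimeout(() => resolve(null), timeoutMs),
  );
  // Note: the losing promise keeps running in the background; true
  // cancellation would require AbortController support inside the
  // operation itself.
  return Promise.race([op, deadline]);
}
```

This only bounds the wait; to actually stop the embedding request, the retrieval code would need to accept an abort signal.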
### Architectural Inconsistency
The memory-core backend’s embedded memory retrieval does not implement:
- Async/non-blocking embedding retrieval
- Per-phase timeout budgets (separate budgets for retrieval vs. generation)
- Early termination on first substantial result
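A per-phase budget split of the kind listed above could look like the following sketch. The `splitBudget` helper and its default values are assumptions for illustration, not OpenClaw code:

```typescript
// Split one total timeout into separate retrieval and generation
// budgets, reserving a floor for the LLM call so a slow retrieval
// cannot starve generation entirely.
interface PhaseBudgets {
  retrievalMs: number;
  generationMs: number;
}

function splitBudget(
  totalMs: number,
  retrievalShare = 0.33,   // assumed default: ~1/3 for retrieval
  minGenerationMs = 5000,  // assumed floor reserved for the LLM call
): PhaseBudgets {
  const retrievalMs = Math.min(
    Math.floor(totalMs * retrievalShare),
    Math.max(0, totalMs - minGenerationMs),
  );
  return { retrievalMs, generationMs: totalMs - retrievalMs };
}
```

With a 30000ms total budget this yields a 9900ms retrieval budget and 20100ms for generation, matching the shape of the `retrievalTimeoutMs` / `generationTimeoutMs` split shown in Fix 2 below.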
## 🛠️ Step-by-Step Fix
### Fix 1: Increase Active Memory Timeout (Immediate Mitigation)
Increase the timeout to accommodate embedding retrieval latency:
**Before:**

```json
"active-memory": {
  "enabled": true,
  "config": {
    "timeoutMs": 8000,
    ...
  }
}
```

**After:**

```json
"active-memory": {
  "enabled": true,
  "config": {
    "timeoutMs": 30000,
    ...
  }
}
```

### Fix 2: Separate Retrieval and Generation Timeouts
Configure separate timeout budgets for embedding retrieval vs. LLM generation:
"active-memory": {
"enabled": true,
"config": {
"timeoutMs": 30000,
"retrievalTimeoutMs": 10000,
"generationTimeoutMs": 20000,
...
}
}Fix 3: Switch to Async/Lightweight Query Mode
Change `queryMode` from `"message"` to `"semantic"` or `"hybrid"`:

**Before:**

```json
"queryMode": "message"
```

**After:**

```json
"queryMode": "semantic"
```

### Fix 4: Use Faster Embedding Model
Replace `qwen3-embedding` with a faster local embedding model:
"active-memory": {
"config": {
"embeddingModel": "ollama/nomic-embed-text"
}
}Then restart Ollama to ensure the model is cached:
ollama pull nomic-embed-text
systemctl restart ollamaFix 5: Disable Active Memory Temporarily (For Production)
If the timeout is blocking production, disable the blocking recall:
"active-memory": {
"enabled": false
}Or restrict to non-blocking mode:
"active-memory": {
"config": {
"blockingMode": false,
"asyncRecall": true
}
}Fix 6: Memory Backend Health Check
Verify memory backend connectivity and indexing status:
```shell
# Check memory backend status
openclaw memory status

# Verify embedding model availability
ollama list | grep qwen3-embedding

# Test memory search latency manually
openclaw memory search "test query" --verbose
```

## 🧪 Verification
After applying the fixes above, verify the behavior step by step.
### Step 1: Enable Verbose and Trace Logging
```
/verbose on
/trace on
```

### Step 2: Trigger Active Memory Recall
Send a message that should trigger memory recall:
```
what was my last message to you
```

### Step 3: Verify Non-Timeout Behavior
Expected output (success):
```
🧩 Active Memory: completed 2341ms message 87 chars
🔍 Active Memory Debug: retrieved summary "User asked about..."
```
Expected gateway log:
```
active-memory ... done status=success elapsedMs=2345 summaryChars=87
```
### Step 4: Confirm No Failover Triggered
Verify absence of:
```
embedded_run_failover_decision ... failoverReason=timeout
```
### Step 5: Run Diagnostic Command
```shell
openclaw memory test --mode recall --verbose
```

Expected output:
```
Memory recall test: PASS
Retrieval latency: 1243ms
Generation latency: 892ms
Total: 2135ms (within timeout budget)
```
### Step 6: Verify Memory Backend Health
```shell
openclaw memory status
```

Expected output should show:

```
backend: memory-core
indexed: true
embedding: qwen3-embedding (or configured model)
status: healthy
```
## ⚠️ Common Pitfalls
- **Timeout Too Close to Embedding Latency**: Setting `timeoutMs` only slightly higher than typical embedding retrieval time leaves no budget for LLM generation. Ensure the total timeout accommodates both phases plus ~20% headroom.
- **Ollama Model Not Preloaded**: On first request, Ollama downloads and loads the embedding model, causing significant latency. Always pre-load models (note the plain Ollama tags, without the `ollama/` provider prefix):

  ```shell
  ollama pull gemma4:e4b
  ollama pull qwen3-embedding
  ```

- **Network Latency with Remote Ollama**: If Ollama runs on a remote host or in a Docker container, network latency compounds embedding retrieval time. Use a local Ollama instance or increase the timeout by 2-3x.
- **Memory Index Not Built**: If the memory backend has no indexed content, the recall pass may still execute embedding queries against empty results, causing unexpected latency. Verify index status before troubleshooting the timeout.
- **Blocking Mode Override**: Some configurations or plugins may force `blockingMode: true` regardless of user settings. Check with:

  ```shell
  openclaw config get active-memory.blockingMode
  ```

- **Discord DM Specifics**: The issue was observed in a Discord DM context. Other platforms (Slack, Teams, terminal) may show different timeout behavior due to gateway implementation differences.
- **Version Mismatch**: The issue was reported on `2026.4.12` but also mentions `26.4.14`. Ensure client and gateway run the same version, as timeout handling may have changed between releases.
- **Concurrent Memory Operations**: If multiple memory operations run concurrently (indexing, search, recall), they may contend for embedding model resources. Queue memory operations or use dedicated embedding instances.
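The queueing suggestion in the concurrency pitfall can be sketched as a small concurrency cap around embedding calls. This is illustrative only, not an OpenClaw API:

```typescript
// A minimal promise-based semaphore: at most `limit` tasks run at
// once; the rest queue and wake up in FIFO order. Wrapping every
// embedding call in `sem.run(...)` prevents indexing, search, and
// recall from contending for the embedding model simultaneously.
class Semaphore {
  private waiters: (() => void)[] = [];
  private active = 0;

  constructor(private readonly limit: number) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    // Wait for a free slot; re-check the count after every wake-up.
    while (this.active >= this.limit) {
      await new Promise<void>(resolve => this.waiters.push(resolve));
    }
    this.active++;
    try {
      return await task();
    } finally {
      this.active--;
      this.waiters.shift()?.(); // wake the next queued operation
    }
  }
}
```

With `new Semaphore(1)` embedding calls are fully serialized; a higher limit allows bounded parallelism on hosts with spare capacity.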
## 🔗 Related Errors
- `embedded_run_failover_decision`: Failover decision triggered by timeout propagation from the Active Memory blocking recall to the embedded run context.
- `memory.search.timeout`: Memory backend search operation exceeded its timeout threshold (may occur independently or as part of the Active Memory failure).
- `embedding.model.not_found`: The configured embedding model (`qwen3-embedding`) is not available in Ollama, causing synchronous fallback delays before the timeout.
- `active-memory.summary_chars=0`: Active Memory completed but generated zero characters, indicating the summarization phase was either skipped or failed silently.
- `gateway.active_memory.done.status=timeout`: Gateway-level status indicating the Active Memory operation completed with timeout status, distinct from `status=success` or `status=error`.
- `llm.context.window.exceeded`: May occur if the memory recall summary is included in subsequent requests and exceeds context limits; related to `maxSummaryChars` misconfiguration.
- **Historical: v2026.3.x Memory Backend Regressions**: Prior versions had known issues with memory recall latency due to backend initialization. Ensure a fresh index rebuild after upgrading.