April 18, 2026 • Version: 26.4.12 – 26.4.14

Active Memory Timeout: Blocking Recall Pass Exceeds Configured Timeout

Active Memory consistently times out before returning recalled summaries across multiple models and providers, with the timeout firing at ~8084ms despite a configured limit of 8000ms.

🔍 Symptoms

Active Memory fails to complete its blocking recall pass within the configured timeout window. The following diagnostic output appears consistently:


🧩 Active Memory: timeout 8084ms message 0 chars
🔎 Active Memory Debug: none retrieved

Gateway logs confirm the timeout behavior:


active-memory ... done status=timeout elapsedMs=8092 summaryChars=0
embedded_run_failover_decision ... failoverReason=timeout profileFailureReason=timeout

Diagnostic Manifestations

  • Timeout threshold: Timeout triggers at ~8084-8092ms (vs. the configured 8000ms), indicating the timeout is only checked after the blocking call returns
  • Zero-byte response: summaryChars=0 confirms no summary was generated before timeout
  • Model agnostic: The issue reproduces across all tested models:
    • openai-codex/gpt-5.4
    • ollama/glm-5.1:cloud
    • ollama/llama3.1:8b
    • ollama/gemma4:e4b
  • Backend agnostic: Both idle and queued/busy run environments exhibit identical behavior
  • Provider chain: Affects openclaw → ollama routing configuration

Environment Context

  • OpenClaw Version: 2026.4.12 / 26.4.14
  • Operating System: macOS
  • Install Method: npm global
  • Memory Backend: builtin/default (memory-core)
  • Embedding Model: qwen3-embedding (Ollama)
  • Chat Context: Discord DM (direct)

🧠 Root Cause

Primary Root Cause: Synchronous Embedding Retrieval Blocking

The timeout occurs because Active Memory’s blocking recall pass performs synchronous embedding lookups against the memory backend before invoking the LLM for summarization. When the embedding retrieval phase exceeds the configured timeout, the LLM call never executes.

Failure Sequence Analysis


┌──────────────────────────────────────────────────┐
│ Active Memory Blocking Recall Pass               │
├──────────────────────────────────────────────────┤
│ 1. Memory Query (synchronous)                    │
│     └─→ Embedding retrieval via qwen3-embedding  │
│         └─→ Memory backend search                │
│             └─→ BLOCKS HERE (timeout: 8084ms)    │
│                                                  │
│ 2. LLM Summarization (never reached)             │
│     └─→ Summary generation                       │
└──────────────────────────────────────────────────┘
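
The failure sequence above can be reproduced with a scaled-down sketch (hypothetical function names, not the actual OpenClaw implementation). Because the deadline is only inspected after the synchronous retrieval returns, the result is exactly what the logs show: status=timeout, summaryChars=0, and an elapsed time slightly past the budget.

```python
import time

def blocking_recall(retrieve, summarize, timeout_ms):
    # Phase 1: synchronous retrieval -- blocks and cannot be interrupted
    start = time.monotonic()
    hits = retrieve()
    elapsed_ms = (time.monotonic() - start) * 1000
    # The deadline is only inspected here, after the blocking call returns,
    # which is why the logged elapsedMs always overshoots timeoutMs slightly
    if elapsed_ms > timeout_ms:
        return {"status": "timeout", "elapsedMs": round(elapsed_ms), "summaryChars": 0}
    # Phase 2: LLM summarization -- never reached when retrieval is slow
    summary = summarize(hits)
    return {"status": "success", "elapsedMs": round(elapsed_ms), "summaryChars": len(summary)}

# Scaled-down repro: a 50ms budget against an 80ms retrieval
# (stands in for the real 8000ms budget vs. an ~8s embedding search)
result = blocking_recall(lambda: time.sleep(0.08) or [],
                         lambda hits: "summary", timeout_ms=50)
# result["status"] == "timeout", result["summaryChars"] == 0
```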

Contributing Factors

  1. Embedded Run Timeout Propagation: The embedded_run_failover_decision log entry indicates that the Active Memory timeout propagates to the embedded run context, causing failover decision logging. This suggests tight coupling between the memory recall and response generation phases.

  2. Synchronous Memory Backend Query: The memory search uses synchronous embedding retrieval. When qwen3-embedding experiences latency (network round-trip to Ollama, model loading, or vector search overhead), the cumulative delay exceeds the 8000ms timeout before the first token is generated.

  3. Timeout Check Timing: The 84-92ms overshoot (8084ms actual vs. 8000ms configured) indicates that:

    • The timeout is only evaluated after each blocking operation returns
    • The embedding retrieval itself runs for the full ~8084-8092ms before the check fires
    • No mechanism cancels the in-flight retrieval once the deadline passes
  4. Query Mode Configuration: The queryMode: "message" setting may trigger full message embedding rather than lightweight semantic search, increasing retrieval latency.

Architectural Inconsistency

The memory-core backend’s embedded memory retrieval does not implement:

  • Async/non-blocking embedding retrieval
  • Per-phase timeout budgets (separate budgets for retrieval vs. generation)
  • Early termination on first substantial result
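
A sketch of what the first two missing behaviors could look like (Python asyncio used as a neutral illustration; all names are hypothetical and OpenClaw's internals may differ): each phase gets its own budget, and the in-flight operation is cancelled the moment its budget expires rather than checked afterwards.

```python
import asyncio

async def recall_with_budgets(retrieve, summarize,
                              retrieval_budget_s, generation_budget_s):
    # Per-phase budgets: wait_for cancels the in-flight coroutine
    # as soon as its budget expires, instead of checking the clock later
    try:
        hits = await asyncio.wait_for(retrieve(), retrieval_budget_s)
    except asyncio.TimeoutError:
        return {"status": "timeout", "phase": "retrieval", "summaryChars": 0}
    try:
        summary = await asyncio.wait_for(summarize(hits), generation_budget_s)
    except asyncio.TimeoutError:
        return {"status": "timeout", "phase": "generation", "summaryChars": 0}
    return {"status": "success", "summaryChars": len(summary)}

async def slow_retrieve():        # stands in for an ~8s embedding search
    await asyncio.sleep(0.08)
    return ["hit"]

async def summarize(hits):
    return "User asked about..."

result = asyncio.run(recall_with_budgets(slow_retrieve, summarize, 0.05, 0.20))
# times out in the retrieval phase; the generation budget is never consumed
```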

🛠️ Step-by-Step Fix

Fix 1: Increase Active Memory Timeout (Immediate Mitigation)

Increase the timeout to accommodate embedding retrieval latency:

Before:

"active-memory": {
  "enabled": true,
  "config": {
    "timeoutMs": 8000,
    ...
  }
}

After:

"active-memory": {
  "enabled": true,
  "config": {
    "timeoutMs": 30000,
    ...
  }
}

Fix 2: Separate Retrieval and Generation Timeouts

Configure separate timeout budgets for embedding retrieval vs. LLM generation:

"active-memory": {
  "enabled": true,
  "config": {
    "timeoutMs": 30000,
    "retrievalTimeoutMs": 10000,
    "generationTimeoutMs": 20000,
    ...
  }
}
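
One hedged way to size these budgets is from observed behavior. The retrieval figure below comes from the elapsedMs values in the gateway logs above; the generation figure is an assumption and should be replaced with your own measured latency.

```python
observed_retrieval_ms = 8100   # elapsedMs seen at timeout in the gateway logs
assumed_generation_ms = 9000   # assumption: typical local-LLM summary latency
headroom = 1.2                 # 20% slack per phase

retrieval_timeout_ms = int(observed_retrieval_ms * headroom)    # 9720
generation_timeout_ms = int(assumed_generation_ms * headroom)   # 10800
timeout_ms = retrieval_timeout_ms + generation_timeout_ms       # 20520
```

With these numbers, a total timeoutMs of roughly 21000 already covers both phases; the 30000 / 10000 / 20000 split shown above is simply a rounder, more conservative choice.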

Fix 3: Switch to Async/Lightweight Query Mode

Change queryMode from "message" to "semantic" or "hybrid":

Before:

"queryMode": "message"

After:

"queryMode": "semantic"

Fix 4: Use Faster Embedding Model

Replace qwen3-embedding with a faster local embedding model:

"active-memory": {
  "config": {
    "embeddingModel": "ollama/nomic-embed-text"
  }
}

Then pull the model so it is cached, and restart Ollama. Note that the environment above is macOS, where systemctl does not exist; use the Ollama menu-bar app or Homebrew services instead:

ollama pull nomic-embed-text
brew services restart ollama    # macOS (Homebrew); on Linux: systemctl restart ollama

Fix 5: Disable Active Memory Temporarily (For Production)

If the timeout is blocking production, disable the blocking recall:

"active-memory": {
  "enabled": false
}

Or restrict to non-blocking mode:

"active-memory": {
  "config": {
    "blockingMode": false,
    "asyncRecall": true
  }
}
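
A minimal sketch of what non-blocking recall means (hypothetical semantics; the real blockingMode/asyncRecall behavior may differ): the reply is generated immediately, and the recall result is only used if it happens to be ready in time.

```python
import asyncio

async def recall(message):
    await asyncio.sleep(0.08)          # slow memory recall (~80ms stand-in)
    return "summary of earlier conversation"

async def generate_reply(message):
    await asyncio.sleep(0.01)          # fast main LLM response
    return f"reply to: {message}"

async def respond(message):
    # Start recall in the background instead of blocking on it
    recall_task = asyncio.create_task(recall(message))
    reply = await generate_reply(message)      # not gated on recall
    # Use the summary only if recall finished in time; otherwise drop it
    summary = recall_task.result() if recall_task.done() else None
    recall_task.cancel()
    return reply, summary

reply, summary = asyncio.run(respond("hello"))
# the reply is produced in ~10ms; the slow recall never blocked it
```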

Fix 6: Memory Backend Health Check

Verify memory backend connectivity and indexing status:

# Check memory backend status
openclaw memory status

# Verify embedding model availability
ollama list | grep qwen3-embedding

# Test memory search latency manually
openclaw memory search "test query" --verbose

🧪 Verification

Verification Steps After Applying Fixes

Step 1: Enable Verbose and Trace Logging

/verbose on
/trace on

Step 2: Trigger Active Memory Recall

Send a message that should trigger memory recall:

what was my last message to you

Step 3: Verify Non-Timeout Behavior

Expected output (success):


🧩 Active Memory: completed 2341ms message 87 chars
🔎 Active Memory Debug: retrieved summary "User asked about..."

Expected gateway log:


active-memory ... done status=success elapsedMs=2345 summaryChars=87

Step 4: Confirm No Failover Triggered

Verify absence of:


embedded_run_failover_decision ... failoverReason=timeout

Step 5: Run Diagnostic Command

openclaw memory test --mode recall --verbose

Expected output:


Memory recall test: PASS
Retrieval latency: 1243ms
Generation latency: 892ms
Total: 2135ms (within timeout budget)

Step 6: Verify Memory Backend Health

openclaw memory status

Expected output should show:

  • backend: memory-core
  • indexed: true
  • embedding: qwen3-embedding (or configured model)
  • status: healthy

⚠️ Common Pitfalls

  • Timeout Too Close to Embedding Latency: Setting timeoutMs only slightly higher than typical embedding retrieval time leaves no budget for LLM generation. Ensure total timeout accommodates both phases plus 20% headroom.

  • Ollama Model Not Preloaded: On first request, Ollama downloads and loads the embedding model, causing significant latency. Always pre-load models:

    ollama pull gemma4:e4b
    ollama pull qwen3-embedding
  • Network Latency with Remote Ollama: If Ollama runs on a remote host or Docker container, network latency compounds embedding retrieval time. Use local Ollama or increase timeout by 2-3x.

  • Memory Index Not Built: If the memory backend has no indexed content, the recall pass may still execute embedding queries against empty results, causing unexpected latency. Verify index status before troubleshooting timeout.

  • Blocking Mode Override: Some configurations or plugins may force blockingMode: true regardless of user settings. Check for:

    openclaw config get active-memory.blockingMode
  • Discord DM Specifics: The issue was observed in Discord DM context. Other platforms (Slack, Teams, terminal) may have different timeout behaviors due to gateway implementation differences.

  • Version Mismatch: The issue was reported on 2026.4.12 but mentions 26.4.14. Ensure both client and gateway are on the same version, as timeout handling may have changed between releases.

  • Concurrent Memory Operations: If multiple memory operations run concurrently (indexing, search, recall), they may contend for embedding model resources. Queue memory operations or use dedicated embedding instances.

Related Error Signatures

  • embedded_run_failover_decision: Failover decision triggered by timeout propagation from Active Memory blocking recall to the embedded run context.

  • memory.search.timeout: Memory backend search operation exceeded its timeout threshold (may occur independently or as part of Active Memory failure).

  • embedding.model.not_found: The configured embedding model (qwen3-embedding) is not available in Ollama, causing synchronous fallback delays before timeout.

  • active-memory.summary_chars=0: Active Memory completed but generated zero characters, indicating the summarization phase was either skipped or failed silently.

  • gateway.active_memory.done.status=timeout: Gateway-level status indicating Active Memory operation completed with timeout status, distinct from status=success or status=error.

  • llm.context.window.exceeded: May occur if the memory recall summary is included in subsequent requests and exceeds context limits, related to maxSummaryChars misconfiguration.

  • Historical: v2026.3.x Memory Backend Regressions: Prior versions had known issues with memory recall latency due to backend initialization. Ensure fresh index rebuild after upgrading.

Evidence & Sources

This troubleshooting guide was automatically synthesized by the FixClaw Intelligence Pipeline from community discussions.