Heartbeat Death Loop: pendingFinalDelivery Stuck on Agent Main Session
When a heartbeat run returns any non-empty text to a session with origin.to set to the pseudo-target 'heartbeat', the main session enters a permanent pendingFinalDelivery:true state that blocks all future heartbeat executions indefinitely.
Symptoms
Primary Manifestation
The OpenClaw gateway exhibits complete heartbeat silence despite clean startup logs. The scheduler fires on schedule, but no actual heartbeat runs execute. This can persist for days.
Diagnostic Command Output
Inspect the main session state directly from the sessions registry:
bash
python3 -c "
import json, time
sessions_path = ‘/home/openclaw/.openclaw/agents/
Expected stuck-state output:
json
pendingFinalDelivery: true
pendingFinalDeliveryAttemptCount: 64
pendingFinalDeliveryLastError: null
updatedAt: 1746702780000
origin: {"label":"heartbeat","from":"heartbeat","to":"heartbeat"}
Note: pendingFinalDeliveryAttemptCount increments each heartbeat interval. A count > 1 with pendingFinalDeliveryLastError: null is the definitive signature.
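The signature in the note can be checked programmatically. The following is a minimal sketch, assuming each registry entry is a JSON object with the field names shown in the stuck-state output; `is_stuck_pending_delivery` is a hypothetical helper for illustration, not part of OpenClaw.

```python
import json

def is_stuck_pending_delivery(entry: dict) -> bool:
    """Return True when an entry matches the stuck signature:
    pending flag set, more than one attempt, and no error recorded."""
    return (
        entry.get("pendingFinalDelivery") is True
        and entry.get("pendingFinalDeliveryAttemptCount", 0) > 1
        and entry.get("pendingFinalDeliveryLastError") is None
    )

# Illustrative entry mirroring the stuck-state fields listed above.
sample = json.loads("""
{
  "pendingFinalDelivery": true,
  "pendingFinalDeliveryAttemptCount": 64,
  "pendingFinalDeliveryLastError": null,
  "updatedAt": 1746702780000,
  "origin": {"label": "heartbeat", "from": "heartbeat", "to": "heartbeat"}
}
""")

print(is_stuck_pending_delivery(sample))
```

A session that has recorded an error string, or that has only attempted delivery once, does not match and is left for normal retry handling.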
Gateway Log Behavior
# Boot-time log: appears clean
[gateway] heartbeat: started intervalMs: 3600000
# Subsequent ticks: no run executes, no errors logged
[gateway] <silence for 64+ hours>
CLI Diagnostic Commands Return No Alerts
bash
# These commands surface nothing about the stuck state
openclaw cron list
openclaw doctor
openclaw system heartbeat last
# All return normal/empty output despite heartbeat deadlock
Triggering Condition
The bug activates when:
- An agent's heartbeat fires with no configured delivery target (defaults to "none")
- The heartbeat run produces output; even the bare HEARTBEAT_OK token qualifies
- The session origin is set to origin.to = "heartbeat" (auto-reply pseudo-target)
After the first heartbeat run with output, the session accumulates the heartbeat origin, and subsequent heartbeats enter the retry loop.
Root Cause
Architectural Overview
The heartbeat system involves three distinct layers:
- Heartbeat Runner (heartbeat-runner-DpQCcYf2.js): Schedules and executes heartbeat ticks
- Agent Runner Runtime (agent-runner.runtime-DQsCsHUA.js): Produces heartbeat output and writes session state
- Dispatch System (dispatch-8E8vi2HV.js): Routes output to delivery channels
Bug A: Pending-Delivery Flag Set on Effective-No-Output
In agent-runner.runtime-DQsCsHUA.js (lines 4093-4095):
javascript
// Current implementation: non-empty pendingText always triggers
if (pendingText) {
  session.pendingFinalDelivery = true;
  session.pendingFinalDeliveryText = pendingText;
  session.pendingFinalDeliveryCreatedAt = now;
}
For heartbeat sessions, when the agent returns HEARTBEAT_OK, the pendingText is populated with this token. The runner has no special handling for heartbeat content; it treats HEARTBEAT_OK as a legitimate output requiring delivery confirmation.
The stripHeartbeatToken function in heartbeat-Dynyl6hI.js (lines 52-87) runs after the pending-delivery state is written to the session, not before. Therefore, even stripped-to-empty output still triggers the pending queue.
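The ordering problem can be reproduced in a small model. This is a sketch only: `strip_heartbeat_token` is a simplified stand-in for stripHeartbeatToken (the real function's behavior is assumed, not copied), and `run_heartbeat_current` is a hypothetical name for the write-then-strip sequence described above.

```python
HEARTBEAT_TOKEN = "HEARTBEAT_OK"

def strip_heartbeat_token(text: str) -> str:
    """Simplified stand-in for stripHeartbeatToken: drop the bare token."""
    return text.replace(HEARTBEAT_TOKEN, "").strip()

def run_heartbeat_current(session: dict, agent_output: str) -> str:
    """Write the pending flag from the raw output, then strip afterwards."""
    pending_text = agent_output
    if pending_text:  # raw text is non-empty, so the flag is written
        session["pendingFinalDelivery"] = True
        session["pendingFinalDeliveryText"] = pending_text
    # Stripping happens too late to influence the flag written above.
    return strip_heartbeat_token(pending_text)

session = {}
visible = run_heartbeat_current(session, "HEARTBEAT_OK")
print(repr(visible), session["pendingFinalDelivery"])
```

The visible output is empty, yet the session is left waiting for a delivery that has nothing to deliver.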
Bug B: Silent Retry Against Pseudo-Target
In dispatch-8E8vi2HV.js (lines 227-246), the success handler clearPendingFinalDeliveryAfterSuccess only clears the flag on success. There is no corresponding failure handler that captures the error into pendingFinalDeliveryLastError.
When delivery.to === "heartbeat":
javascript
// dispatch attempts delivery to "heartbeat" pseudo-target
// No channel adapter resolves this target
// dispatch returns silently without error capture
// pendingFinalDelivery stays true, updatedAt gets bumped to now
The retry path:
- Heartbeat fires → pendingFinalDelivery is true
- Dispatch attempts delivery → silent failure
- updatedAt = now on every attempt
- 30-second skip window check matches (updatedAt is now, not old), so the run is skipped
- Next heartbeat interval fires → same sequence repeats
The Compounding Effect
The skip window in runHeartbeatOnce (lines 866-870):
javascript
if (recentSessionEntry?.pendingFinalDelivery === true
    && recentSessionEntry?.updatedAt
    && startedAt - recentSessionEntry.updatedAt < 3e4) {
  return SKIP_REQUESTS_IN_FLIGHT;
}
This logic is correct in isolation: it prevents overlapping heartbeat runs. However, combined with dispatch failures that bump updatedAt = now on each silent failure, the condition evaluates as now - now < 30000 (always true), causing perpetual skips.
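The interaction can be illustrated with a small simulation. It is a sketch under one stated assumption: the failed dispatch attempt runs before the skip check on every tick, which is how the paragraph above describes the sequence. The names `tick` and `simulate` are hypothetical.

```python
SKIP_WINDOW_MS = 30_000    # the 3e4 threshold in the skip check
INTERVAL_MS = 3_600_000    # intervalMs from the boot-time log

def tick(session: dict, now: int, bump_on_failure: bool) -> str:
    """One scheduler tick: optional silent dispatch failure, then skip check."""
    if session.get("pendingFinalDelivery") and bump_on_failure:
        # Silent dispatch failure: nothing delivered, updatedAt moves to now.
        session["updatedAt"] = now
    started_at = now
    updated_at = session.get("updatedAt")
    if (session.get("pendingFinalDelivery") is True
            and updated_at
            and started_at - updated_at < SKIP_WINDOW_MS):
        return "skipped"
    return "ran"

def simulate(bump_on_failure: bool, ticks: int = 5) -> list:
    """Run several hourly ticks against a session that starts out pending."""
    session = {"pendingFinalDelivery": True, "updatedAt": 1}
    return [tick(session, (i + 1) * INTERVAL_MS, bump_on_failure)
            for i in range(ticks)]

print(simulate(bump_on_failure=True))
print(simulate(bump_on_failure=False))
```

With the bump in place every tick is skipped; without it, the stale updatedAt falls outside the 30-second window and the run proceeds.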
Why Fresh Sessions Don’t Deadlock
When a new session is created (no persisted state), origin is null and lastTo is null. The dispatch path has no delivery.to to route against, so it clears pending as “nothing to deliver.” The cosmetic pendingFinalDelivery: true remains but updatedAt is not bumped, breaking the retry loop.
Step-by-Step Fix
Option 1: Workaround (Immediate, No Code Change)
Use when: You cannot deploy the code fix immediately. The workaround still requires stopping and restarting the gateway.
Steps:
- Stop the gateway gracefully:
bash
sudo systemctl stop openclaw-gateway
# or
openclaw gateway stop
- Locate the agent's main session entry and associated files:
bash
AGENT_ID="
# List related files
ls -la "${SESSION_DIR}/" | grep "${AGENT_ID}:main"
# Expected: sessions.json, .jsonl, .trajectory.jsonl
- Remove the main session entry and files:
bash
# Backup before modification
cp "${SESSION_DIR}/sessions.json" "${SESSION_DIR}/sessions.json.bak.$(date +%s)"
# Use python to remove the main session entry
python3 -c "
import json
agent_id = ‘
with open(session_path, 'r') as f:
    sessions = json.load(f)
main_key = f'agent:{agent_id}:main'
if main_key in sessions:
    print(f'Removing session: {main_key}')
    del sessions[main_key]
    with open(session_path, 'w') as f:
        json.dump(sessions, f, indent=2)
    print('Session removed successfully')
else:
    print(f'Session {main_key} not found')
"
- Remove associated session files:
bash
# Identify and remove .jsonl and .trajectory.jsonl files for the main session
cd “/home/openclaw/.openclaw/agents/
- Restart the gateway:
bash
sudo systemctl start openclaw-gateway
# Verify startup
sudo journalctl -u openclaw-gateway -f --lines=50
After workaround: The gateway creates a fresh main session on the next heartbeat tick. The new session has origin: null, breaking the dispatch retry loop.
Option 2: Permanent Fix (Code Changes)
Fix A: Gate Pending-Delivery Write on Effectively Empty Heartbeat Content
File: agent-runner.runtime-DQsCsHUA.js
Location: Around line 4093-4095
Before:
javascript
if (pendingText) {
  session.pendingFinalDelivery = true;
  session.pendingFinalDeliveryText = pendingText;
  session.pendingFinalDeliveryCreatedAt = now;
}
After:
javascript
// For heartbeat sessions, check if stripped output is effectively empty
const isHeartbeat = session?.origin?.to === 'heartbeat';
const strippedContent = isHeartbeat ? stripHeartbeatToken(pendingText).text : pendingText;
const isEffectivelyEmpty = !strippedContent || strippedContent.trim() === '';

if (pendingText && !isEffectivelyEmpty) {
  session.pendingFinalDelivery = true;
  session.pendingFinalDeliveryText = pendingText;
  session.pendingFinalDeliveryCreatedAt = now;
}
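The gating logic of Fix A can be exercised outside the gateway with a small Python model. This is a sketch, not the gateway code: `strip_heartbeat_token` is a simplified stand-in that assumes the token is the literal string HEARTBEAT_OK, and `write_pending_gated` is a hypothetical name.

```python
HEARTBEAT_TOKEN = "HEARTBEAT_OK"

def strip_heartbeat_token(text: str) -> str:
    """Simplified stand-in for stripHeartbeatToken(...).text."""
    return text.replace(HEARTBEAT_TOKEN, "").strip()

def write_pending_gated(session: dict, pending_text: str, now: int) -> None:
    """Set the pending flag only when something deliverable remains."""
    pending_text = pending_text or ""
    is_heartbeat = (session.get("origin") or {}).get("to") == "heartbeat"
    stripped = strip_heartbeat_token(pending_text) if is_heartbeat else pending_text
    if pending_text and stripped.strip():
        session["pendingFinalDelivery"] = True
        session["pendingFinalDeliveryText"] = pending_text
        session["pendingFinalDeliveryCreatedAt"] = now

bare = {"origin": {"to": "heartbeat"}}
write_pending_gated(bare, "HEARTBEAT_OK", now=1000)
print(bare.get("pendingFinalDelivery"))

with_text = {"origin": {"to": "heartbeat"}}
write_pending_gated(with_text, "HEARTBEAT_OK disk at 95%", now=1000)
print(with_text.get("pendingFinalDelivery"))
```

A bare token leaves the session untouched, while heartbeat output that carries real text still queues a delivery, so genuine alerts are not suppressed.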
Fix B-1: Capture Dispatch Failures into pendingFinalDeliveryLastError
File: dispatch-8E8vi2HV.js
Location: After line 246 (after clearPendingFinalDeliveryAfterSuccess)
Add new function:
javascript
function recordPendingFinalDeliveryFailure(session, errorMessage) {
  session.pendingFinalDeliveryLastError = errorMessage || 'Unknown delivery failure';
  session.pendingFinalDeliveryLastErrorAt = Date.now();
  saveSession(session);
}
Call in dispatch failure path:
javascript
// In the delivery failure handler
if (session.pendingFinalDelivery) {
  recordPendingFinalDeliveryFailure(
    session,
    `Delivery failed: ${error?.message || 'No adapter resolved for target: ' + delivery?.to}`
  );
}
Fix B-2: Treat Pseudo-Target Heartbeat as Immediate Success
File: dispatch-8E8vi2HV.js
Location: Before attempting delivery (around line 227)
Add check:
javascript
// If delivery target is the heartbeat pseudo-channel and no adapter resolves,
// treat as immediate success: the heartbeat acknowledges by reaching the target
if (delivery.to === 'heartbeat') {
  clearPendingFinalDeliveryAfterSuccess(session);
  log.debug('Heartbeat delivery acknowledged (pseudo-target)');
  return { deliverySucceeded: true };
}
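How Fixes B-1 and B-2 combine is easier to see in one place. The Python model below is a sketch under assumptions: adapters are represented as a plain dict of callables, `dispatch_final_delivery` and `PSEUDO_TARGETS` are hypothetical names, and session persistence is omitted.

```python
import time

PSEUDO_TARGETS = {"heartbeat"}  # targets that no channel adapter resolves

def dispatch_final_delivery(session: dict, delivery: dict, adapters: dict) -> dict:
    """Model of a dispatch step that never fails silently."""
    target = delivery.get("to")
    if target in PSEUDO_TARGETS:
        # Fix B-2: nothing to route, so treat as delivered and clear the flag.
        session["pendingFinalDelivery"] = False
        return {"deliverySucceeded": True}
    adapter = adapters.get(target)
    if adapter is None:
        # Fix B-1: record why delivery failed instead of returning silently.
        session["pendingFinalDeliveryLastError"] = (
            f"No adapter resolved for target: {target}"
        )
        session["pendingFinalDeliveryLastErrorAt"] = int(time.time() * 1000)
        return {"deliverySucceeded": False}
    adapter(delivery.get("text", ""))
    session["pendingFinalDelivery"] = False
    return {"deliverySucceeded": True}

stuck = {"pendingFinalDelivery": True, "pendingFinalDeliveryLastError": None}
print(dispatch_final_delivery(stuck, {"to": "heartbeat"}, adapters={}))
print(stuck["pendingFinalDelivery"])
```

Either branch leaves an observable trace: the flag is cleared, or an error string is recorded that a diagnostic check can surface.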
Fix C: Harden openclaw doctor
File: doctor.js or diagnostics module
Add check for:
javascript
// Warn when pendingFinalDelivery is stuck with no error captured
const ONE_HOUR_MS = 60 * 60 * 1000;
if (session.pendingFinalDelivery === true
    && session.pendingFinalDeliveryLastError === null
    && session.pendingFinalDeliveryCreatedAt
    && (Date.now() - session.pendingFinalDeliveryCreatedAt) > ONE_HOUR_MS) {
  warnings.push({
    severity: 'HIGH',
    code: 'STUCK_PENDING_DELIVERY',
    message: `Session ${sessionKey} has pendingFinalDelivery stuck for >1h with no error captured`,
    sessionKey
  });
}
Verification
Immediate Verification (After Workaround)
bash
# 1. Confirm gateway is running
openclaw gateway status
# Expected: "Gateway running" or similar
# 2. Check for new main session creation
sleep 5
python3 -c "
import json, time
with open(’/home/openclaw/.openclaw/agents/
# Expected: origin is null, pendingFinalDelivery is false or null
# 3. Verify heartbeat fires within 1 minute
openclaw system heartbeat last
# Expected: Recent heartbeat with HEARTBEAT_OK, no pending flags
Post-Fix Verification (After Code Changes)
bash
# 1. Build and restart with fixes
npm run build
sudo systemctl restart openclaw-gateway
# 2. Monitor for 2+ heartbeat intervals (test with short interval first)
# Set heartbeat to 2 minutes for testing:
openclaw config set heartbeat.every "2m"
# 3. Check session state after multiple heartbeat cycles
python3 -c "
import json, time
with open(’/home/openclaw/.openclaw/agents/
main = sessions.get(‘agent:
# Expected after fix: pendingFinalDelivery is false OR
# (if true) pendingFinalDeliveryLastError contains error string
Stress Test: Force Heartbeat Output
bash
# Create a temporary agent with heartbeat that outputs text
# Then trigger heartbeat manually
# Option A: Via CLI
openclaw system event --mode now --text "force heartbeat" \
  --url ws://127.0.0.1:18789 \
  --token "$OPENCLAW_GATEWAY_TOKEN"
# Option B: Wait for scheduled heartbeat
# Check state immediately after
sleep 2
python3 -c "
import json
with open(’/home/openclaw/.openclaw/agents/
# Expected: If fix B-2 applied, pendingFinalDelivery should clear immediately
# If only fix A applied, pendingFinalDelivery should not be set on heartbeat output
Doctor Command Verification
bash
# After applying Fix C
openclaw doctor
# Expected output should include warning if any session has:
# pendingFinalDelivery: true AND now - pendingFinalDeliveryCreatedAt > 1h
# AND pendingFinalDeliveryLastError === null
Common Pitfalls
Pitfall 1: Partial Session Cleanup
Problem: Removing only pendingFinalDelivery* fields without removing the session entry.
Why it fails:
bash
# This does NOT fix the issue
python3 -c "
import json
with open('sessions.json') as f:
    sessions = json.load(f)
main = sessions[‘agent:
# Only clearing flags: origin.to is still 'heartbeat'
main.pop('pendingFinalDelivery', None)
main.pop('pendingFinalDeliveryText', None)
main.pop('pendingFinalDeliveryAttemptCount', None)
main.pop('pendingFinalDeliveryCreatedAt', None)
# origin.to is still 'heartbeat', so dispatch will re-trigger immediately
"
Correct approach: Delete the entire session entry, not just the pending flags.
Pitfall 2: Not Stopping Gateway Before Session Cleanup
Problem: Modifying sessions.json while the gateway is running.
Why it fails: The gateway maintains an in-memory copy of sessions. File-system changes are overwritten on next session save.
Correct approach:
bash
# Always stop gateway first
sudo systemctl stop openclaw-gateway
# Then modify sessions.json
# Then restart
sudo systemctl start openclaw-gateway
Pitfall 3: Misidentifying the Affected Session
Problem: Looking for the wrong session key.
Details: The main session key format is agent:<agent-id>:main. If you have multiple agents or a non-standard installation, the path may differ.
Verification:
bash
# List all session keys
python3 -c "
import json
with open(’/home/openclaw/.openclaw/agents/
# Look for patterns like:
# agent:orchestrator:main
# agent:reasoner:main
# agent:your-agent:main
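Once the registry is loaded, the main-session keys can be filtered by pattern rather than by eye. This is a sketch that assumes the key format agent:<agent-id>:main given above and that agent ids contain no colons; `main_session_keys` is a hypothetical helper.

```python
import re

# Assumed key format: agent:<agent-id>:main, with no colon inside the id.
MAIN_KEY = re.compile(r"^agent:(?P<agent_id>[^:]+):main$")

def main_session_keys(sessions: dict) -> dict:
    """Map agent id -> session key for every agent:<agent-id>:main entry."""
    found = {}
    for key in sessions:
        match = MAIN_KEY.match(key)
        if match:
            found[match.group("agent_id")] = key
    return found

sample = {
    "agent:orchestrator:main": {},
    "agent:reasoner:main": {},
    "agent:reasoner:task-1": {},  # hypothetical non-main key, ignored
}
print(main_session_keys(sample))
```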
Pitfall 4: Test Environment vs Production State
Problem: Testing fix in a fresh session (which doesn’t deadlock) and concluding the fix works for existing stuck sessions.
Details: New sessions have origin: null, so they don’t trigger the dispatch retry loop regardless of fix status. The fix validation must occur on existing sessions that have accumulated heartbeat origin.
Correct validation approach:
- Apply fixes to gateway
- Stop gateway
- Manually inject the stuck state into a fresh session: set `origin.to = "heartbeat"` and `pendingFinalDelivery = true`
- Restart gateway and observe behavior
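The injection step can be scripted so the test state is reproducible. This is a minimal sketch that works on an in-memory copy of the registry; `inject_stuck_state` is a hypothetical helper, and refreshing updatedAt is an added assumption based on the skip check reading that field.

```python
import copy
import time

def inject_stuck_state(sessions: dict, session_key: str) -> dict:
    """Return a copy of the registry with the stuck state applied to one entry."""
    patched = copy.deepcopy(sessions)
    entry = patched.setdefault(session_key, {})
    origin = entry.get("origin") or {}
    origin["to"] = "heartbeat"
    entry["origin"] = origin
    entry["pendingFinalDelivery"] = True
    # Assumption: the skip check also reads updatedAt, so refresh it here.
    entry["updatedAt"] = int(time.time() * 1000)
    return patched

fresh = {"agent:your-agent:main": {"origin": None}}
patched = inject_stuck_state(fresh, "agent:your-agent:main")
print(patched["agent:your-agent:main"]["origin"])
```

Write the patched registry back only while the gateway is stopped, and keep a backup of the original file so the test state can be reverted.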
Pitfall 5: Docker/Container Environment Path Differences
Problem: Assuming paths based on non-container documentation.
Details:
bash
# In Docker container, paths may be:
# - Environment variable based: $OPENCLAW_HOME
# - Default: /app/.openclaw
# Find the correct path
docker exec
# Or inspect environment
docker exec
Pitfall 6: Heartbeat Interval Too Long for Testing
Problem: Setting 60-minute heartbeat interval and not waiting to verify fix.
Details: After applying fixes, wait at least 2 full heartbeat intervals to confirm no new pending state accumulation.
Test interval configuration:
bash
# Use short interval for testing
openclaw config set heartbeat.every "2m"
# After verification, restore production interval
openclaw config set heartbeat.every "60m"
Related Errors
- #59710: Heartbeat silently stops after ~20h. Same underlying cause: session state corruption preventing heartbeat execution. This issue's diagnostic mechanism identifies the root cause that #59710 only observed symptomatically.
- #78187: Heartbeat polling silently stops after SIGUSR1 gateway restart. Same symptom family: heartbeat scheduler running but no actual runs executing. Likely shares session state persistence issues.
- #74257: HEARTBEAT_OK/internal text leak. Inverse symptom of the same path: heartbeat output leaking to delivery channels when it should be suppressed. Confirms heartbeat token handling is inconsistent.
- #78532 (CLOSED 2026-05-07): deliverySucceeded=true when no adapter invoked. Sibling issue: same telemetry-vs-state mismatch family. Addressed success-side of dispatch; this issue addresses the failure-side.
- #55882: Agent can drop promised outputs after task switching. Broader pending-deliverables queue durability issue. The heartbeat deadlock is a specific case of the general pending-delivery state machine bug.
- #65498 (CLOSED): Main-session user task can lose final reply after heartbeat or exec-completion interrupt. Related fix area: session lifecycle management during concurrent heartbeat and user task execution.
Related Configuration Patterns
- `heartbeat: { every: "60m" }` with no `target`: The default `target: "none"` configuration combined with heartbeat creates the deadlock condition when heartbeat output is non-empty.
- `pendingFinalDelivery: true` + `origin.to: "heartbeat"`: The dangerous combination: pending flag set against a pseudo-target that no adapter resolves.
- `pendingFinalDeliveryAttemptCount` climbing without `pendingFinalDeliveryLastError`: Diagnostic signature for silent dispatch failures across the codebase.
Error Code Reference
| Code | Description | Connection |
|---|---|---|
| STUCK_PENDING_DELIVERY | pendingFinalDelivery stuck >1h with null error | This issue's proposed doctor check |
| HEARTBEAT_DEATH_LOOP | Session blocks heartbeat indefinitely | Primary symptom |
| DISPATCH_NO_ADAPTER | No channel adapter resolves target | Root cause (Bug B) |
| DELIVERY_SILENT_FAILURE | Delivery fails without error capture | Root cause (Bug B-1) |