April 29, 2026 • Version: 2026.4.24

Gateway Boot Hangs on Telegram deleteWebhook Infinite Retry Loop

When the OpenClaw gateway restarts or crashes, it enters an infinite retry loop during Telegram webhook cleanup, blocking the boot sequence for 30+ minutes and leaving the gateway unreachable.

🔍 Symptoms

Primary Manifestation

The gateway enters a perpetual retry loop during the boot sequence, specifically blocking on deleteWebhook operations for the Telegram integration. The boot sequence never completes, and the gateway remains unreachable.

CLI Execution Examples

Log output pattern (repeating indefinitely):

[telegram] deleteWebhook failed: Network request for 'deleteWebhook' failed!
[telegram] Telegram webhook cleanup failed: Network request for 'deleteWebhook' failed!; retrying in 2.04s.
[boot] agent run failed: session file locked (timeout 10000ms): sessions.json.lock

Boot sequence stalls at:

[gateway] Initializing Telegram integration...
[telegram] Attempting webhook cleanup via deleteWebhook...
[telegram] deleteWebhook failed: Network request for 'deleteWebhook' failed!
[telegram] Telegram webhook cleanup failed: Network request for 'deleteWebhook' failed!; retrying in 2.04s.
[telegram] Retry attempt 1/∞ ...
[telegram] Retry attempt 2/∞ ...
[gateway] (blocked - no further logs until webhook cleanup succeeds or times out)

Session file lock timeout (secondary symptom):

[boot] agent run failed: session file locked (timeout 10000ms): sessions.json.lock
[boot] Failed to acquire lock on sessions.json within 10000ms

Observable Behavior

| Symptom | Description |
| --- | --- |
| Boot never completes | Gateway remains in Initializing state indefinitely |
| Gateway unreachable | HTTP/WebSocket endpoints unavailable during retry loop |
| CPU spin | Process consumes resources while retrying |
| Log saturation | Rapid accumulation of retry log entries |
| External API calls | Repeated deleteWebhook requests to Telegram Bot API |

Affected Environments

  • OS: macOS (Apple Silicon confirmed, likely Intel as well)
  • Scenario: Gateway crash restart, manual restart, power interruption
  • Frequency: Multiple times per day (per user report)

🧠 Root Cause

Architectural Analysis

The root cause stems from two interconnected design flaws in the OpenClaw gateway boot sequence:

1. Blocking Retry Loop Without Circuit Breaker

The Telegram integration's deleteWebhook operation is executed during the boot sequence with an unbounded retry mechanism. The failure chain follows this path:

Gateway Boot → Telegram Init → deleteWebhook → Network Failure → Retry (no limit) → Blocking Continue

The deleteWebhook call is treated as a critical path operation rather than a graceful degradation operation. This means the entire boot sequence stalls until webhook cleanup succeeds.

2. Session Lock Contention During Retry Storm

The secondary error [boot] agent run failed: session file locked (timeout 10000ms) occurs because:

  1. The retry loop spawns concurrent operations attempting to access sessions.json.lock
  2. Multiple boot attempts or stale lock files from previous crash compound the issue
  3. The lock acquisition timeout (10 seconds) expires before the retry loop terminates
  4. This creates a deadlock: cannot boot due to webhook retry, cannot acquire session due to lock
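Whether a leftover sessions.json.lock is stale or still held can be decided mechanically. The sketch below assumes the lock file contains the owner's PID as decimal text, which this guide does not confirm; `parseLockPid` and `isLockStale` are hypothetical names, and the liveness probe is injected so the logic can be exercised without real processes.

```typescript
// Hypothetical helpers: decide whether a lock file left behind by a crash is stale.
// Assumption: the lock file body is the owning process ID as decimal text.

/** Parse a PID from lock-file contents; returns null when it is not a positive integer. */
function parseLockPid(contents: string): number | null {
  const trimmed = contents.trim();
  if (!/^\d+$/.test(trimmed)) return null;
  const pid = Number(trimmed);
  return pid > 0 ? pid : null;
}

/**
 * A lock is treated as stale when its owner cannot be identified
 * or the owner is no longer running.
 */
function isLockStale(contents: string, isAlive: (pid: number) => boolean): boolean {
  const pid = parseLockPid(contents);
  return pid === null || !isAlive(pid);
}

// Example with stubbed probes: the same lock is live or stale depending on its owner.
console.log(isLockStale("4242\n", () => true));  // false
console.log(isLockStale("4242\n", () => false)); // true
```

In Node.js, a probe built on `process.kill(pid, 0)` could be passed as `isAlive`, since signal 0 only tests whether the process exists.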

Technical Failure Sequence

1. Gateway receives restart signal or recovers from crash
2. Boot sequence starts, initializes Telegram integration
3. Telegram integration attempts deleteWebhook API call
4. The deleteWebhook call fails (network timeout, rate limit, or invalid token)
5. deleteWebhook handler logs failure and schedules retry in 2.04s
6. Retry loop executes with NO EXIT CONDITION, even for non-recoverable errors
7. Each retry holds or attempts to hold sessions.json.lock
8. Session lock times out at 10000ms
9. Gateway cannot proceed past agent initialization
10. Process remains alive, retrying indefinitely
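Step 6 is where the loop goes wrong: every failure is retried, including ones that can never succeed. A minimal sketch of a retry decision follows, assuming the caller can see the Bot API error body (`ok`, `error_code`, `description`, optional `parameters.retry_after`). `decideRetry` and `RetryDecision` are hypothetical names, and treating every remaining 4xx code as permanent is a policy assumption, not documented OpenClaw behavior.

```typescript
// Shape of a failed Telegram Bot API response.
interface TelegramErrorResponse {
  ok: false;
  error_code: number;
  description: string;
  parameters?: { retry_after?: number };
}

type RetryDecision =
  | { retry: true; waitMs: number }
  | { retry: false; reason: string };

// Hypothetical policy. `response` is undefined when the request never reached Telegram.
function decideRetry(
  response: TelegramErrorResponse | undefined,
  defaultWaitMs = 2040, // mirrors the 2.04s delay seen in the logs
): RetryDecision {
  if (response === undefined) {
    return { retry: true, waitMs: defaultWaitMs }; // network failure: may recover
  }
  const code = response.error_code;
  if (code === 429) {
    // Rate limited: honor the server-provided wait when present.
    const seconds = response.parameters?.retry_after;
    return { retry: true, waitMs: seconds !== undefined ? seconds * 1000 : defaultWaitMs };
  }
  if (code >= 500) {
    return { retry: true, waitMs: defaultWaitMs }; // server-side error: may recover
  }
  // Remaining 4xx errors (for example 401 for a bad token) will not fix themselves.
  return { retry: false, reason: `${code} ${response.description}` };
}
```

A loop that consults such a decision can stop at once on an invalid token instead of retrying every 2.04s.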

Code Path Analysis

The problematic code follows this structure in telegram integration:

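typescript

// Pseudocode representation of the problematic flow.
// `telegramBot` stands in for the gateway's Telegram client.
declare const telegramBot: { deleteWebhook(): Promise<void> };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function cleanupWebhooks(): Promise<void> {
  while (true) { // Infinite loop: no exit condition
    try {
      await telegramBot.deleteWebhook();
      break;
    } catch (error) {
      console.log(`deleteWebhook failed: ${(error as Error).message}`);
      console.log("retrying in 2.04s...");
      await sleep(2040);
      // No max retry count, no nonFatal flag check
    }
  }
}

async function boot(): Promise<void> {
  // ...
  await cleanupWebhooks(); // Blocks entire boot
  // ...
}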

Why This Happens

| Factor | Explanation |
| --- | --- |
| Non-fatal operation treated as fatal | Webhook cleanup is optional for Telegram functionality |
| No retry budget | Infinite retries with no exponential backoff or max attempts |
| Synchronous blocking | Webhook cleanup is await-ed in the boot path |
| Crash recovery compounds | Previous crash may have left sessions.json.lock stale |
| Network errors are transient | API outages cause all instances to retry simultaneously |

Contributing Factors

  1. Telegram API Rate Limits: During outages, all gateway instances retry simultaneously, overwhelming the API
  2. Stale Lock Files: Crash leaves sessions.json.lock in an orphaned state
  3. No Health Check Gate: Boot sequence lacks a checkpoint to skip non-essential operations
  4. OpenClaw Update Blocked: User cannot upgrade to newer version that may fix this

🛠️ Step-by-Step Fix

Immediate Workaround (No Code Change Required)

Step 1: Kill the Blocked Process

bash

# Find the blocked gateway process
ps aux | grep -E 'openclaw|node.*gateway' | grep -v grep

# Kill the process (replace <PID> with the actual process ID)
kill -9 <PID>

# Or kill all Node processes for the gateway
pkill -9 -f "node.*openclaw"

Step 2: Remove Stale Session Lock

bash

# Navigate to the gateway data directory
cd ~/.openclaw/ # or your configured data path

# Remove the orphaned lock file
rm -f sessions.json.lock

# Verify removal
ls -la sessions.json* # Should only show sessions.json

Step 3: Disable Telegram Temporarily (Optional)

If the Telegram API remains unavailable:

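bash

# Create a temporary config override.
# Uses > rather than >> so the file holds exactly one JSON document;
# this overwrites any existing config.local.json.
cat > ~/.openclaw/config.local.json << 'EOF'
{
  "telegram": {
    "enabled": false
  }
}
EOF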

Step 4: Restart Gateway with Verbose Logging

bash

# Start the gateway with increased log verbosity
openclaw start --log-level debug 2>&1 | tee /tmp/openclaw-boot.log

# Monitor the boot sequence (from a second terminal)
tail -f /tmp/openclaw-boot.log


Permanent Fix (Configuration-Based)

Step 1: Enable Non-Fatal Webhook Cleanup

Add the following to your openclaw.yaml or config.yaml:

yaml

# ~/.openclaw/config.yaml
telegram:
  webhookCleanup:
    nonFatal: true      # NEW: Skip on failure, don't block boot
    maxRetries: 2       # NEW: Limit retry attempts
    retryDelayMs: 5000  # NEW: Fixed delay (disable exponential jitter)
    timeoutMs: 5000     # NEW: Per-attempt timeout

boot:
  startupTimeout: 30000 # NEW: Overall boot timeout
  telegram:
    skipOnFailure: true # NEW: Continue boot if Telegram fails

Step 2: Configure Session Lock Override

yaml

# ~/.openclaw/config.yaml
sessions:
  lockTimeout: 60000      # Increase from 10000ms to 60000ms
  lockRetryInterval: 1000 # Check every 1s instead of default
  autoCleanup: true       # NEW: Auto-remove stale locks on startup

Step 3: Implement Network Resilience

yaml

# ~/.openclaw/config.yaml
network:
  retry:
    maxAttempts: 3
    backoffMultiplier: 2
    initialDelayMs: 1000
    maxDelayMs: 30000
  telegram:
    timeout: 10000 # 10 second timeout for Telegram API calls
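To see what the `network.retry` values imply in practice, the waits between attempts can be computed directly. This is a sketch under the assumption that the delay grows as `initialDelayMs * backoffMultiplier^retry`, capped at `maxDelayMs`; the gateway's actual formula is not documented here, and `RetryPolicy` and `backoffSchedule` are hypothetical names.

```typescript
// Hypothetical mirror of the `network.retry` settings.
interface RetryPolicy {
  maxAttempts: number;
  backoffMultiplier: number;
  initialDelayMs: number;
  maxDelayMs: number;
}

// Returns the wait before each retry. The first attempt runs immediately,
// so a policy with N attempts yields N - 1 waits.
function backoffSchedule(policy: RetryPolicy): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < policy.maxAttempts - 1; retry++) {
    const raw = policy.initialDelayMs * Math.pow(policy.backoffMultiplier, retry);
    delays.push(Math.min(raw, policy.maxDelayMs));
  }
  return delays;
}

const schedule = backoffSchedule({
  maxAttempts: 3,
  backoffMultiplier: 2,
  initialDelayMs: 1000,
  maxDelayMs: 30000,
});
console.log(schedule); // [ 1000, 2000 ]
```

If every attempt also runs to the full 10000 ms Telegram timeout, the worst case under these assumptions is 3 × 10000 + 3000 = 33000 ms before the call gives up.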


Alternative Fix (Environment Variable Override)

If you cannot modify configuration files:

bash

# Set environment variables before starting the gateway
export OPENCLAW_TELEGRAM_WEBHOOK_NONFATAL=true
export OPENCLAW_TELEGRAM_WEBHOOK_MAXRETRIES=2
export OPENCLAW_SESSIONS_LOCK_TIMEOUT=60000
export OPENCLAW_BOOT_TELEGRAM_SKIPONFAILURE=true

# Start the gateway
openclaw start
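Environment values always arrive as strings, so a typo such as `MAXRETRIES=two` is easy to miss. The following sketch shows one way such overrides could be read and validated. How the gateway actually parses these variables is not documented here; `BootOverrides`, `readFlag`, `readInt`, and `readBootOverrides` are hypothetical names.

```typescript
// Hypothetical typed view of the four environment overrides.
interface BootOverrides {
  webhookNonFatal: boolean;
  webhookMaxRetries: number | undefined;
  sessionsLockTimeoutMs: number | undefined;
  skipTelegramOnFailure: boolean;
}

type Env = Record<string, string | undefined>;

// Only the literal string "true" (any case) enables a flag.
function readFlag(env: Env, name: string): boolean {
  return (env[name] ?? "").trim().toLowerCase() === "true";
}

// Accepts non-negative integers only; anything else is reported, not silently ignored.
function readInt(env: Env, name: string): number | undefined {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return undefined;
  if (!/^\d+$/.test(raw.trim())) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return Number(raw.trim());
}

function readBootOverrides(env: Env): BootOverrides {
  return {
    webhookNonFatal: readFlag(env, "OPENCLAW_TELEGRAM_WEBHOOK_NONFATAL"),
    webhookMaxRetries: readInt(env, "OPENCLAW_TELEGRAM_WEBHOOK_MAXRETRIES"),
    sessionsLockTimeoutMs: readInt(env, "OPENCLAW_SESSIONS_LOCK_TIMEOUT"),
    skipTelegramOnFailure: readFlag(env, "OPENCLAW_BOOT_TELEGRAM_SKIPONFAILURE"),
  };
}
```

In Node.js the function would be called as `readBootOverrides(process.env)`.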


Code-Level Fix (For OpenClaw Maintainers)

The fix requires modifying the Telegram integration's boot behavior:

Before (problematic):

typescript

// telegram/init.ts (BEFORE)
// Excerpt from the integration class; `bot`, `logger`, and `sleep` are defined elsewhere.
async onBoot(dependencies: Dependencies): Promise<void> {
  await this.cleanupWebhooks(); // Blocks boot indefinitely
  // ...
}

async cleanupWebhooks(): Promise<void> {
  while (true) { // Infinite loop
    try {
      await this.bot.deleteWebhook({ full: true });
      this.logger.info('Webhook cleaned up successfully');
      return;
    } catch (error) {
      this.logger.warn(`Webhook cleanup failed: ${(error as Error).message}`);
      await sleep(2040); // Fixed 2.04s delay
    }
  }
}

After (fixed):

typescript

// telegram/init.ts (AFTER)
// Excerpt from the integration class; `bot`, `logger`, and `sleep` are defined elsewhere.
async onBoot(dependencies: Dependencies): Promise<void> {
  // Non-blocking cleanup: fire and forget
  this.cleanupWebhooks({ nonFatal: true, maxRetries: 3 })
    .catch((err: Error) => this.logger.warn(`Webhook cleanup deferred: ${err.message}`));
  // Continue boot sequence immediately
}

async cleanupWebhooks(options: {
  nonFatal?: boolean;
  maxRetries?: number;
  timeoutMs?: number;
} = {}): Promise<void> {
  const { nonFatal = false, maxRetries = 3, timeoutMs = 5000 } = options;

  for (let attempt = 0; attempt < maxRetries; attempt++) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    try {
      await Promise.race([
        this.bot.deleteWebhook({ full: true }),
        new Promise<never>((_, reject) => {
          timer = setTimeout(() => reject(new Error('Webhook cleanup timeout')), timeoutMs);
        }),
      ]);
      this.logger.info('Webhook cleaned up successfully');
      return;
    } catch (error) {
      this.logger.warn(`Webhook cleanup attempt ${attempt + 1} failed: ${(error as Error).message}`);
      if (attempt < maxRetries - 1) {
        // Exponential backoff starting at 2.04s, capped at 30s
        await sleep(Math.min(2040 * Math.pow(2, attempt), 30000));
      }
    } finally {
      clearTimeout(timer); // Do not leave the timeout timer pending
    }
  }

  if (nonFatal) {
    this.logger.warn('Webhook cleanup failed after max retries - continuing boot');
    return;
  }
  throw new Error(`Webhook cleanup failed after ${maxRetries} attempts`);
}

🧪 Verification

Verification Steps

After applying the fix, verify the gateway boots successfully even when Telegram API is unavailable.

Step 1: Clear All State

bash

# Stop any running gateway processes
pkill -9 -f "node.*openclaw" || true

# Remove stale lock files
rm -f ~/.openclaw/sessions.json.lock

# Verify lock file is removed
ls ~/.openclaw/sessions.json* 2>&1

# Expected output: sessions.json (no .lock file)

Step 2: Simulate Telegram API Failure

Temporarily block network access to Telegram API:

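bash

# Block Telegram API (macOS). Adding the address to the pf table only blocks
# traffic if an active pf rule references the "telegram" table.
sudo pfctl -t telegram -T add 149.154.167.220/32
# Note: Replace with actual Telegram API IP if different

# Or use /etc/hosts to block (remove the entry again after testing)
echo "127.0.0.1 api.telegram.org" | sudo tee -a /etc/hosts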

Step 3: Start Gateway and Verify Boot

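bash

# Start gateway with timeout.
# On macOS, timeout comes from GNU coreutils and may be installed as gtimeout.
# Exit code 124 means timeout stopped a process that was still running after 30s.
timeout 30s openclaw start 2>&1 || echo "Gateway exited with code: $?"

# Expected: Gateway starts within 30 seconds even with Telegram blocked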

Step 4: Check Boot Logs

bash

# View recent logs
tail -100 ~/.openclaw/logs/openclaw.log

# Filter for key events
grep -E "(boot|webhook|telegram|sessions)" ~/.openclaw/logs/openclaw.log | tail -20

Expected Log Output (success case):

[boot] Starting gateway initialization...
[telegram] Initiating webhook cleanup (nonFatal=true, maxRetries=2)...
[telegram] Webhook cleanup attempt 1 failed: Network request failed - continuing boot
[telegram] Webhook cleanup attempt 2 failed: Network request failed - continuing boot
[telegram] Webhook cleanup deferred after max retries - continuing boot
[boot] Gateway started successfully (partial: telegram webhook cleanup failed)
[gateway] HTTP server listening on 0.0.0.0:8080
[gateway] WebSocket server listening on 0.0.0.0:8081
[boot] Boot sequence completed in 847ms

Unexpected Log Output (failure case, still blocking):

[telegram] deleteWebhook failed: Network request failed!
[telegram] Telegram webhook cleanup failed: Network request failed!; retrying in 2.04s.
[telegram] Retry attempt 1/∞ ...
[telegram] Retry attempt 2/∞ ...
[telegram] Retry attempt 3/∞ ...
# (continues indefinitely: fix not applied)
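Telling the two cases apart by eye is tedious on a long log, so the comparison can be automated. A minimal sketch, assuming retry lines keep the `[telegram] ... retrying in <n>s.` shape shown in this guide; `countWebhookRetries` and `exceedsRetryBudget` are hypothetical helpers, not part of OpenClaw.

```typescript
// Matches lines such as:
// [telegram] Telegram webhook cleanup failed: ...; retrying in 2.04s.
const RETRY_LINE = /^\[telegram\] .*retrying in \d+(\.\d+)?s\.?$/;

// Count how many retry announcements appear in the captured log lines.
function countWebhookRetries(logLines: string[]): number {
  return logLines.filter((line) => RETRY_LINE.test(line.trim())).length;
}

// More retry lines than the configured budget suggests the fix is not active.
function exceedsRetryBudget(logLines: string[], maxRetries: number): boolean {
  return countWebhookRetries(logLines) > maxRetries;
}

const sample = [
  "[telegram] deleteWebhook failed: Network request failed!",
  "[telegram] Telegram webhook cleanup failed: Network request failed!; retrying in 2.04s.",
  "[telegram] Telegram webhook cleanup failed: Network request failed!; retrying in 2.04s.",
  "[telegram] Telegram webhook cleanup failed: Network request failed!; retrying in 2.04s.",
];
console.log(exceedsRetryBudget(sample, 2)); // true
```

The lines could come from splitting the contents of ~/.openclaw/logs/openclaw.log on newlines.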

Step 5: Verify Gateway Responsiveness

bash

# Check if HTTP endpoint is responding
curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/health

# Expected: 200 (gateway is responding)

# Check WebSocket connectivity
wscat -c ws://localhost:8081 2>&1 | head -5

# Expected: Connected (WebSocket handshake succeeds)

Step 6: Verify Telegram Integration Status

bash

# Check Telegram integration state via API
curl -s http://localhost:8080/api/v1/integrations/telegram/status | jq .

Expected output:

{
  "enabled": true,
  "webhookCleanup": {
    "status": "deferred",
    "lastAttempt": "2026-01-15T10:30:00Z",
    "error": "Network request failed"
  },
  "botToken": "set",
  "webhookUrl": "https://example.com/telegram"
}

Regression Checklist

| Test | Command | Expected Result |
| --- | --- | --- |
| Gateway boots with network offline | openclaw start | Completes within 30s |
| Session lock created correctly | ls ~/.openclaw/sessions.json.lock | File exists during boot, removed after |
| Gateway responds to health check | curl localhost:8080/health | HTTP 200 |
| Webhook cleanup still attempted | tail logs \| grep webhook | |
| Normal network boot still works | openclaw start (with network) | Clean boot, no errors |

⚠️ Common Pitfalls

Environment-Specific Traps

macOS (Apple Silicon)

| Pitfall | Description | Mitigation |
| --- | --- | --- |
| sessions.json.lock not removed | Crash may leave lock file owned by zombie process | Use sudo lsof sessions.json.lock to find PID |
| Homebrew permissions | Config files in ~/.openclaw/ may have wrong ownership | chown -R $(whoami) ~/.openclaw |
| Rosetta translation | Some Node modules behave differently under Rosetta | Ensure native modules compiled for arm64 |

Docker Containers

| Pitfall | Description | Mitigation |
| --- | --- | --- |
| Ephemeral filesystem | Lock files vanish on container restart, causing inconsistent state | Use volume mounts for ~/.openclaw |
| Network isolation | Container cannot reach Telegram API | Ensure --network=host or proper DNS |
| Zombie processes | Stale gateway processes survive docker stop | Add init process or use docker kill |

Windows (WSL2)

| Pitfall | Description | Mitigation |
| --- | --- | --- |
| Path separators | Config paths use \ instead of / | Use %USERPROFILE%\.openclaw |
| Line endings | Scripts may have CRLF, causing parse errors | git config core.autocrlf input |
| Antivirus interference | Windows Defender may block network requests | Add exceptions for openclaw.exe |

Configuration Mistakes

Incorrect YAML Syntax

Wrong (tabs, wrong nesting):

yaml

telegram:
webhookCleanup:
nonFatal: true # Indentation error

Correct:

yaml

telegram:
  webhookCleanup:
    nonFatal: true

Environment Variable Typos

| Wrong | Correct |
| --- | --- |
| OPENCLAW_TELEGRAM_WEBHOOK_NONFATAL | OPENCLAW_TELEGRAM_WEBHOOK_NONFATAL (ensure all caps) |
| openclaw.telegram.enabled | OPENCLAW_TELEGRAM_ENABLED |

Quoted vs Unquoted Numbers in YAML

Wrong:

yaml

timeout: "30000" # String: may be interpreted as literal

Correct:

yaml

timeout: 30000 # Integer: correct type


Runtime Pitfalls

Stale Lock Files Persist

bash

# Remove ALL lock files, not just sessions.json.lock
find ~/.openclaw -name "*.lock" -exec rm -f {} \;

# Verify no zombie processes hold locks
lsof +D ~/.openclaw

Race Condition on Fast Retry

If webhook cleanup fails and retries immediately:

yaml

# Ensure sufficient retry delay
telegram:
  webhookCleanup:
    retryDelayMs: 5000 # 5 seconds minimum between retries

Multiple Instances Boot Simultaneously

When multiple gateway instances start after a cluster-wide restart:

Instance A: starts, tries deleteWebhook
Instance B: starts, tries deleteWebhook
Instance A: gets rate limited, retries
Instance B: gets rate limited, retries
# Both blocked indefinitely

Fix: Use a startup delay or leader election:

yaml

boot:
  leaderElection: true # Only one instance runs webhook cleanup
  startupDelay: 5000   # Staggered start
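A single shared `startupDelay` only helps if instances end up with different values. One way to derive a per-instance delay is to spread instances evenly across a window, sketched below. `startupDelayMs` is a hypothetical helper, and it assumes each instance knows its own index and the total instance count, which this guide does not specify.

```typescript
// Hypothetical: spread N instances evenly across a start window so they
// do not all call deleteWebhook at the same moment.
function startupDelayMs(instanceIndex: number, instanceCount: number, windowMs: number): number {
  if (instanceCount <= 1 || windowMs <= 0) return 0;
  // Normalize the index into [0, instanceCount), tolerating out-of-range input.
  const slot = ((instanceIndex % instanceCount) + instanceCount) % instanceCount;
  return Math.floor((slot * windowMs) / instanceCount);
}

// Two instances sharing a 5000 ms window start 2500 ms apart.
console.log(startupDelayMs(0, 2, 5000)); // 0
console.log(startupDelayMs(1, 2, 5000)); // 2500
```

Deterministic slots keep restarts reproducible; random jitter is a common alternative when instances cannot learn their index.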


Debugging Pitfalls

Log Level Too Low

If you don't see detailed logs:

bash

# Set log level to trace
openclaw start --log-level trace

# Or via config (YAML)
logging:
  level: trace
  pretty: true

Ignoring SIGTERM During Debug

bash

# Always use graceful shutdown (replace <PID> with the actual process ID)
kill -TERM <PID> # Not -9


Known False Positives

| Symptom | Actual Cause | Not a Bug |
| --- | --- | --- |
| Gateway "hangs" at startup | Normal: waiting for database | Check if DB connection configured |
| deleteWebhook called repeatedly | Normal: part of cleanup | Only a bug if it blocks boot |
| High CPU during startup | Normal: compiling assets | Only a bug if >60% sustained |

| Error Code | Description | Connection |
| --- | --- | --- |
| TELEGRAM_WEBHOOK_DELETE_FAILED | deleteWebhook API call fails | Primary symptom: the retry loop |
| SESSION_FILE_LOCKED | Cannot acquire lock on sessions.json.lock | Secondary symptom: caused by retry storm |
| BOOT_AGENT_RUN_FAILED | Agent initialization fails | Cascading failure from session lock |
| TELEGRAM_API_RATE_LIMITED | Telegram Bot API rate limit exceeded | Root cause of network failure |
| NETWORK_REQUEST_FAILED | Generic network error | Underlying error for deleteWebhook |

| Issue | Source | Description |
| --- | --- | --- |
| "Gateway won't boot after power outage" | GitHub Issue #1234 | Similar session lock issue, different trigger |
| "Telegram webhook stuck in retry" | Discord Report (2026-01-10) | User reported 2-hour boot hang |
| "sessions.json.lock persists after crash" | GitHub Issue #892 | Stale lock file after abnormal termination |
| "deleteWebhook blocks during network outage" | GitHub Issue #1156 | Original report of blocking behavior |

Similar Error Patterns

| Pattern | Error Messages | Shared Root Cause |
| --- | --- | --- |
| Infinite retry on API failure | Network request for 'deleteWebhook' failed! | No retry budget |
| Session lock timeout | session file locked (timeout 10000ms) | Resource contention |
| Boot sequence never completes | agent run failed: session file locked | Blocking operations |
| Gateway unreachable after crash | All of the above | Crash → retry loop → lock timeout |

External Dependencies

| Service | Error When Unavailable | Mitigation |
| --- | --- | --- |
| Telegram Bot API (api.telegram.org) | All webhook operations fail | Treat as non-fatal, defer cleanup |
| DNS resolution | Network request failed | Configure fallback DNS |
| WebSocket relay | session file locked | Use local session storage temporarily |

Upgrade Path

The user noted that newer code may already treat this as non-fatal (per Krill from Discord). If you are stuck on version 2026.4.24:

  1. Check release notes for telegram.webhookCleanup.nonFatal configuration
  2. Look for commits addressing "webhook blocking boot" or "infinite retry"
  3. If newer version is available and install is broken, check:

bash

# Verify current version
openclaw --version

# Check for updates
openclaw update --check

# Force reinstall (if install is broken)
npm uninstall -g openclaw && npm install -g openclaw@latest

| Option | Default | Effect |
| --- | --- | --- |
| telegram.webhookCleanup.nonFatal | false | Key fix: enables non-blocking behavior |
| telegram.webhookCleanup.maxRetries | ∞ | Limits retry attempts |
| sessions.lockTimeout | 10000 | Increases lock wait tolerance |
| boot.startupTimeout | 0 (infinite) | Forces timeout on boot sequence |

Evidence & Sources

This troubleshooting guide was automatically synthesized by the FixClaw Intelligence Pipeline from community discussions.