Telegram Forum: Message Sent to Wrong Topic After Anthropic 529 Retry
When the Anthropic API returns HTTP 529 (overloaded) and OpenClaw retries the request, the reply is sent without the correct message_thread_id, so it never shows up in the forum topic that asked for it.
Symptoms
Primary Symptom
After an Anthropic API 529 retry cycle, the Telegram reply message is sent successfully (Telegram API returns ok and a valid message_id), but the message does not appear in the expected forum topic.
Log Evidence
2026-03-03T11:19:05.208Z [agent/embedded] embedded run agent end: runId=561c9fa1 isError=true error=The AI service is temporarily overloaded.
2026-03-03T11:19:05.685Z [agent/embedded] embedded run agent end: runId=81dab484 isError=true error=The AI service is temporarily overloaded.
2026-03-03T11:24:34.955Z [telegram] sendMessage ok chat=-1003885638534 message=13832
Diagnostic Symptoms
- No "thread not found" error: Telegram did not reject the thread ID.
- No message_thread_id in logs: the debug output omits the thread parameter, which obscured diagnosis.
- Roughly 5-minute gap between the last 529 error and sendMessage, indicating a retry with backoff.
- Missing session entries: no session records for topic:562 on the day of the incident.
- A stale-socket restart occurred 7 minutes after sendMessage, by which point the message was already lost.
User-Facing Behavior
- The original message in topic 562 receives no response
- The response message ID exists in Telegram's database (confirmed by API response)
- The message is invisible in both the target topic AND the General topic
- The message appears to be "sent" but is effectively orphaned
Root Cause
Primary Failure: Thread Context Loss During Retry
The root cause is a context propagation failure in the retry pipeline. When Anthropic returns HTTP 529, the following sequence occurs:
- Message received: OpenClaw receives a Telegram update containing message.chat.id, message.message_thread_id: 562, and the conversation context.
- API call initiated: OpenClaw calls Anthropic's Messages API with the conversation context.
- 529 error received: Anthropic returns HTTP 529 (overloaded_error), which OpenClaw logs as "The AI service is temporarily overloaded."
- Retry triggered: OpenClaw's retry mechanism re-attempts the API call after a backoff.
- Context loss: during the retry cycle, the message_thread_id from the original Telegram update is not carried forward to the sendMessage call.
Architectural Issue: Session State vs. Inline Context
OpenClaw uses a session-based architecture in which conversation context is kept in a session store. In simplified form, the failure point looks like this:
// Simplified flow showing the failure point (illustrative, not OpenClaw's actual source;
// sessionStore, telegram and callAIWithRetry stand in for OpenClaw internals)
async function handleUpdate(update) {
  const chatId = update.message.chat.id;
  const threadId = update.message.message_thread_id; // 562 - captured here
  // On the first attempt, the session is created/loaded (keyed by chat only)
  const session = (await sessionStore.get(chatId)) ?? {};
  session.threadId = threadId;
  await sessionStore.set(chatId, session);
  // ... API call made, 529 received, retried after backoff ...
  const response = await callAIWithRetry(update.message.text);
  // After the retry, the session may be stale, expired, or overwritten
  const retrySession = (await sessionStore.get(chatId)) ?? {};
  // retrySession.threadId could be undefined, null, or another topic's ID
  // sendMessage is called without the correct message_thread_id
  await telegram.sendMessage({
    chat_id: chatId,
    text: response,
    message_thread_id: retrySession.threadId // BUG: may be undefined or wrong
  });
}
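One way to remove the shared-state hazard in the flow above is to key session state by chat and topic together, so two topics in the same forum never read or overwrite each other's thread context. This is a minimal sketch; the key format chat:<id>:topic:<thread> and the sessionKey name are hypothetical (the incident notes only show records named like topic:562), and OpenClaw's real key scheme may differ.

```javascript
// Hypothetical session-key scheme (not OpenClaw's confirmed format):
// one session per (chat, forum topic); messages outside a topic share "main".
function sessionKey(chatId, threadId) {
  if (chatId === undefined || chatId === null) {
    throw new TypeError("chatId is required");
  }
  const topic = threadId === undefined || threadId === null ? "main" : String(threadId);
  return `chat:${chatId}:topic:${topic}`;
}

// The retry path reads the same key the first attempt wrote, and a message
// arriving in another topic of the same chat gets a different key.
console.log(sessionKey(-1003885638534, 562)); // chat:-1003885638534:topic:562
console.log(sessionKey(-1003885638534, 77));  // chat:-1003885638534:topic:77
```

With per-topic keys, a concurrent message in topic 77 writes to its own session and cannot clobber the threadId that topic 562's retry is waiting to use.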
Contributing Factors
- Retry delay creates a race condition: the 5-minute backoff between the 529 and the retry gives session state time to be cleared, corrupted, or overwritten.
- No thread_id in sendMessage logs: the debug statement omits message_thread_id, which prevented early detection:
  // Current (broken) log format
  console.log(`sendMessage ok chat=${chatId} message=${messageId}`);
  // Missing: thread=${threadId ?? 'undefined'}
- Session store TTL/expiry: if sessions expire during the retry window, thread context is lost.
- Concurrent message handling: if another message arrives in a different topic during the retry, session state can be overwritten.
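The TTL factor can be checked with simple arithmetic: a session written before the first attempt must outlive every API call plus every backoff sleep. A minimal sketch, assuming capped exponential backoff of the form min(base * 2^attempt, cap), as in the retry loop in Step 4, and assuming no sleep after the final attempt; perAttemptMs is an assumed upper bound on one API call, and both function names are hypothetical.

```javascript
// Worst-case total sleep for capped exponential backoff:
// attempt n (1-based) sleeps min(baseMs * 2^n, capMs); no sleep after the last attempt.
function worstCaseBackoffMs(maxRetries, baseMs, capMs) {
  let total = 0;
  for (let attempt = 1; attempt < maxRetries; attempt++) {
    total += Math.min(baseMs * 2 ** attempt, capMs);
  }
  return total;
}

// Does a session written before the first attempt survive until the last
// attempt finishes? perAttemptMs is an assumed per-call upper bound (incl. timeouts).
function ttlCoversRetries({ ttlMs, maxRetries, baseMs, capMs, perAttemptMs }) {
  const worstCaseMs = worstCaseBackoffMs(maxRetries, baseMs, capMs) + maxRetries * perAttemptMs;
  return { worstCaseMs, ok: worstCaseMs < ttlMs };
}

console.log(ttlCoversRetries({ ttlMs: 300000, maxRetries: 3, baseMs: 1000, capMs: 30000, perAttemptMs: 60000 }));
// { worstCaseMs: 186000, ok: true }
```

Running this check at startup (or in a config test) turns "TTL shorter than the retry window" from a silent runtime failure into an explicit configuration error.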
Why No Error Is Raised
Telegram accepts the message without message_thread_id because the parameter is optional: in a forum supergroup, a message sent without it is posted to the General topic rather than rejected. However, how such messages are displayed varies by client and Telegram version; some clients hide them entirely if the original context was from a different thread.
Step-by-Step Fix
Step 1: Ensure Thread ID is Passed to sendMessage
Modify the Telegram adapter so that message_thread_id is always forwarded in the sendMessage payload and always logged. The adapter should not invent a default; the caller must supply the thread ID taken from the incoming message rather than from session state (see Step 2):
// BEFORE (broken implementation)
async sendMessage(chatId, text, options = {}) {
  const payload = {
    chat_id: chatId,
    text: text,
    // message_thread_id is only present if a caller happens to pass it;
    // when omitted, Telegram posts to the General topic
    ...options
  };
  const result = await this.telegram.sendMessage(payload);
  console.log(`sendMessage ok chat=${chatId} message=${result.message_id}`);
  return result;
}
// AFTER (fixed implementation)
async sendMessage(chatId, text, options = {}) {
  const payload = {
    chat_id: chatId,
    text: text,
    // message_thread_id MUST be passed explicitly in options;
    // no silent default - the caller is responsible
    ...options
  };
  const result = await this.telegram.sendMessage(payload);
  // Enhanced logging with thread_id
  console.log(`sendMessage ok chat=${chatId} thread=${payload.message_thread_id ?? 'main'} message=${result.message_id}`);
  return result;
}
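Because Telegram silently accepts a forum message without message_thread_id, the adapter can be made to fail loudly instead of misrouting. A minimal sketch, assuming the caller knows whether the reply belongs to a forum topic; the buildSendPayload name and the requireThread flag are hypothetical, not part of OpenClaw.

```javascript
// Build a sendMessage payload; refuse to build one that would silently
// fall back to the General topic when a topic reply is expected.
function buildSendPayload(chatId, text, options = {}, { requireThread = false } = {}) {
  const payload = { chat_id: chatId, text, ...options };
  const threadId = payload.message_thread_id;
  const missing = threadId === undefined || threadId === null;
  if (requireThread && missing) {
    throw new Error(`refusing to send to chat ${chatId} without message_thread_id`);
  }
  if (missing) {
    // Drop explicit undefined/null so the request body omits the field entirely
    delete payload.message_thread_id;
  }
  return payload;
}

const topicReply = buildSendPayload(-1003885638534, "Response", { message_thread_id: 562 }, { requireThread: true });
console.log(topicReply.message_thread_id); // 562
```

A thrown error here surfaces in logs at the moment of the bug, instead of as a user report that a reply never arrived.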
Step 2: Preserve Thread ID Through Retry Cycles
Ensure the thread_id from the incoming message is carried through to the sendMessage call, regardless of session state:
// BEFORE (session-dependent)
async handleMessage(ctx, messageText) {
  const session = await this.getSession(ctx.chat.id);
  const response = await this.callAIWithRetry(messageText, session.context);
  // Thread ID from session - may be stale after retry
  await this.telegram.sendMessage(ctx.chat.id, response, {
    message_thread_id: session.threadId
  });
}
// AFTER (incoming message context preserved)
async handleMessage(ctx, messageText) {
  // Capture thread_id from the ACTUAL incoming message, not the session
  const originalThreadId = ctx.message.message_thread_id;
  const session = await this.getSession(ctx.chat.id);
  // Pass the thread ID along so retry logging (Step 4) can report it
  const response = await this.callAIWithRetry(messageText, session.context, originalThreadId);
  // Always use the original message's thread_id
  await this.telegram.sendMessage(ctx.chat.id, response, {
    message_thread_id: originalThreadId
  });
}
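The same principle can be made harder to break by extracting the reply target once, at receipt, into an immutable object that travels through the whole retry pipeline instead of being re-read from session state. A minimal sketch; captureReplyTarget is a hypothetical helper operating on a raw Bot API Update, and it reads only the standard message.chat.id, message.message_thread_id and message.message_id fields.

```javascript
// Capture where the reply must go, once, from the incoming update.
// Object.freeze stops later code (retries, session reloads) from mutating it in place.
function captureReplyTarget(update) {
  const message = update.message;
  if (!message) {
    throw new Error("update has no message");
  }
  return Object.freeze({
    chatId: message.chat.id,
    threadId: message.message_thread_id, // undefined when the message has no thread
    replyToMessageId: message.message_id,
  });
}

const target = captureReplyTarget({
  update_id: 123456789,
  message: {
    message_id: 100,
    chat: { id: -1003885638534, type: "supergroup" },
    message_thread_id: 562,
    text: "Test message",
  },
});
// Pass `target` (not the session) into the retry call and on to sendMessage.
console.log(target.threadId); // 562
```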
Step 3: Add Thread ID to All Send Operations
Ensure all Telegram send methods include thread_id when operating in a forum context:
// Helper to build send options with thread context
function buildSendOptions(originalMessage, overrides = {}) {
  const options = { ...overrides };
  // Always include thread_id if the original message had one
  if (originalMessage.message_thread_id) {
    options.message_thread_id = originalMessage.message_thread_id;
  }
  return options;
}
// Usage
const sendOptions = buildSendOptions(ctx.message);
await this.telegram.sendMessage(ctx.chat.id, text, sendOptions);
// Edit methods such as editMessageReplyMarkup address an existing message by
// chat_id + message_id and take no message_thread_id, so don't pass sendOptions to them.
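One possible refinement of buildSendOptions: in the Bot API, message_thread_id can also be present on incoming messages that are not in a forum topic, whereas is_topic_message is true only for messages sent to a forum topic. A variant that forwards the thread ID only for topic messages might look like this; buildTopicSendOptions is a hypothetical name, and the gating choice is an assumption worth checking against your own traffic.

```javascript
// Forward message_thread_id only when the incoming message is in a forum topic.
// Uses the standard Bot API Message fields message_thread_id and is_topic_message.
function buildTopicSendOptions(originalMessage, overrides = {}) {
  const options = { ...overrides };
  if (originalMessage.is_topic_message === true &&
      Number.isInteger(originalMessage.message_thread_id)) {
    options.message_thread_id = originalMessage.message_thread_id;
  }
  return options;
}

console.log(buildTopicSendOptions({ message_thread_id: 562, is_topic_message: true }));
// { message_thread_id: 562 }
console.log(buildTopicSendOptions({ message_thread_id: 77 }));
// {}
```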
Step 4: Improve Retry Logging
Log the thread_id at each retry attempt to aid debugging:
// Assumes a helper: const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
async callAIWithRetry(message, context, threadId) {
  const maxRetries = 3;
  let lastError;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    console.log(`[retry] attempt=${attempt} thread=${threadId} maxRetries=${maxRetries}`);
    try {
      return await this.anthropic.messages.create({
        model: 'claude-3-5-sonnet-20241022',
        max_tokens: 1024,
        messages: [{ role: 'user', content: message }]
      });
    } catch (error) {
      lastError = error;
      if (error.status !== 529) {
        throw error; // Non-retryable error: fail fast
      }
      console.log(`[retry] received 529 (overloaded) thread=${threadId}`);
      // Don't sleep after the final attempt
      if (attempt < maxRetries) {
        const backoffMs = Math.min(1000 * Math.pow(2, attempt), 30000);
        console.log(`[retry] backing off for ${backoffMs}ms thread=${threadId}`);
        await sleep(backoffMs);
      }
    }
  }
  throw lastError;
}
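When many chats hit the same 529 at once, identical exponential delays make them all retry in lockstep and can prolong the overload. A common alternative is "full jitter": sleep a uniformly random duration between 0 and the capped exponential delay. This is a swapped-in technique, not something the retry loop above does; jitteredBackoffMs is a hypothetical name, and the random source is injectable so the schedule can be tested deterministically.

```javascript
// Full-jitter backoff: uniform random delay in [0, min(capMs, baseMs * 2^attempt)).
// `random` defaults to Math.random but can be replaced in tests.
function jitteredBackoffMs(attempt, { baseMs = 1000, capMs = 30000, random = Math.random } = {}) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// The ceiling for attempt 2 is 4000 ms, so with random() = 0.5 the delay is 2000 ms.
console.log(jitteredBackoffMs(2, { random: () => 0.5 })); // 2000
```

Inside the loop, this would replace the fixed Math.min(1000 * Math.pow(2, attempt), 30000) computation.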
Step 5: Session State Locking (Advanced)
Prevent session state corruption during long retry cycles:
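// Optimistic locking for session updates.
// Assumes a hypothetical sessionStore.compareAndSet(key, expectedVersion, value) that
// returns false when the stored version changed since it was read (and bumps the
// version on success), plus a sleep(ms) helper.
async updateSession(chatId, updater, threadId) {
  const maxAttempts = 3;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const session = (await this.sessionStore.get(chatId)) ?? { version: 0 };
    const updated = updater({ ...session });
    // The thread of the message being handled wins; never resurrect a stale one
    updated.threadId = threadId ?? session.threadId;
    if (await this.sessionStore.compareAndSet(chatId, session.version ?? 0, updated)) {
      return updated;
    }
    await sleep(50 * attempt); // Brief backoff before re-reading
  }
  throw new Error(`session update conflict for chat ${chatId}`);
}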
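Optimistic locking only works if the store can reject a write whose base version is stale. The real session store interface is not shown in this document, so the class below is a hypothetical in-memory sketch of that contract: every value carries a version, and compareAndSet succeeds only if the stored version still matches the one the caller read.

```javascript
// Minimal in-memory session store with compare-and-set (hypothetical interface).
class VersionedSessionStore {
  constructor() {
    this.data = new Map();
  }

  // Returns a shallow copy so callers cannot mutate stored state without a write.
  get(key) {
    const entry = this.data.get(key);
    return entry ? { ...entry } : { version: 0 };
  }

  // Writes only if the stored version equals expectedVersion, then bumps the version.
  compareAndSet(key, expectedVersion, value) {
    const current = this.data.get(key);
    const currentVersion = current ? current.version : 0;
    if (currentVersion !== expectedVersion) {
      return false; // another writer got in first: caller should re-read and retry
    }
    this.data.set(key, { ...value, version: currentVersion + 1 });
    return true;
  }
}

// Two handlers read the same version; only the first write wins.
const store = new VersionedSessionStore();
const a = store.get("chat:-1003885638534");
const b = store.get("chat:-1003885638534");
console.log(store.compareAndSet("chat:-1003885638534", a.version, { ...a, threadId: 562 })); // true
console.log(store.compareAndSet("chat:-1003885638534", b.version, { ...b, threadId: 77 }));  // false
```

A Redis-backed store could provide the same contract with WATCH/MULTI/EXEC or a Lua script; that mapping is left as an assumption here.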
Verification
Step 1: Reproduce the 529 Scenario
Send a forum-topic update to the local webhook while the Anthropic call is forced to fail with 529 (for example with a mocked client, as in the unit test in Step 5):
# Using curl to simulate the Telegram update webhook
curl -X POST http://localhost:3000/webhook/telegram \
  -H "Content-Type: application/json" \
  -d '{
    "update_id": 123456789,
    "message": {
      "message_id": 100,
      "chat": { "id": -1003885638534, "type": "supergroup", "is_forum": true },
      "message_thread_id": 562,
      "is_topic_message": true,
      "text": "Test message for 529 retry scenario"
    }
  }'
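The curl call above only delivers the Telegram update; the 529 itself still has to be forced. One option is a local stand-in for the Messages API that returns 529 a fixed number of times and then succeeds. This is a sketch under stated assumptions: it assumes OpenClaw's Anthropic client can be pointed at a custom baseURL, the port and failCount defaults are arbitrary, and the error body mirrors Anthropic's overloaded_error shape while the success body is a minimal Messages response.

```javascript
const http = require("node:http");

// Scripted response for the n-th request (0-based): 529 for the first
// `failCount` requests, then a minimal successful Messages API reply.
function overloadResponse(n, failCount) {
  if (n < failCount) {
    return {
      status: 529,
      body: { type: "error", error: { type: "overloaded_error", message: "Overloaded" } },
    };
  }
  return {
    status: 200,
    body: {
      id: `msg_fake_${n}`,
      type: "message",
      role: "assistant",
      model: "fake-model",
      content: [{ type: "text", text: "Response after retries" }],
      stop_reason: "end_turn",
      stop_sequence: null,
      usage: { input_tokens: 1, output_tokens: 1 },
    },
  };
}

// Start the fake API; every incoming request gets the next scripted response.
function startFakeAnthropic(port = 8529, failCount = 2) {
  let requests = 0;
  const server = http.createServer((req, res) => {
    req.resume(); // discard the request body
    req.on("end", () => {
      const { status, body } = overloadResponse(requests++, failCount);
      res.writeHead(status, { "content-type": "application/json" });
      res.end(JSON.stringify(body));
    });
  });
  return new Promise((resolve) => server.listen(port, () => resolve(server)));
}

// Usage (run manually): startFakeAnthropic().then(() => console.log("fake API on :8529"));
```

With the official TypeScript SDK, a client such as new Anthropic({ baseURL: "http://localhost:8529", apiKey: "test", maxRetries: 0 }) would send its requests here. The SDK retries overloaded responses on its own by default (2 retries unless maxRetries is set), so maxRetries: 0 lets the application-level retry loop see every 529.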
Step 2: Verify sendMessage Log Output
After applying the fix, confirm that the sendMessage log line includes a thread= field:
# Expected log output AFTER fix
2026-03-03T11:24:34.955Z [telegram] sendMessage ok chat=-1003885638534 thread=562 message=13832
# Should NOT see (before fix):
2026-03-03T11:24:34.955Z [telegram] sendMessage ok chat=-1003885638534 message=13832
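To check a whole log file rather than eyeballing individual lines, a small scanner can flag sendMessage ok lines for supergroup chats that lack a thread= field. It assumes the log format shown above; findUnthreadedSends is a hypothetical name. Chat IDs starting with -100 belong to supergroups and channels, which is the only place forums exist, so non-forum supergroups may also be flagged.

```javascript
// Return "sendMessage ok" log lines for supergroup chats that carry no thread= field.
function findUnthreadedSends(logText) {
  return logText
    .split("\n")
    .filter((line) => line.includes("sendMessage ok"))
    .filter((line) => /chat=-100\d+/.test(line))
    .filter((line) => !/\bthread=\S+/.test(line));
}

// Usage (file name is illustrative):
// console.log(findUnthreadedSends(require("node:fs").readFileSync("openclaw.log", "utf8")));
```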
Step 3: Verify Message Appears in Correct Topic
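# The Bot API has no getMessage method, so a message cannot be fetched by ID afterwards.
# Instead, check the Message object that sendMessage itself returns:
curl -s "https://api.telegram.org/bot${BOT_TOKEN}/sendMessage" \
  -d chat_id=-1003885638534 \
  -d message_thread_id=562 \
  -d text="Thread placement check"
# Expected response includes (other fields omitted):
{
  "ok": true,
  "result": {
    "message_id": ...,
    "chat": { "id": -1003885638534, "type": "supergroup", ... },
    "message_thread_id": 562, // <-- Must match the original topic
    "is_topic_message": true,
    "text": "Thread placement check"
  }
}
# Then confirm in a Telegram client that the message is visible in topic 562.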
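Because a successful sendMessage response carries the sent Message in its result field, the adapter can verify placement immediately after every send instead of relying on a later lookup. A minimal sketch; checkThreadPlacement is a hypothetical name, and it relies only on the standard message_id, message_thread_id and is_topic_message fields.

```javascript
// Compare where Telegram says the message landed with where it should have gone.
// `sent` is the Message object from a sendMessage response's `result` field.
function checkThreadPlacement(sent, expectedThreadId) {
  if (expectedThreadId === undefined || expectedThreadId === null) {
    return { ok: true }; // nothing to check for non-topic replies
  }
  const actual = sent.is_topic_message ? sent.message_thread_id : undefined;
  if (actual === expectedThreadId) {
    return { ok: true };
  }
  return {
    ok: false,
    reason: `message ${sent.message_id} landed in thread ${actual ?? "none"}, expected ${expectedThreadId}`,
  };
}
```

Logging reason as a warning whenever ok is false would have made the orphaned message 13832 visible in the logs at send time.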
Step 4: Verify Session Contains Thread ID
# Check session store for correct thread_id
# (depends on session store implementation)
# If using Redis:
redis-cli GET "session:-1003885638534"
# Should contain: {"threadId": 562, "..."}
# If using file-based:
cat sessions/-1003885638534.json
# Should contain: {"threadId": 562, "..."}
Step 5: Unit Test for Thread Context Preservation
// Assumes test fixtures (createMockContext, aiClient, handler, telegramAdapter) and that
// the backoff sleep is stubbed out, so the test does not really wait 2 s + 4 s and
// exceed Jest's default 5 s timeout.
describe('Telegram forum thread context', () => {
  it('should preserve message_thread_id through 529 retry', async () => {
    const ctx = createMockContext({
      chatId: -1003885638534,
      messageId: 100,
      threadId: 562,
      text: 'Test message'
    });
    // Mock Anthropic to return 529 twice, then success
    aiClient.messages.create
      .mockRejectedValueOnce({ status: 529, message: 'overloaded' })
      .mockRejectedValueOnce({ status: 529, message: 'overloaded' })
      .mockResolvedValueOnce({ content: [{ type: 'text', text: 'Response' }] });
    await handler.handleUpdate(ctx);
    // Verify sendMessage was called with the correct thread_id
    expect(telegramAdapter.sendMessage).toHaveBeenCalledWith(
      -1003885638534,
      expect.any(String),
      expect.objectContaining({ message_thread_id: 562 })
    );
  });
});
Step 6: Integration Test with Telegram Test Environment
# Use Telegram's test environment or a private bot
# Send message in a forum topic, trigger 529 error, verify reply location
# 1. Set BOT_TOKEN to test bot
export BOT_TOKEN="test_bot_token"
# 2. Run openclaw with logging
OPENCLAW_LOG_LEVEL=debug npm start
# 3. Monitor for:
# - sendMessage logs with thread=562
# - Message appears in correct topic
# - No "lost" messages
Common Pitfalls
Environment-Specific Traps
- Docker container restart clears session state
If OpenClaw runs in Docker and the container restarts during a long retry cycle, in-memory session state (including thread_id) is lost. Externalize the session store (for example to Redis) instead of keeping it in memory.
# Docker Compose configuration - externalize session storage
services:
  openclaw:
    image: openclaw:latest
    environment:
      - SESSION_STORE=redis
      - REDIS_URL=redis://redis:6379
  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data
volumes:
  redis-data:
- macOS file descriptor limits
When using file-based sessions on macOS, the default open-file limit can cause session write failures under high load:
# Check the current limit
ulimit -n
# Raise it for the current shell if it is below 1024
ulimit -n 65535
- Windows-invalid characters in session keys
Session keys that embed characters such as ":" (for example topic:562) are not valid in Windows file names. Encode the key before using it as a file name:
// Use encodeURIComponent for session keys in file paths (":" becomes "%3A")
const sessionPath = path.join(
  sessionDir,
  `${encodeURIComponent(String(sessionKey))}.json`
);
Configuration Pitfalls
- Bot privacy settings in BotFather
With privacy mode enabled, a bot in a group receives only a limited subset of messages (mainly commands and replies to its own messages). A bot that should answer ordinary messages in forum topics needs privacy mode disabled, or admin rights in the group (admin bots receive all messages):
# Relevant BotFather commands:
# /setprivacy -> Disable (receive all group messages)
# /setjoingroups -> Enable (allow the bot to be added to groups)
- Mismatched session TTL vs retry backoff
If the session TTL is shorter than the retry backoff period, thread context expires before the reply is sent:
# Example: a 5-minute TTL with a 5-minute backoff all but guarantees context loss
SESSION_TTL=300000        # 5 minutes in ms
MAX_RETRY_BACKOFF=300000  # should be well below SESSION_TTL
- Using reply_to_message_id without message_thread_id
Even with reply_to_message_id set correctly, omitting message_thread_id can cause forum replies to be lost:
# BROKEN: reply without thread context
{ chat_id: -1003885638534, text: "Reply text", reply_to_message_id: 100 }  // missing: message_thread_id: 562
# CORRECT: include both
{ chat_id: -1003885638534, text: "Reply text", reply_to_message_id: 100, message_thread_id: 562 }
Code-Level Pitfalls
- Storing thread_id as string vs number
The Bot API returns message_thread_id as an integer and accepts either form in requests, but session stores and config often round-trip it as a string, and mixing the two breaks strict comparisons and cache keys (562 !== "562"):
// Pick one representation and use it everywhere, e.g. integers:
const threadId = parseInt(message.message_thread_id, 10);
// or, if keys are strings anyway:
const threadId = String(message.message_thread_id);
- Overwriting session in concurrent handlers
If multiple messages arrive simultaneously for the same chat, session writes can race:
// PROBLEMATIC: read-modify-write without atomicity
const session = await getSession(chatId);  // read
session.threadId = threadId;               // modify
await saveSession(chatId, session);        // write - another request may have written in between
// FIXED: use atomic operations or locking
await updateSessionAtomic(chatId, (s) => { s.threadId = threadId; return s; });
- Async/await race conditions in retry handlers
A callback that runs after the retry reads threadId at that moment, not when the message arrived:
// PROBLEMATIC
function handleMessage(ctx) {
  let threadId = ctx.message.message_thread_id;
  retry(3, () => ai.call()).then(response => {
    // Runs later: sees whatever threadId holds by then
    sendMessage(ctx.chat.id, response, { message_thread_id: threadId });
  });
  // A new message (newMessage) arrives and threadId is reassigned before the callback runs
  threadId = newMessage.message_thread_id;
}
// FIXED: capture context in a const binding
function handleMessage(ctx) {
  const threadId = ctx.message.message_thread_id; // captured immediately, cannot be reassigned
  retry(3, () => ai.call()).then(response => {
    sendMessage(ctx.chat.id, response, { message_thread_id: threadId });
  });
}
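For the string-versus-number pitfall above, it can help to normalize the thread ID once, at the boundary where updates and session records enter the system. A minimal sketch; normalizeThreadId is a hypothetical helper that accepts the integer the Bot API sends, tolerates a digits-only string from older session records, and rejects anything else.

```javascript
// Normalize a thread ID to a positive safe integer, or undefined if absent.
// Throws on values that cannot be a Telegram thread ID.
function normalizeThreadId(value) {
  if (value === undefined || value === null || value === "") {
    return undefined; // treated as "no thread"
  }
  // Accept integers as-is and digits-only strings ("562"); "5e2", " 562", "abc" are rejected.
  const n = typeof value === "string" && /^\d+$/.test(value) ? Number(value) : value;
  if (!Number.isSafeInteger(n) || n <= 0) {
    throw new TypeError(`invalid message_thread_id: ${JSON.stringify(value)}`);
  }
  return n;
}

console.log(normalizeThreadId("562") === normalizeThreadId(562)); // true
```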
Related Errors
- HTTP 529: The AI service is temporarily overloaded
Anthropic's overload error (overloaded_error), distinct from 429 rate limiting. It is the initiating event for this bug because it triggers the retry sequence.
- stale-socket
Gateway health-monitor restart that occurred 7 minutes after the lost message. Not directly related, but it points to underlying connection instability that may exacerbate retry issues.
- thread not found (not seen in this case)
Telegram API error returned when message_thread_id refers to a non-existent topic. Its absence indicates that Telegram received either a valid thread ID or no thread ID at all.
- Session expiration during long-running operations
Related to a GitHub issue in which the session TTL is shorter than the operation duration, causing context loss. Similar root cause to this thread_id loss.
- Webhook delivery failure with missing thread context
Related issue in which Telegram webhook updates arrive without message_thread_id for forum messages, causing routing failures.
- context.lengthExceeded (Anthropic error)
When conversation context grows too large during retries, Anthropic rejects the request. This can compound thread context issues if error handling loses state.
- Race condition in concurrent Telegram updates
When multiple updates arrive simultaneously for the same chat, session state can be overwritten, losing thread_id. Same architectural vulnerability as this bug.
- Message sent to wrong chat after bot token refresh
Related session/context loss scenario in which bot configuration changes mid-operation cause messages to be routed incorrectly.