Billing Cooldown Skip Path Bypasses isBillingErrorMessage(): Generic Error Shown Instead of Billing Message
When all model-fallback candidates are skipped due to billing cooldown, users see a generic 'Something went wrong' error instead of the actionable BILLING_ERROR_USER_MESSAGE, because the cooldown-generated skip message does not match any pattern in isBillingErrorMessage().
Symptoms
User-Facing Symptom
After the first billing failure with an Anthropic OAuth-authenticated account, all subsequent retry attempts display a generic, non-actionable error message:
⚠️ Something went wrong while processing your request. Please try again, or use /new to start a fresh session.
This repeats on a ~30-minute cadence for hours, even though the underlying cause is a billing quota exhaustion on the Anthropic side.
Developer-Visible Symptom (Agent Logs)
The first failure correctly surfaces the billing error:
[agent] embedded run agent end: runId=e8520f5d-... isError=true model=claude-opus-4-6 provider=anthropic error=LLM request rejected: You're out of extra usage. Add more at claude.ai/settings/usage and keep going.
[agent] auth profile failure state updated: runId=e8520f5d-... profile=sha256:154a23a3efe6 provider=anthropic reason=billing window=disabled
All subsequent failures produce the cooldown skip path:
[model-fallback] model fallback decision: decision=skip_candidate requested=anthropic/claude-opus-4-6 candidate=anthropic/claude-opus-4-6 reason=billing next=anthropic/claude-sonnet-4-6 detail=Provider anthropic has billing issue (skipping all models)
[model-fallback] model fallback decision: decision=skip_candidate requested=anthropic/claude-opus-4-6 candidate=anthropic/claude-sonnet-4-6 reason=billing next=none detail=Provider anthropic has billing issue (skipping all models)
Embedded agent failed before reply: All models failed (2): anthropic/claude-opus-4-6: Provider anthropic has billing issue (skipping all models) (billing) | anthropic/claude-sonnet-4-6: Provider anthropic has billing issue (skipping all models) (billing)
Structured Data Confirmation
The FallbackSummaryError carries attempt.reason="billing" for every attempt, but the isBillingErrorMessage() check in agent-runner-execution.ts performs string matching against ERROR_PATTERNS.billing (defined in failover-matches.ts), which does not include the pattern "has billing issue".
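The mismatch can be reproduced in isolation. The sketch below is a minimal stand-in, not the project's actual code: the pattern list is copied from the "Before" listing of ERROR_PATTERNS.billing later in this document, and the body of isBillingErrorMessage() is an assumption (any pattern matching counts as billing).

```typescript
// Patterns copied from the "Before" listing of ERROR_PATTERNS.billing in this document.
const BILLING_PATTERNS: RegExp[] = [
  /out of extra usage/i,
  /insufficient balance/i,
  /billing error/i,
  /api key (has|runs out).*credit/i,
  /add more at.*usage/i,
  /out of credits/i,
];

// Assumed behaviour: a message is billing-related if any pattern matches it.
function isBillingErrorMessage(message: string): boolean {
  return BILLING_PATTERNS.some((pattern) => pattern.test(message));
}

// First failure: the raw provider error text from the agent log.
const rawApiError =
  "LLM request rejected: You're out of extra usage. Add more at claude.ai/settings/usage and keep going.";

// Later failures: the summary assembled from cooldown skip messages.
const cooldownSummary =
  "All models failed (2): anthropic/claude-opus-4-6: Provider anthropic has billing issue (skipping all models) (billing) | " +
  "anthropic/claude-sonnet-4-6: Provider anthropic has billing issue (skipping all models) (billing)";

const rawMatches = isBillingErrorMessage(rawApiError);
const cooldownMatches = isBillingErrorMessage(cooldownSummary);

console.log({ rawMatches, cooldownMatches });
```

Under these assumptions the raw error is classified as billing while the cooldown summary is not, even though the summary contains the word "billing" several times.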
Frequency Pattern
The cycle repeats every 30 minutes for extended durations:
2026-04-13T22:41:05 ... Embedded agent failed before reply: All models failed (2): ... (billing) | ... (billing)
2026-04-13T23:11:05 ... Embedded agent failed before reply: All models failed (2): ... (billing) | ... (billing)
2026-04-13T23:41:05 ... Embedded agent failed before reply: All models failed (2): ... (billing) | ... (billing)
Root Cause
Architecture: Two Error Classification Strategies
OpenClaw uses two distinct strategies to classify errors as billing-related, with an asymmetry between the raw API error path and the cooldown skip path:
- Raw API error path: isBillingErrorMessage(message: string) performs regex/string matching against ERROR_PATTERNS.billing in failover-matches.ts.
- Rate-limit path (already correct): isPureTransientRateLimitSummary(failure: FallbackSummaryError) performs a structural check against attempt.reason === 'rate_limit'.
- Billing cooldown skip path (broken): No equivalent structural check; relies solely on isBillingErrorMessage() string-matching, which fails to match "has billing issue (skipping all models)".
The Failure Sequence
- User's Anthropic OAuth personal "extra usage" quota is exhausted.
- First LLM request receives a raw API error: 400 {"type":"error","error":{"type":"invalid_request_error","message":"You're out of extra usage. Add more at claude.ai/settings/usage and keep going."}}
- The raw error message matches ERROR_PATTERNS.billing, so the auth profile enters billing cooldown (window=disabled).
- Subsequent requests trigger the model-fallback skip logic in model-fallback.ts.
- Every candidate model is skipped with detail: "Provider anthropic has billing issue (skipping all models)".
- A FallbackSummaryError is constructed with attempt.reason="billing" for each failed attempt.
- In agent-runner-execution.ts, the code calls isBillingErrorMessage(error.message), but "has billing issue" is not in ERROR_PATTERNS.billing.
- The billing check fails, so the generic fallback error path is taken, producing "Something went wrong".
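The sequence above can be modelled in a few lines. This is a simplified sketch: every name in it (cooldowns, recordBillingFailure, runWithFallback, summarize, SkippedAttempt) is hypothetical and does not come from the OpenClaw source. Only the skip detail text and the summary format are taken from the agent log shown earlier.

```typescript
// All names below are hypothetical stand-ins, not OpenClaw's real API.
type FailureReason = "billing" | "rate_limit" | "timeout";

interface SkippedAttempt {
  candidate: string; // "provider/model"
  reason: FailureReason;
  detail: string;
}

// Providers currently in cooldown, keyed by provider name.
const cooldowns: { [provider: string]: FailureReason } = {};

// The first raw billing error puts the provider's auth profile into cooldown.
function recordBillingFailure(provider: string): void {
  cooldowns[provider] = "billing";
}

// Later requests skip every candidate on a cooled-down provider without calling the API.
function runWithFallback(candidates: string[]): SkippedAttempt[] {
  const attempts: SkippedAttempt[] = [];
  for (const candidate of candidates) {
    const provider = candidate.substring(0, candidate.indexOf("/"));
    if (cooldowns[provider] === "billing") {
      attempts.push({
        candidate,
        reason: "billing",
        detail: `Provider ${provider} has billing issue (skipping all models)`,
      });
    }
    // A real implementation would send the request here when no cooldown applies.
  }
  return attempts;
}

// Mirrors the "All models failed (N): ..." format seen in the agent log.
function summarize(attempts: SkippedAttempt[]): string {
  const parts = attempts.map((a) => `${a.candidate}: ${a.detail} (${a.reason})`);
  return `All models failed (${attempts.length}): ${parts.join(" | ")}`;
}

recordBillingFailure("anthropic");
const skipped = runWithFallback(["anthropic/claude-opus-4-6", "anthropic/claude-sonnet-4-6"]);
const summary = summarize(skipped);
console.log(summary);
```

The point of the model is that the structured reason stays "billing" on every attempt, while the human-readable text that reaches the string matcher is whatever the skip logic happened to write.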
Relevant Code Locations
- src/core/failover/failover-matches.ts: ERROR_PATTERNS.billing contains patterns like "out of extra usage", "insufficient balance", "billing error", but **not** "has billing issue".
- src/core/agent-runner-execution.ts: calls isBillingErrorMessage(message) as the sole billing classification gate for the error rendering path.
- src/core/model-fallback/model-fallback.ts: produces "Provider X has billing issue (skipping all models)" messages when billing cooldown is active.
- src/core/failover/failover-matches.ts: isPureTransientRateLimitSummary() correctly inspects attempt.reason === 'rate_limit' as a structural field, demonstrating the correct pattern that the billing path is missing.
Why the Rate-Limit Path Works Correctly
The rate-limit path already uses the structural attempt.reason field:
export function isPureTransientRateLimitSummary(failure: FallbackSummaryError): boolean {
  return failure.attempts.every(a => a.reason === 'rate_limit');
}
This approach is immune to message string changes because it inspects the semantic classification field, not the human-readable message.
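To make the "immune to message changes" point concrete, here is a small sketch. The SummaryLike and Attempt shapes are assumptions reduced to the two fields discussed here, and both message strings are invented for illustration.

```typescript
// Minimal stand-in types; the real FallbackSummaryError likely carries more fields.
interface Attempt {
  reason: string;
  message: string;
}

interface SummaryLike {
  attempts: Attempt[];
}

// Same logic as the function quoted above, applied to the stand-in type.
function isPureTransientRateLimitSummary(failure: SummaryLike): boolean {
  return failure.attempts.every((a) => a.reason === "rate_limit");
}

// Same classification, two different (invented) message wordings.
const oldWording: SummaryLike = {
  attempts: [{ reason: "rate_limit", message: "Provider anthropic is rate limited" }],
};
const newWording: SummaryLike = {
  attempts: [{ reason: "rate_limit", message: "Too many requests, backing off" }],
};

// One attempt with a different reason makes the summary no longer "pure".
const mixed: SummaryLike = {
  attempts: [
    { reason: "rate_limit", message: "Provider anthropic is rate limited" },
    { reason: "billing", message: "Provider anthropic has billing issue (skipping all models)" },
  ],
};

console.log(
  isPureTransientRateLimitSummary(oldWording),
  isPureTransientRateLimitSummary(newWording),
  isPureTransientRateLimitSummary(mixed),
);
```

Both wordings classify identically because only the reason field is read; the mixed case is rejected for the same reason.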
Why OAuth Exacerbates the Issue
Personal “extra usage” quotas on claude.ai are generally smaller than organizational API budgets. OAuth-authenticated accounts hit their personal quota more frequently than API-key-authenticated org accounts, making this bug a common user experience issue for the OAuth install path.
Step-by-Step Fix
Fix Strategy
Add a structural billing check function isPureBillingSummary() to failover-matches.ts, mirroring the existing isPureTransientRateLimitSummary() pattern. Update agent-runner-execution.ts to use this structural check as the primary gate for billing classification, falling back to string-matching only for legacy raw API errors.
Step 1: Add Structural Billing Check to failover-matches.ts
Add the following function to src/core/failover/failover-matches.ts alongside the existing isPureTransientRateLimitSummary():
Before:
export function isPureTransientRateLimitSummary(failure: FallbackSummaryError): boolean {
  return failure.attempts.every(a => a.reason === 'rate_limit');
}
After:
export function isPureTransientRateLimitSummary(failure: FallbackSummaryError): boolean {
  return failure.attempts.every(a => a.reason === 'rate_limit');
}
/**
 * Structural check: true when every attempt in the FallbackSummaryError
 * is classified as billing-cooldown. This correctly handles the
 * "Provider X has billing issue (skipping all models)" skip path,
 * which is not matched by isBillingErrorMessage() string patterns.
 */
export function isPureBillingSummary(failure: FallbackSummaryError): boolean {
  return failure.attempts.every(a => a.reason === 'billing');
}
Step 2: Update agent-runner-execution.ts Billing Classification Gate
Locate the billing error classification logic in src/core/agent-runner-execution.ts. Replace the string-only check with a structural-first approach:
Before:
const isBilling = isBillingErrorMessage(message);
After:
// Prefer structural classification (cooldown skip path) over string matching.
const isBilling = error instanceof FallbackSummaryError
  ? isPureBillingSummary(error)
  : isBillingErrorMessage(message);
Ensure FallbackSummaryError and isPureBillingSummary are imported. The paths below are relative to src/core/agent-runner-execution.ts:
import { FallbackSummaryError } from './model-fallback/types';
import { isPureBillingSummary } from './failover/failover-matches';
Step 3: (Optional Enhancement) Extend ERROR_PATTERNS.billing
As a defensive measure for any code path that sees the cooldown skip message only as a plain string, extend the billing patterns in src/core/failover/failover-matches.ts to include the cooldown skip phrase:
Before:
export const ERROR_PATTERNS = {
  billing: [
    /out of extra usage/i,
    /insufficient balance/i,
    /billing error/i,
    /api key (has|runs out).*credit/i,
    /add more at.*usage/i,
    /out of credits/i,
  ],
  // ...
};
After:
export const ERROR_PATTERNS = {
  billing: [
    /out of extra usage/i,
    /insufficient balance/i,
    /billing error/i,
    /api key (has|runs out).*credit/i,
    /add more at.*usage/i,
    /out of credits/i,
    /has billing issue \(skipping all models\)/i, // cooldown skip path
  ],
  // ...
};
This third step is defensive; the primary fix in Steps 1-2 is sufficient because isPureBillingSummary() short-circuits before string matching reaches the FallbackSummaryError case.
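The structural-first gate from Step 2 can be exercised on its own with stand-in types. In the sketch below, the FallbackSummaryError class is a simplified assumption (the real one presumably extends Error and has more fields), the pattern list is abbreviated, and classifyAsBilling() is a hypothetical wrapper name used only for this example.

```typescript
// Stand-ins for illustration; class and field shapes are assumptions.
interface Attempt {
  reason: string;
  message: string;
}

// Simplified: the real class presumably extends Error.
class FallbackSummaryError {
  constructor(
    public readonly message: string,
    public readonly attempts: Attempt[],
  ) {}
}

// Abbreviated copy of the billing patterns listed in this document.
const BILLING_PATTERNS: RegExp[] = [/out of extra usage/i, /insufficient balance/i, /billing error/i];

function isBillingErrorMessage(message: string): boolean {
  return BILLING_PATTERNS.some((pattern) => pattern.test(message));
}

function isPureBillingSummary(failure: FallbackSummaryError): boolean {
  return failure.attempts.every((a) => a.reason === "billing");
}

// Hypothetical wrapper around the Step 2 expression.
function classifyAsBilling(error: unknown, message: string): boolean {
  return error instanceof FallbackSummaryError
    ? isPureBillingSummary(error) // structural path: message text is ignored
    : isBillingErrorMessage(message); // legacy path: raw API error strings
}

const skipDetail = "Provider anthropic has billing issue (skipping all models)";
const cooldownError = new FallbackSummaryError("All models failed (2): ...", [
  { reason: "billing", message: skipDetail },
  { reason: "billing", message: skipDetail },
]);
const rawError = new Error("You're out of extra usage.");
const rateLimited = new FallbackSummaryError("upstream mentioned a billing error", [
  { reason: "rate_limit", message: "rate limited" },
]);

console.log(
  classifyAsBilling(cooldownError, cooldownError.message),
  classifyAsBilling(rawError, rawError.message),
  classifyAsBilling(rateLimited, rateLimited.message),
);
```

The third case shows the design choice: once an error is a FallbackSummaryError, its reason fields decide the outcome, so stray billing wording in the summary text cannot cause a false positive.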
Step 4: Rebuild and Deploy
npm run build
# or for Docker deployments:
docker build -t openclaw:fixed .
Verification
Unit Test: isPureBillingSummary()
Add a test case to src/core/failover/failover-matches.test.ts:
import { isPureBillingSummary } from './failover-matches';
import { FallbackSummaryError, FallbackAttempt } from '../model-fallback/types';

describe('isPureBillingSummary', () => {
  it('returns true when all attempts have reason=billing', () => {
    const attempts: FallbackAttempt[] = [
      {
        provider: 'anthropic',
        model: 'claude-opus-4-6',
        reason: 'billing',
        message: 'Provider anthropic has billing issue (skipping all models)',
        durationMs: 0,
        startTime: 0,
        endTime: 0,
      },
      {
        provider: 'anthropic',
        model: 'claude-sonnet-4-6',
        reason: 'billing',
        message: 'Provider anthropic has billing issue (skipping all models)',
        durationMs: 0,
        startTime: 0,
        endTime: 0,
      },
    ];
    const error = new FallbackSummaryError('All models failed', attempts);
    expect(isPureBillingSummary(error)).toBe(true);
  });

  it('returns false when attempts contain mixed reasons', () => {
    const attempts: FallbackAttempt[] = [
      { provider: 'anthropic', model: 'claude-opus-4-6', reason: 'billing', message: '', durationMs: 0, startTime: 0, endTime: 0 },
      { provider: 'anthropic', model: 'claude-sonnet-4-6', reason: 'rate_limit', message: '', durationMs: 0, startTime: 0, endTime: 0 },
    ];
    const error = new FallbackSummaryError('All models failed', attempts);
    expect(isPureBillingSummary(error)).toBe(false);
  });

  it('returns false when no attempt has reason=billing', () => {
    const attempts: FallbackAttempt[] = [
      { provider: 'anthropic', model: 'claude-opus-4-6', reason: 'rate_limit', message: '', durationMs: 0, startTime: 0, endTime: 0 },
    ];
    const error = new FallbackSummaryError('All models failed', attempts);
    expect(isPureBillingSummary(error)).toBe(false);
  });
});
Run the test suite:
npm test -- --testPathPattern="failover-matches"
# Expected: isPureBillingSummary tests pass
Integration Test: Billing Cooldown Error Rendering
Simulate a billing cooldown scenario using a test provider or mocked AuthProfile:
# Using the OpenClaw CLI test harness (if available):
openclaw test:integration --scenario=billing-cooldown --auth-type=oauth
# Expected output in user-facing message channel:
# "⚠️ API provider returned a billing error - your API key has run out of credits
# or has an insufficient balance. Check your provider's billing dashboard and
# top up or switch to a different API key."
# (i.e., BILLING_ERROR_USER_MESSAGE, not "Something went wrong")
Manual Verification: Log Inspection
Trigger the billing cooldown and inspect agent logs for the corrected classification:
# Trigger a billing exhaustion scenario, then observe subsequent failures:
grep -E "(billing|isBilling|Something went wrong)" /var/log/openclaw/agent.log
# Before fix: "Something went wrong" appears repeatedly:
# Embedded agent failed before reply: ... (Something went wrong)
# Embedded agent failed before reply: ... (Something went wrong)
# After fix: BILLING_ERROR_USER_MESSAGE appears:
# [agent] embedded run agent end: ... userMessage=⚠️ API provider returned a billing error...
# Embedded agent failed before reply: ... (billing)
Exit Code Verification
# Verify graceful degradation with billing error exit code
openclaw run --prompt="Hello" --model=anthropic/claude-opus-4-6
echo "Exit code: $?"
# Expected: non-zero exit (indicating error state was properly surfaced), NOT a crash
Common Pitfalls
- Only extending ERROR_PATTERNS without adding isPureBillingSummary(): Adding "has billing issue" to the string patterns works as a workaround but is fragile. If the model-fallback message format changes in a future release (e.g., "Provider X billing cooldown, skipping all models"), the pattern breaks again. The structural isPureBillingSummary() approach is resilient to message string changes.
- Applying isPureBillingSummary() unconditionally: The check must be guarded with instanceof FallbackSummaryError. Calling it on a raw string or other error type throws a TypeError. The fallback to isBillingErrorMessage(message) for non-FallbackSummaryError types preserves backward compatibility with raw API errors.
- OAuth vs API key asymmetry in testing: The bug manifests more readily with OAuth-authenticated accounts because personal "extra usage" quotas are smaller. Testing with org-level API keys may not reproduce the issue, leading to false confidence that a fix is working. Always test with both OAuth personal quota scenarios and API key exhaustion scenarios.
- Cooldown window state persistence: The billing cooldown state persists across restarts if backed by a persistent store (Redis, SQLite). Ensure test environments reset the auth profile failure state between runs, or the cooldown will still block requests even after fixing the error message path.
- Partial model-fallback coverage: If only some providers in a routing chain enter billing cooldown, the FallbackSummaryError contains a mix of reason: 'billing' and other reasons (e.g., reason: 'timeout'). isPureBillingSummary() returns false in this mixed case. Consider adding isMostlyBillingSummary() as a secondary heuristic if mixed failures are common.
- Docker volume mount timing: In Docker deployments, ensure the rebuilt container image is used (docker build, not just docker-compose up -d --build if the build context is stale). A common mistake is editing source files and only running up -d, which uses the existing image without the fix.
- Log verbosity masking the issue: If LOG_LEVEL=error is set in production, the detailed [model-fallback] model fallback decision lines may be suppressed, making it harder to diagnose whether the cooldown skip path or raw error path was taken. Set LOG_LEVEL=debug during troubleshooting.
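Two of the pitfalls above lend themselves to a short sketch. First, Array.prototype.every returns true for an empty array, so a pure-billing check built on every() alone would classify a summary with zero attempts as billing. Second, the isMostlyBillingSummary() heuristic is only named in this document, not defined. The version below is one possible reading, with a "strictly more than half" threshold chosen purely for illustration. Both functions take a bare attempts array instead of a FallbackSummaryError to keep the example self-contained.

```typescript
// Reduced attempt shape; only the field these checks read.
interface Attempt {
  reason: string;
}

// Strict check with an explicit guard: [].every(...) is true, so without the
// length test an empty attempts list would be classified as billing.
function isPureBillingSummaryGuarded(attempts: Attempt[]): boolean {
  return attempts.length > 0 && attempts.every((a) => a.reason === "billing");
}

// Hypothetical heuristic: true when strictly more than half of the attempts
// are billing. The threshold is an assumption, not taken from the project.
function isMostlyBillingSummary(attempts: Attempt[]): boolean {
  if (attempts.length === 0) {
    return false;
  }
  const billingCount = attempts.filter((a) => a.reason === "billing").length;
  return billingCount * 2 > attempts.length;
}

const noAttempts: Attempt[] = [];
const allBilling: Attempt[] = [{ reason: "billing" }, { reason: "billing" }];
const twoOfThree: Attempt[] = [{ reason: "billing" }, { reason: "billing" }, { reason: "timeout" }];
const evenSplit: Attempt[] = [{ reason: "billing" }, { reason: "timeout" }];

console.log({
  emptyPure: isPureBillingSummaryGuarded(noAttempts),
  allBillingPure: isPureBillingSummaryGuarded(allBilling),
  twoOfThreeMostly: isMostlyBillingSummary(twoOfThree),
  evenSplitMostly: isMostlyBillingSummary(evenSplit),
});
```

Whether a majority rule is the right behaviour depends on which message is less misleading for a mixed failure, so treat the threshold as a starting point rather than a recommendation.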
Related Errors
- FallbackSummaryError with "Something went wrong" message: The generic fallback error that users see when neither isBillingErrorMessage() nor isPureTransientRateLimitSummary() matches. Indicates a classification gap in the error routing logic.
- ERROR_PATTERNS.billing pattern mismatch: The set of regex patterns in failover-matches.ts that isBillingErrorMessage() uses for string-based billing detection. Missing the cooldown skip phrase was the root cause here.
- PR #61608 (partial billing pattern fix): Added "out of extra usage" to ERROR_PATTERNS.billing for raw API errors only; did not address the cooldown-generated skip path.
- Issue #48526: Related billing error classification gap (possibly an earlier instance of the same pattern-matching vs structural classification problem).
- Issue #64224: OAuth authentication billing quota exhaustion causing repeated errors (likely same root cause, different manifestation).
- Issue #64308: Model-fallback all-models-skipped scenario with generic error output.
- Issue #62375: Anthropic provider billing error handling edge case with OAuth personal usage quotas.
- isPureTransientRateLimitSummary(): The correct pattern that the billing path should mirror. Demonstrates the structural attempt.reason inspection approach that was already implemented for rate-limit errors.
- BILLING_ERROR_USER_MESSAGE: The expected user-facing message that was not being shown: "⚠️ API provider returned a billing error - your API key has run out of credits or has an insufficient balance. Check your provider's billing dashboard and top up or switch to a different API key."
- auth profile failure state: reason=billing window=disabled: The auth profile metadata indicating that a billing cooldown has been activated, preventing further attempts for a defined cooldown period.