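April 17, 2026 • Version: 2026.4.9

Billing cooldown skip path bypasses isBillingErrorMessage(): a generic error is shown instead of the billing message

When every model fallback candidate is skipped because of a billing cooldown, the user sees the generic "Something went wrong" error instead of the actionable BILLING_ERROR_USER_MESSAGE, because the skip message generated by the cooldown matches none of the patterns in isBillingErrorMessage().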

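🔍 Symptoms

User-visible symptoms

After the first billing failure on an account authenticated through Anthropic OAuth, every subsequent retry shows a generic, non-actionable error message: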

⚠️ Something went wrong while processing your request. Please try again, or use /new to start a fresh session.

The error recurs at roughly 30-minute intervals for hours, even though the root cause is an exhausted billing quota on the Anthropic side.

Developer-visible symptoms (agent logs)

The first failure surfaces the billing error correctly:

[agent] embedded run agent end: runId=e8520f5d-... isError=true model=claude-opus-4-6 provider=anthropic error=LLM request rejected: You're out of extra usage. Add more at claude.ai/settings/usage and keep going.
[agent] auth profile failure state updated: runId=e8520f5d-... profile=sha256:154a23a3efe6 provider=anthropic reason=billing window=disabled

All subsequent failures take the cooldown skip path:

[model-fallback] model fallback decision: decision=skip_candidate requested=anthropic/claude-opus-4-6 candidate=anthropic/claude-opus-4-6 reason=billing next=anthropic/claude-sonnet-4-6 detail=Provider anthropic has billing issue (skipping all models)
[model-fallback] model fallback decision: decision=skip_candidate requested=anthropic/claude-opus-4-6 candidate=anthropic/claude-sonnet-4-6 reason=billing next=none detail=Provider anthropic has billing issue (skipping all models)
Embedded agent failed before reply: All models failed (2): anthropic/claude-opus-4-6: Provider anthropic has billing issue (skipping all models) (billing) | anthropic/claude-sonnet-4-6: Provider anthropic has billing issue (skipping all models) (billing)

Structured data confirmation

The FallbackSummaryError carries attempt.reason="billing" on every attempt, but the isBillingErrorMessage() check in agent-runner-execution.ts string-matches against ERROR_PATTERNS.billing in failover-matches.ts, and none of those patterns matches the phrase "has billing issue".
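The mismatch can be reproduced in isolation. The sketch below copies the pattern list quoted in the fix section of this article and runs it against both messages from the logs; matchesBillingPattern is a hypothetical stand-in for isBillingErrorMessage(), whose real implementation may differ.

```typescript
// Patterns copied from the ERROR_PATTERNS.billing list quoted in this article.
const billingPatterns: RegExp[] = [
  /out of extra usage/i,
  /insufficient balance/i,
  /billing error/i,
  /api key (has|runs out).*credit/i,
  /add more at.*usage/i,
  /out of credits/i,
];

// Hypothetical stand-in for isBillingErrorMessage().
function matchesBillingPattern(message: string): boolean {
  return billingPatterns.some((pattern) => pattern.test(message));
}

const rawApiError =
  "LLM request rejected: You're out of extra usage. Add more at claude.ai/settings/usage and keep going.";
const cooldownSkip = "Provider anthropic has billing issue (skipping all models)";

const rawMatches = matchesBillingPattern(rawApiError); // true
const skipMatches = matchesBillingPattern(cooldownSkip); // false
console.log({ rawMatches, skipMatches });
```

The raw API error is recognized, while the cooldown skip detail is not, even though both describe the same billing condition.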

Frequency pattern

The cycle repeats every 30 minutes over an extended period:

2026-04-13T22:41:05 ... Embedded agent failed before reply: All models failed (2): ... (billing) | ... (billing)
2026-04-13T23:11:05 ... Embedded agent failed before reply: All models failed (2): ... (billing) | ... (billing)
2026-04-13T23:41:05 ... Embedded agent failed before reply: All models failed (2): ... (billing) | ... (billing)
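To confirm the interval from a longer log excerpt, the timestamps can be diffed programmatically. This is a minimal sketch; it assumes the timestamps share one timezone (a "Z" suffix is appended only so that parsing does not depend on the local timezone), and minutesBetween is a hypothetical helper.

```typescript
// Timestamps taken from the log excerpt above.
const timestamps = [
  "2026-04-13T22:41:05",
  "2026-04-13T23:11:05",
  "2026-04-13T23:41:05",
];

// Returns the gap in minutes between each pair of consecutive timestamps.
function minutesBetween(stamps: string[]): number[] {
  const ms = stamps.map((s) => Date.parse(s + "Z"));
  return ms.slice(1).map((t, i) => (t - ms[i]) / 60000);
}

const intervals = minutesBetween(timestamps);
console.log(intervals); // [30, 30]
```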

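🧠 Root cause analysis

Architecture: two error classification strategies

OpenClaw classifies errors with two different strategies, string matching on the message and a structural check on attempt.reason, and billing handling is asymmetric between the raw API error path and the cooldown skip path:

Raw API error path: isBillingErrorMessage(message: string) performs regex/string matching against ERROR_PATTERNS.billing in failover-matches.ts.
Rate limit path (implemented correctly): isPureTransientRateLimitSummary(failure: FallbackSummaryError) performs a structural check for attempt.reason === 'rate_limit'.
Billing cooldown skip path (the problem): there is no equivalent structural check; it relies solely on isBillingErrorMessage() string matching, which cannot match "has billing issue (skipping all models)".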

Failure sequence

The user's personal "extra usage" quota under Anthropic OAuth is exhausted.
The first LLM request receives a raw API error: 400 {"type":"error","error":{"type":"invalid_request_error","message":"You're out of extra usage. Add more at claude.ai/settings/usage and keep going."}}
The raw error message matches ERROR_PATTERNS.billing → the auth profile enters billing cooldown (window=disabled).
Subsequent requests trigger the fallback skip logic in model-fallback.ts.
Each candidate model is skipped with the detail "Provider anthropic has billing issue (skipping all models)".
A FallbackSummaryError is built that records every failed attempt with attempt.reason="billing".
In agent-runner-execution.ts, the code calls isBillingErrorMessage(error.message), but "has billing issue" is not in ERROR_PATTERNS.billing.
The billing check fails, so the generic fallback error path is taken, producing "Something went wrong".
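The sequence above can be condensed into a small simulation. All names here (SkippedAttempt, skipCandidates, preFixUserMessage, the cooldown map) are hypothetical stand-ins rather than OpenClaw's actual code, and the regex is an abbreviated subset of the billing patterns.

```typescript
// Hypothetical stand-in types and names; the real OpenClaw code may differ.
type SkippedAttempt = { model: string; reason: string; detail: string };

const GENERIC_MESSAGE = "Something went wrong";
const BILLING_MESSAGE = "BILLING_ERROR_USER_MESSAGE";

// State left behind by the first raw API error: provider -> cooldown reason.
const cooldown = new Map<string, string>([["anthropic", "billing"]]);

// Skip every candidate whose provider is in billing cooldown.
function skipCandidates(candidates: string[]): SkippedAttempt[] {
  const skipped: SkippedAttempt[] = [];
  for (const model of candidates) {
    const provider = model.split("/")[0] ?? "";
    if (cooldown.get(provider) === "billing") {
      skipped.push({
        model,
        reason: "billing",
        detail: `Provider ${provider} has billing issue (skipping all models)`,
      });
    }
  }
  return skipped;
}

// Pre-fix classification: only the summary text is inspected.
function preFixUserMessage(attempts: SkippedAttempt[]): string {
  const summary =
    `All models failed (${attempts.length}): ` +
    attempts.map((a) => `${a.model}: ${a.detail} (${a.reason})`).join(" | ");
  // Abbreviated subset of the billing patterns.
  const looksLikeBilling =
    /out of extra usage|insufficient balance|billing error|out of credits/i.test(summary);
  return looksLikeBilling ? BILLING_MESSAGE : GENERIC_MESSAGE;
}

const attempts = skipCandidates(["anthropic/claude-opus-4-6", "anthropic/claude-sonnet-4-6"]);
const userMessage = preFixUserMessage(attempts);
console.log(userMessage); // "Something went wrong", although every attempt has reason "billing"
```

The structured reason is available on every attempt, yet the text-only check never consults it.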

Why the rate limit path works correctly

The rate limit path already uses the structured attempt.reason field:

export function isPureTransientRateLimitSummary(failure: FallbackSummaryError): boolean {
  return failure.attempts.every(a => a.reason === 'rate_limit');
}

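This approach is immune to changes in message wording, because it checks the semantic classification field rather than the human-readable message.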
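A short sketch of that property, using hypothetical stand-in types (ReasonedAttempt, ReasonedSummary) and a generalized helper, everyAttemptHasReason, that is not part of OpenClaw:

```typescript
// Hypothetical stand-ins for the attempt and summary shapes.
interface ReasonedAttempt { reason: string; message: string }
interface ReasonedSummary { attempts: ReasonedAttempt[] }

// Same shape as the structural check above, with the reason as a parameter.
function everyAttemptHasReason(failure: ReasonedSummary, reason: string): boolean {
  return failure.attempts.every((a) => a.reason === reason);
}

const original: ReasonedSummary = {
  attempts: [{ reason: "rate_limit", message: "429 Too Many Requests" }],
};
// Same classification, different human-readable wording.
const reworded: ReasonedSummary = {
  attempts: [{ reason: "rate_limit", message: "Provider throttled this request" }],
};
const empty: ReasonedSummary = { attempts: [] };

console.log(everyAttemptHasReason(original, "rate_limit")); // true
console.log(everyAttemptHasReason(reworded, "rate_limit")); // true
console.log(everyAttemptHasReason(empty, "rate_limit"));    // true (vacuously)
```

One caveat of this style: Array.prototype.every() returns true for an empty array, so a summary with no attempts satisfies any such check. Callers that can encounter empty summaries may want an explicit length guard.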

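Why OAuth makes the problem worse

The personal "extra usage" quota on claude.ai is typically smaller than an organization's API budget. OAuth-authenticated accounts exhaust their personal quota more often than organization accounts authenticated with API keys, which makes this bug a common user-experience problem on the OAuth installation path.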

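🛠️ Step-by-step fix

Fix strategy

Add a structural billing check, isPureBillingSummary(), to failover-matches.ts, following the existing isPureTransientRateLimitSummary() pattern. Update agent-runner-execution.ts to use this structural check as the primary gate for billing classification, falling back to string matching only for legacy raw API errors.

Step 1: Add a structural billing check to failover-matches.ts

Add the following function to src/core/failover/failover-matches.ts, next to the existing isPureTransientRateLimitSummary():

Before: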

export function isPureTransientRateLimitSummary(failure: FallbackSummaryError): boolean {
  return failure.attempts.every(a => a.reason === 'rate_limit');
}

After:

export function isPureTransientRateLimitSummary(failure: FallbackSummaryError): boolean {
  return failure.attempts.every(a => a.reason === 'rate_limit');
}

/**
 * Structural check: true when every attempt in the FallbackSummaryError
 * is classified as billing-cooldown. This correctly handles the
 * "Provider X has billing issue (skipping all models)" skip path,
 * which is not matched by isBillingErrorMessage() string patterns.
 */
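export function isPureBillingSummary(failure: FallbackSummaryError): boolean {
  // every() is vacuously true for an empty array, so require at least one attempt.
  return failure.attempts.length > 0 && failure.attempts.every(a => a.reason === 'billing');
}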

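Step 2: Update the billing classification gate in agent-runner-execution.ts

Locate the billing error classification logic in src/core/agent-runner-execution.ts. Replace the string-only check with a structure-first approach:

Before: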

const isBilling = isBillingErrorMessage(message);

After:

// Prefer structural classification (cooldown skip path) over string matching.
const isBilling = error instanceof FallbackSummaryError
  ? isPureBillingSummary(error)
  : isBillingErrorMessage(message);

Make sure FallbackSummaryError and isPureBillingSummary are imported:

// Paths are relative to src/core/agent-runner-execution.ts.
import { FallbackSummaryError } from './model-fallback/types';
import { isPureBillingSummary } from './failover/failover-matches';
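The gate can be exercised outside the codebase with stand-ins. In the sketch below, the FallbackSummaryError class, the abbreviated isBillingErrorMessage() regex, and the classifyIsBilling wrapper are all assumptions made for illustration; only the branching logic mirrors the change above.

```typescript
// Stand-in error class; the real FallbackSummaryError may carry more fields.
class FallbackSummaryError extends Error {
  readonly attempts: { reason: string }[];
  constructor(message: string, attempts: { reason: string }[]) {
    super(message);
    // Keeps instanceof working even when compiling to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "FallbackSummaryError";
    this.attempts = attempts;
  }
}

function isPureBillingSummary(failure: FallbackSummaryError): boolean {
  return failure.attempts.every((a) => a.reason === "billing");
}

// Abbreviated stand-in for the string-matching check.
function isBillingErrorMessage(message: string): boolean {
  return /out of extra usage|insufficient balance|billing error|out of credits/i.test(message);
}

// Hypothetical wrapper: structural check first, string matching as the fallback.
function classifyIsBilling(error: unknown): boolean {
  if (error instanceof FallbackSummaryError) return isPureBillingSummary(error);
  const message = error instanceof Error ? error.message : String(error);
  return isBillingErrorMessage(message);
}

const cooldownSummary = new FallbackSummaryError("All models failed (2)", [
  { reason: "billing" },
  { reason: "billing" },
]);
const mixedSummary = new FallbackSummaryError("All models failed (2)", [
  { reason: "billing" },
  { reason: "timeout" },
]);
const rawError = new Error("LLM request rejected: You're out of extra usage.");

console.log(classifyIsBilling(cooldownSummary)); // true
console.log(classifyIsBilling(mixedSummary));    // false
console.log(classifyIsBilling(rawError));        // true
```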

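Step 3 (optional enhancement): Extend ERROR_PATTERNS.billing

As a defensive measure for any code path that sees only the message string, extend the billing patterns in src/core/failover/failover-matches.ts to include the cooldown skip phrase:

Before: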

export const ERROR_PATTERNS = {
  billing: [
    /out of extra usage/i,
    /insufficient balance/i,
    /billing error/i,
    /api key (has|runs out).*credit/i,
    /add more at.*usage/i,
    /out of credits/i,
  ],
  // ...
};

After:

export const ERROR_PATTERNS = {
  billing: [
    /out of extra usage/i,
    /insufficient balance/i,
    /billing error/i,
    /api key (has|runs out).*credit/i,
    /add more at.*usage/i,
    /out of credits/i,
    /has billing issue \(skipping all models\)/i, // cooldown skip path
  ],
  // ...
};

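Step 3 is defensive. The main fix in steps 1-2 is sufficient on its own, because the instanceof FallbackSummaryError branch routes to isPureBillingSummary() and string matching is never reached for a FallbackSummaryError.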
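The backslashes before the parentheses in the added pattern matter. The sketch below compares the escaped pattern with an unescaped variant; the reworded message is a hypothetical example of a future format change.

```typescript
const detail = "Provider anthropic has billing issue (skipping all models)";

// Escaped: the parentheses are matched literally.
const escaped = /has billing issue \(skipping all models\)/i;
// Unescaped: the parentheses form a capture group, so the pattern
// expects the text "has billing issue skipping all models" instead.
const unescaped = /has billing issue (skipping all models)/i;

console.log(escaped.test(detail));   // true
console.log(unescaped.test(detail)); // false

// A hypothetical rewording defeats the escaped pattern as well.
const reworded = "Provider anthropic billing cooldown: skipping all models";
console.log(escaped.test(reworded)); // false
```

The last case is why the pattern is only a secondary safeguard: it is tied to the exact wording of the skip detail.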

Step 4: Rebuild and deploy

npm run build
# or for Docker deployments:
docker build -t openclaw:fixed .

🧪 Verification

Unit test: isPureBillingSummary()

Add test cases to src/core/failover/failover-matches.test.ts:

import { isPureBillingSummary } from './failover-matches';
import { FallbackSummaryError, FallbackAttempt } from '../model-fallback/types';

describe('isPureBillingSummary', () => {
  it('returns true when all attempts have reason=billing', () => {
    const attempts: FallbackAttempt[] = [
      {
        provider: 'anthropic',
        model: 'claude-opus-4-6',
        reason: 'billing',
        message: 'Provider anthropic has billing issue (skipping all models)',
        durationMs: 0,
        startTime: 0,
        endTime: 0,
      },
      {
        provider: 'anthropic',
        model: 'claude-sonnet-4-6',
        reason: 'billing',
        message: 'Provider anthropic has billing issue (skipping all models)',
        durationMs: 0,
        startTime: 0,
        endTime: 0,
      },
    ];
    const error = new FallbackSummaryError('All models failed', attempts);
    expect(isPureBillingSummary(error)).toBe(true);
  });

  it('returns false when attempts contain mixed reasons', () => {
    const attempts: FallbackAttempt[] = [
      { provider: 'anthropic', model: 'claude-opus-4-6', reason: 'billing', message: '', durationMs: 0, startTime: 0, endTime: 0 },
      { provider: 'anthropic', model: 'claude-sonnet-4-6', reason: 'rate_limit', message: '', durationMs: 0, startTime: 0, endTime: 0 },
    ];
    const error = new FallbackSummaryError('All models failed', attempts);
    expect(isPureBillingSummary(error)).toBe(false);
  });

  it('returns false when no attempt has reason=billing', () => {
    const attempts: FallbackAttempt[] = [
      { provider: 'anthropic', model: 'claude-opus-4-6', reason: 'rate_limit', message: '', durationMs: 0, startTime: 0, endTime: 0 },
    ];
    const error = new FallbackSummaryError('All models failed', attempts);
    expect(isPureBillingSummary(error)).toBe(false);
  });
});

Run the test suite:

npm test -- --testPathPattern="failover-matches"
# Expected: isPureBillingSummary tests pass

Integration test: billing cooldown error rendering

Simulate the billing cooldown scenario using a test provider or a mocked AuthProfile:

# Using the OpenClaw CLI test harness (if available):
openclaw test:integration --scenario=billing-cooldown --auth-type=oauth

# Expected output in user-facing message channel:
# "⚠️ API provider returned a billing error — your API key has run out of credits
#  or has an insufficient balance. Check your provider's billing dashboard and
#  top up or switch to a different API key."
# (i.e., BILLING_ERROR_USER_MESSAGE, not "Something went wrong")

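Manual verification: log inspection

Trigger a billing cooldown and inspect the agent logs to confirm that the classification is now correct: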

# Trigger a billing exhaustion scenario, then observe subsequent failures:
grep -E "(billing|isBilling|Something went wrong)" /var/log/openclaw/agent.log

# Before fix — "Something went wrong" appears repeatedly:
# Embedded agent failed before reply: ... (Something went wrong)
# Embedded agent failed before reply: ... (Something went wrong)

# After fix — BILLING_ERROR_USER_MESSAGE appears:
# [agent] embedded run agent end: ... userMessage=⚠️ API provider returned a billing error...
# Embedded agent failed before reply: ... (billing)

Exit code verification

# Verify graceful degradation with billing error exit code
openclaw run --prompt="Hello" --model=anthropic/claude-opus-4-6
echo "Exit code: $?"
# Expected: non-zero exit (indicating error state was properly surfaced), NOT a crash

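⚠️ Common pitfalls

Extending ERROR_PATTERNS only, without adding isPureBillingSummary(): Adding "has billing issue" to the string patterns as a workaround is fragile. If the fallback message format changes in a future release (for example to "Provider X billing cooldown: skipping all models"), the pattern stops matching again. The structural isPureBillingSummary() approach is resilient to changes in message wording.
Applying isPureBillingSummary() unconditionally: The check must be guarded by instanceof FallbackSummaryError. Calling it on a raw string or another error type throws a TypeError, because there is no attempts array to iterate. Falling back to isBillingErrorMessage(message) for anything that is not a FallbackSummaryError preserves backward compatibility with raw API errors.
OAuth vs. API key testing asymmetry: The bug shows up more readily on OAuth-authenticated accounts because the personal "extra usage" quota is smaller. Testing with an organization-level API key may fail to reproduce the problem and lead to the false conclusion that the fix works. Always test both the OAuth personal quota scenario and the API key exhaustion scenario.
Cooldown window state persistence: If the billing cooldown state is backed by a persistent store (Redis, SQLite), it survives restarts. Make sure the test environment resets the auth profile failure state between runs; otherwise the cooldown keeps blocking requests even after the error message path is fixed.
Partial fallback coverage: If only some providers in the routing chain are in billing cooldown, the FallbackSummaryError contains a mix of reason: 'billing' and other reasons (for example reason: 'timeout'). isPureBillingSummary() returns false in that case. If mixed failures are common, consider adding an isMostlyBillingSummary() as a secondary heuristic.
Stale Docker image: In Docker deployments, make sure the container actually runs a rebuilt image (run docker build rather than relying only on docker-compose up -d --build if the build context may be stale). A common mistake is to edit the source files and then run only up -d, which reuses the existing image and does not include the fix.
Log verbosity hiding the problem: If LOG_LEVEL=error is set in production, the detailed [model-fallback] model fallback decision lines may be suppressed, which makes it harder to tell the cooldown skip path from the raw error path. Set LOG_LEVEL=debug while troubleshooting.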
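For the partial-coverage pitfall, one possible shape for isMostlyBillingSummary() is sketched below. The function does not exist in OpenClaw as far as this article shows; the AttemptLike type, the plain-array parameter, and the 0.5 threshold are all assumptions chosen for illustration.

```typescript
// Hypothetical minimal attempt shape.
interface AttemptLike { reason: string }

// True when more than `threshold` of the attempts were classified as billing.
// An empty list is treated as "not billing" rather than vacuously true.
function isMostlyBillingSummary(attempts: AttemptLike[], threshold = 0.5): boolean {
  if (attempts.length === 0) return false;
  const billingCount = attempts.filter((a) => a.reason === "billing").length;
  return billingCount / attempts.length > threshold;
}

const twoOfThree = [{ reason: "billing" }, { reason: "billing" }, { reason: "timeout" }];
const oneOfTwo = [{ reason: "billing" }, { reason: "timeout" }];

console.log(isMostlyBillingSummary(twoOfThree)); // true  (2/3 > 0.5)
console.log(isMostlyBillingSummary(oneOfTwo));   // false (1/2 is not > 0.5)
console.log(isMostlyBillingSummary([]));         // false
```

Whether a majority threshold is the right heuristic depends on how the user-facing message should read when billing is only part of the failure, so treat the cutoff as a tunable rather than a recommendation.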

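🔗 Related errors

FallbackSummaryError with the "Something went wrong" message: the generic fallback error users see when neither isBillingErrorMessage() nor isPureTransientRateLimitSummary() matches. It indicates a classification gap in the error routing logic.
ERROR_PATTERNS.billing pattern mismatch: the set of regex patterns in failover-matches.ts that isBillingErrorMessage() uses for string-based billing detection. The missing cooldown skip phrase is the root cause here.
PR #61608 (partial billing pattern fix): added "out of extra usage" to ERROR_PATTERNS.billing for raw API errors only; it did not address the cooldown-generated skip path.
Issue #48526: a related billing error classification gap (possibly an earlier instance of the same pattern-matching vs. structural-classification problem).
Issue #64224: repeated errors caused by billing quota exhaustion under OAuth authentication (possibly the same root cause with a different manifestation).
Issue #64308: a fallback scenario in which all models are skipped and a generic error is emitted.
Issue #62375: an edge case in Anthropic provider billing error handling that involves the OAuth personal usage quota.
isPureTransientRateLimitSummary(): the correct pattern that the billing path should follow. It demonstrates the structural attempt.reason check already implemented for rate limit errors.
BILLING_ERROR_USER_MESSAGE: the expected user-facing message that was not being shown: "⚠️ API provider returned a billing error — your API key has run out of credits or has an insufficient balance. Check your provider's billing dashboard and top up or switch to a different API key."
auth profile failure state: reason=billing window=disabled: auth profile metadata indicating that the billing cooldown has been activated, blocking further attempts for the defined cooldown period.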
