April 20, 2026

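[Google Gemini Provider 429 Rate Limit Error] - Google Gemini Provider: 429 Rate Limit Scopes to Entire Provider Instead of Specific Model

When a single Google Gemini model hits a rate limit (429), the OpenClaw gateway applies backoff to the entire 'google' provider, which makes other, unrelated models with independent quotas unreachable as well.

🔍 Symptoms

Primary Symptom

Once the quota for one specific Google Gemini model is exhausted, every subsequent request to any model under the google provider fails with a rate limit error, even when those models have their own independent quotas.

Example Error Output

Direct API response (429 from Google):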

HTTP/1.1 429 Too Many Requests
Content-Type: application/json

{
  "error": {
    "code": 429,
    "message": "Resource has been exhausted (e.g. check quota).",
    "status": "RESOURCE_EXHAUSTED"
  }
}

OpenClaw gateway response once backoff is active:

{
  "error": {
    "type": "rate_limit_exceeded",
    "provider": "google",
    "message": "Provider 'google' is currently in cooldown due to rate limiting. Retry-After: 120s",
    "retry_after": 120
  }
}

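Behavioral Symptoms

  • No model isolation: switching from gemini-3.1-pro-preview-customtools to gemini-3.0-pro-preview does not restore functionality.
  • Sustained unavailability: all requests to the google provider fail until the provider-level cooldown expires.
  • No fallback path: during a rate limit event, other models under the same provider cannot serve as alternatives.
  • Gateway-layer rejection: requests may be rejected at the OpenClaw gateway before they ever reach the Google API.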
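
To tell a gateway-layer rejection apart from a 429 that really came from Google, a client can inspect the error body. The following is a minimal sketch that assumes only the two payload shapes shown in the examples above; classifyRateLimitError and its return labels are hypothetical names, not part of OpenClaw.

```typescript
// Hypothetical helper; the field names come from the two example payloads above.
interface GatewayOrUpstreamError {
  error?: {
    type?: string;        // set by the gateway ("rate_limit_exceeded")
    provider?: string;    // set by the gateway ("google")
    retry_after?: number; // set by the gateway, in seconds
    status?: string;      // set by Google ("RESOURCE_EXHAUSTED")
    code?: number;
  };
}

type RateLimitSource = "gateway_cooldown" | "upstream_quota" | "unknown";

function classifyRateLimitError(body: GatewayOrUpstreamError): RateLimitSource {
  const err = body.error;
  if (err === undefined) return "unknown";
  // The gateway's cooldown payload carries a type and a provider name.
  if (err.type === "rate_limit_exceeded" && typeof err.provider === "string") {
    return "gateway_cooldown";
  }
  // Google's own 429 body carries the RESOURCE_EXHAUSTED status string.
  if (err.status === "RESOURCE_EXHAUSTED") return "upstream_quota";
  return "unknown";
}

const fromGateway = classifyRateLimitError({
  error: { type: "rate_limit_exceeded", provider: "google", retry_after: 120 },
});
const fromGoogle = classifyRateLimitError({
  error: { code: 429, status: "RESOURCE_EXHAUSTED" },
});
console.log(fromGateway, fromGoogle); // prints: gateway_cooldown upstream_quota
```

A "gateway_cooldown" result for a model that has not itself returned a 429 is the signature of the provider-wide backoff described here.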

Reproduction Scenario

# Step 1: Request to rate-limited model
curl -X POST https://api.openclaw.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini-3.1-pro-preview-customtools", "messages": [{"role": "user", "content": "test"}]}'
# Response: 429 from Google API

# Step 2: Immediate fallback to another model
curl -X POST https://api.openclaw.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini-3.0-pro-preview", "messages": [{"role": "user", "content": "test"}]}'
# Expected: Request proceeds to Google API
# Actual: 429 or backoff error from OpenClaw gateway

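🧠 Root Cause Analysis

Architectural Analysis

The root cause lies in how the retry/backoff mechanism of the OpenClaw gateway tracks rate limits: it does so at the provider level.

Failure Sequence

  1. Request to gemini-3.1-pro-preview-customtools: the deployment for that specific model receives a 429 RESOURCE_EXHAUSTED response from the Google API.
  2. Gateway intercepts the 429: OpenClaw's error-handling middleware catches the 429 response.
  3. Provider-level backoff activates: instead of recording the rate limit against the specific model/deployment, the gateway sets a cooldown timer on the google provider identifier.
  4. Subsequent request to gemini-3.0-pro-preview: the gateway checks whether the google provider is in cooldown. Finding that it is, the gateway rejects the request up front with a backoff error.
  5. Other models with independent quotas are blocked: gemini-3.0-pro-preview may have an entirely separate quota, yet it cannot be reached.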
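
The sequence above can be reproduced in a few lines. This is a simplified simulation under the assumption that the gateway keeps one cooldown entry per provider; the function names and the fixed clock values are invented for illustration and are not OpenClaw's actual code.

```typescript
// Simplified simulation of the failure sequence; not OpenClaw's actual code.
// Maps a provider name to its cooldown expiry in epoch milliseconds.
const providerCooldowns: Record<string, number | undefined> = {};

// Steps 2-3: a 429 for any one model sets a cooldown on the whole provider.
function recordUpstream429(provider: string, retryAfterSec: number, nowMs: number): void {
  providerCooldowns[provider] = nowMs + retryAfterSec * 1000;
}

// Step 4: the pre-flight check looks only at the provider, never at the model.
function isProviderRejected(provider: string, _model: string, nowMs: number): boolean {
  const until = providerCooldowns[provider];
  return until !== undefined && until > nowMs;
}

const startMs = 1000000; // fixed clock so the run is deterministic

// Step 1: gemini-3.1-pro-preview-customtools received a 429 with a 120 s cooldown.
recordUpstream429("google", 120, startMs);

// Step 5: one second later a different model is rejected anyway.
const blockedDuringCooldown = isProviderRejected("google", "gemini-3.0-pro-preview", startMs + 1000);
// Once the 120 s window has passed, the same request goes through.
const blockedAfterExpiry = isProviderRejected("google", "gemini-3.0-pro-preview", startMs + 121000);

console.log(blockedDuringCooldown, blockedAfterExpiry); // prints: true false
```

The model argument is accepted but ignored, which is exactly the defect: nothing in the check can distinguish one model's quota from another's.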

Code-Level Root Cause

Rate limit tracking likely uses a data structure like the following:

// Simplified representation of current behavior
const providerBackoff = {
  "google": {
    cooldownUntil: 1699999999999,  // Unix epoch time in milliseconds
    reason: "rate_limit",
    retryAfter: 120
  }
};

// Backoff check
function shouldReject(provider) {
  return providerBackoff[provider]?.cooldownUntil > Date.now();
}

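The problem: the backoff mechanism is keyed by the provider name ("google") rather than by a model or deployment identifier.

Google Gemini API Quota Architecture

How Google Gemini API quotas work:

  • Model-specific quotas: each model (such as gemini-3.1-pro-preview-customtools) has its own rate limits.
  • Project-level quotas: broader limits that affect all models, but are usually much higher.
  • Regional endpoints: may have separate limits.

Code Path Differences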

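Scenario | Current Behavior | Expected Behavior
Model A triggers a 429 | The entire google provider is blocked | Only model A is blocked
Model A quota exhausted | Model B is unavailable | Model B keeps working while its quota is available
Provider backoff active | Gateway rejects the request at layer 7 | Request is sent to the API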

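🛠️ Step-by-Step Fix

Option 1: Enable Model-Scoped Rate Limiting (Recommended)

If OpenClaw supports per-model rate limit tracking, configure the gateway to use model-level backoff:

Before (openclaw.yaml):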

providers:
  google:
    api_key: "${GOOGLE_API_KEY}"
    rate_limit:
      strategy: "provider"  # Current: blocks entire provider
      retry_after: 120

After:

providers:
  google:
    api_key: "${GOOGLE_API_KEY}"
    rate_limit:
      strategy: "model"  # Changed: per-model tracking
      retry_after: 120
      scope: "deployment"  # Granularity: model/deployment level

Option 2: Configure Model-Specific Fallbacks

Define explicit fallback chains to route around rate-limited models:

Before:

models:
  - name: "gemini-3.1-pro-preview-customtools"
    provider: "google"

After:

models:
  - name: "gemini-3.1-pro-preview-customtools"
    provider: "google"
    fallback_models:
      - "gemini-3.0-pro-preview"
      - "gemini-pro"

  - name: "gemini-3.0-pro-preview"
    provider: "google"
    fallback_models:
      - "gemini-pro"
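
How a gateway might walk such a chain can be sketched as follows. This is an illustrative resolver only: pickAvailableModel, the ModelEntry type, and the list of cooling-down models are assumptions, and OpenClaw's real fallback logic may differ.

```typescript
// Hypothetical resolver for fallback_models chains like the ones configured above.
interface ModelEntry {
  name: string;
  provider: string;
  fallback_models?: string[];
}

// Returns the requested model if it is usable, otherwise the first fallback
// that is not cooling down, or null when every candidate is rate-limited.
function pickAvailableModel(
  requested: string,
  entries: ModelEntry[],
  coolingDown: string[]
): string | null {
  let fallbacks: string[] = [];
  for (const entry of entries) {
    if (entry.name === requested) fallbacks = entry.fallback_models ?? [];
  }
  const candidates = [requested].concat(fallbacks);
  for (const candidate of candidates) {
    if (coolingDown.indexOf(candidate) === -1) return candidate;
  }
  return null;
}

const modelEntries: ModelEntry[] = [
  {
    name: "gemini-3.1-pro-preview-customtools",
    provider: "google",
    fallback_models: ["gemini-3.0-pro-preview", "gemini-pro"],
  },
  { name: "gemini-3.0-pro-preview", provider: "google", fallback_models: ["gemini-pro"] },
];

const chosenModel = pickAvailableModel(
  "gemini-3.1-pro-preview-customtools",
  modelEntries,
  ["gemini-3.1-pro-preview-customtools"]
);
console.log(chosenModel); // prints: gemini-3.0-pro-preview
```

A chain like this only helps when cooldowns are tracked per model. With provider-wide backoff, every candidate under google is rejected together, which is the "no fallback path" symptom.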

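Option 3: Increase Provider Cooldown Granularity (Code Fix)

If you have access to the OpenClaw source code, modify the rate limit tracking:

Step 1: Locate the rate limit handler

Find the file that handles 429 responses. It is typically located at: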

src/gateway/middleware/rate-limit-handler.ts
src/providers/google/error-handler.ts

Step 2: Change the backoff key from provider to model

// BEFORE (provider-level)
providerBackoff[provider] = {
  cooldownUntil: Date.now() + retryAfter * 1000,
  reason: "rate_limit"
};

// AFTER (model-level)
const modelKey = `${provider}:${model}`;
modelBackoff[modelKey] = {
  cooldownUntil: Date.now() + retryAfter * 1000,
  reason: "rate_limit",
  model: model,
  provider: provider
};

Step 3: Update the rejection check

// BEFORE
function shouldReject(request) {
  const provider = request.provider;
  return providerBackoff[provider]?.cooldownUntil > Date.now();
}

// AFTER
function shouldReject(request) {
  const modelKey = `${request.provider}:${request.model}`;
  const providerKey = request.provider;
  
  // Check model-specific backoff first
  if (modelBackoff[modelKey]?.cooldownUntil > Date.now()) {
    return { rejected: true, reason: "model_rate_limited" };
  }
  
  // Fallback to provider-level for shared limits only
  if (providerBackoff[providerKey]?.cooldownUntil > Date.now()) {
    return { rejected: true, reason: "provider_rate_limited" };
  }
  
  return { rejected: false };
}
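
Steps 2 and 3 can be combined into one self-contained tracker that is easy to unit test. The sketch below is an assumed design, not OpenClaw's implementation: the BackoffTracker class, its method names, and the injectable clock are all introduced here for illustration.

```typescript
// Sketch of a two-level tracker with an injectable clock (assumed design).
interface Cooldown {
  cooldownUntil: number; // epoch milliseconds
  reason: string;
}

type Verdict = { rejected: boolean; reason?: string };

class BackoffTracker {
  private modelBackoff: Record<string, Cooldown | undefined> = {};
  private providerBackoff: Record<string, Cooldown | undefined> = {};
  private now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Record a 429 attributed to a single model's quota.
  limitModel(provider: string, model: string, retryAfterSec: number): void {
    this.modelBackoff[`${provider}:${model}`] = {
      cooldownUntil: this.now() + retryAfterSec * 1000,
      reason: "rate_limit",
    };
  }

  // Record a 429 attributed to a shared (for example project-level) quota.
  limitProvider(provider: string, retryAfterSec: number): void {
    this.providerBackoff[provider] = {
      cooldownUntil: this.now() + retryAfterSec * 1000,
      reason: "rate_limit",
    };
  }

  // Model-specific cooldown is checked first, then the shared provider one.
  shouldReject(provider: string, model: string): Verdict {
    const t = this.now();
    const m = this.modelBackoff[`${provider}:${model}`];
    if (m !== undefined && m.cooldownUntil > t) {
      return { rejected: true, reason: "model_rate_limited" };
    }
    const p = this.providerBackoff[provider];
    if (p !== undefined && p.cooldownUntil > t) {
      return { rejected: true, reason: "provider_rate_limited" };
    }
    return { rejected: false };
  }
}

let fakeNowMs = 0;
const tracker = new BackoffTracker(() => fakeNowMs);
tracker.limitModel("google", "gemini-3.1-pro-preview-customtools", 120);
const limitedVerdict = tracker.shouldReject("google", "gemini-3.1-pro-preview-customtools");
const siblingVerdict = tracker.shouldReject("google", "gemini-3.0-pro-preview");
console.log(limitedVerdict.reason, siblingVerdict.rejected); // prints: model_rate_limited false
```

Passing the clock in as a function lets a test advance time without sleeping through the real cooldown.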

Option 4: Work Around via Multiple Provider Instances

Create separate provider configurations for models with independent quotas:

providers:
  google-gemini-31:
    api_key: "${GOOGLE_API_KEY}"
    models:
      - "gemini-3.1-pro-preview-customtools"
    rate_limit:
      retry_after: 60

  google-gemini-30:
    api_key: "${GOOGLE_API_KEY}"
    models:
      - "gemini-3.0-pro-preview"
    rate_limit:
      retry_after: 60

  google-gemini-pro:
    api_key: "${GOOGLE_API_KEY}"
    models:
      - "gemini-pro"
    rate_limit:
      retry_after: 60

🧪 Verification

Test 1: Confirm model-level isolation after the fix

# Step 1: Trigger rate limit on model A
curl -X POST https://api.openclaw.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini-3.1-pro-preview-customtools", "messages": [{"role": "user", "content": "test"}]}'

# Expected: 429 from Google API
# Verify with: add -w "%{http_code}\n" to the curl command to print the status (curl exits 0 on HTTP errors unless --fail is used)

# Step 2: Immediately test model B access
curl -X POST https://api.openclaw.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini-3.0-pro-preview", "messages": [{"role": "user", "content": "test"}]}'

# Expected: 200 OK or valid API response (not gateway backoff error)

Test 2: Verify model-specific backoff state

Check the gateway's internal state (if it is exposed through an admin endpoint):

GET /admin/rate-limit-status

# Expected response structure:
{
  "providers": {
    "google": {
      "cooldown": false,
      "models": {
        "gemini-3.1-pro-preview-customtools": {
          "cooldown": true,
          "retry_after": 120,
          "expires_at": "2024-01-15T10:30:00Z"
        },
        "gemini-3.0-pro-preview": {
          "cooldown": false
        }
      }
    }
  }
}
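
If such a status document is available, the check can be automated. The sketch below assumes exactly the response structure shown above; the endpoint may not exist in a given OpenClaw build, and listCooldowns is a hypothetical helper name.

```typescript
// Assumes the response structure shown above; the endpoint itself may not exist.
interface ModelStatus {
  cooldown: boolean;
  retry_after?: number;
  expires_at?: string;
}
interface ProviderStatus {
  cooldown: boolean;
  models?: Record<string, ModelStatus>;
}
interface RateLimitStatus {
  providers: Record<string, ProviderStatus>;
}

// Returns "provider:model" for each model in cooldown, and "provider:*" when
// the whole provider is cooling down (the behavior the fix should eliminate).
function listCooldowns(statusDoc: RateLimitStatus): string[] {
  const found: string[] = [];
  for (const providerName of Object.keys(statusDoc.providers)) {
    const providerStatus = statusDoc.providers[providerName];
    if (providerStatus === undefined) continue;
    if (providerStatus.cooldown) found.push(`${providerName}:*`);
    const models = providerStatus.models ?? {};
    for (const modelName of Object.keys(models)) {
      const modelStatus = models[modelName];
      if (modelStatus !== undefined && modelStatus.cooldown) {
        found.push(`${providerName}:${modelName}`);
      }
    }
  }
  return found;
}

const sampleStatus: RateLimitStatus = {
  providers: {
    google: {
      cooldown: false,
      models: {
        "gemini-3.1-pro-preview-customtools": { cooldown: true, retry_after: 120 },
        "gemini-3.0-pro-preview": { cooldown: false },
      },
    },
  },
};
console.log(listCooldowns(sampleStatus).join(", ")); // prints: google:gemini-3.1-pro-preview-customtools
```

After a correct fix, a 429 on one model should never produce a "google:*" entry.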

Test 3: Concurrent model availability test

# Run concurrent requests to different models (each request runs in a background subshell)
for model in "gemini-3.1-pro-preview-customtools" "gemini-3.0-pro-preview" "gemini-pro"; do
  (
    code=$(curl -s -o /dev/null -w "%{http_code}" \
      -X POST https://api.openclaw.io/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d "{\"model\": \"$model\", \"messages\": [{\"role\": \"user\", \"content\": \"test\"}]}")
    echo "$model: $code"
  ) &
done
wait  # Block until all background requests have finished

# Expected: 
# gemini-3.1-pro-preview-customtools: 429 (rate limited)
# gemini-3.0-pro-preview: 200 (independent quota)
# gemini-pro: 200 (independent quota)

Test 4: Backoff expiry verification

# Wait for cooldown to expire
echo "Waiting for model cooldown expiration..."
sleep 130  # retry_after + buffer

# Verify previously rate-limited model is accessible
curl -X POST https://api.openclaw.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini-3.1-pro-preview-customtools", "messages": [{"role": "user", "content": "test"}]}'

# Expected: 200 OK

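Success Criteria

  • ✅ After gemini-3.1-pro-preview-customtools triggers a rate limit, other google models remain accessible.
  • ✅ Model-specific backoff state is tracked correctly and expires independently.
  • ✅ The gateway does not pre-emptively reject requests to models that are not rate-limited.
  • ✅ Fallback chains work correctly when the primary model is unavailable.

⚠️ Common Pitfalls

Environment-Specific Pitfalls

Docker Container Caching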

# Pitfall: Container filesystem may cache rate limit state
# Restarting containers may not reset state if persistence is enabled

docker-compose down
docker volume rm openclaw-cache  # Clear cached state ("volume prune" does not accept a volume name)
docker-compose up -d

Kubernetes Volume Mounts

If persistent volumes are used for rate limit tracking:

# Verify PVC is not stale after config changes
kubectl get pvc | grep openclaw
kubectl describe pvc openclaw-cache

# May need to delete and recreate if schema changed
kubectl delete pvc openclaw-cache
# Then restart deployments

macOS Development Environment

# Pitfall: Local rate limit state may persist across terminal sessions
# Clear any local state files
rm -rf ~/.openclaw/cache/*
rm -rf .openclaw/state.json

Configuration Errors

Incorrect provider name in the fallback chain

# WRONG: Typos in provider name cause silent failures
models:
  - name: "gemini-3.0-pro-preview"
    provider: "googel"  # Typo - will not match actual provider

# CORRECT:
models:
  - name: "gemini-3.0-pro-preview"
    provider: "google"

Duplicate model declarations

# WRONG: Same model declared multiple times
models:
  - name: "gemini-3.0-pro-preview"
    provider: "google"
  - name: "gemini-3.0-pro-preview"  # Duplicate
    provider: "google"
    fallback_models: [...]
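
Both configuration mistakes above fail silently, so a small pre-flight check can be worth having. The following is a minimal sketch, assuming the models list has already been parsed from YAML into plain objects; validateModels and the DeclaredModel type are hypothetical and not part of OpenClaw.

```typescript
// Hypothetical pre-flight check for the two misconfigurations shown above.
interface DeclaredModel {
  name: string;
  provider: string;
  fallback_models?: string[];
}

// Returns human-readable problems; an empty array means both checks passed.
function validateModels(models: DeclaredModel[], knownProviders: string[]): string[] {
  const problems: string[] = [];
  const seenNames: string[] = [];
  for (const m of models) {
    // Catches typos such as "googel" that would otherwise never match a provider.
    if (knownProviders.indexOf(m.provider) === -1) {
      problems.push(`model "${m.name}" references unknown provider "${m.provider}"`);
    }
    // Catches a model name that was already declared earlier in the list.
    if (seenNames.indexOf(m.name) !== -1) {
      problems.push(`model "${m.name}" is declared more than once`);
    }
    seenNames.push(m.name);
  }
  return problems;
}

const configProblems = validateModels(
  [
    { name: "gemini-3.0-pro-preview", provider: "googel" },
    { name: "gemini-3.0-pro-preview", provider: "google" },
  ],
  ["google"]
);
console.log(configProblems.length); // prints: 2
```

Running a check like this at startup turns a silent routing failure into an explicit error message.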

API key scope mismatch

# Pitfall: Google API keys may have different quotas per project
# If using separate provider instances, ensure they use keys with adequate quotas

# Verify in Google Cloud Console:
# APIs & Services > Enabled APIs > Vertex AI API (or Generative Language API when using Gemini API keys) > Quotas

Testing Edge Cases

Last available model hits the rate limit

# Scenario: All models under a provider are rate-limited
# Expected: Should return clear error, not silent success

# Verify error response includes all affected models
curl -X POST https://api.openclaw.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini-3.0-pro-preview", "messages": [{"role": "user", "content": "test"}]}'

# Check response contains actionable information
# Should NOT be empty 200 OK

Rapid model switching

# Pitfall: Race condition during rapid switching may bypass backoff
# Test with concurrent requests

ab -n 100 -c 10 -T 'application/json' \
  -p request.json \
  https://api.openclaw.io/v1/chat/completions

# Verify all requests are properly rate-limited or processed

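🔗 Related Errors

Error Code | Description | Relevance
429 RESOURCE_EXHAUSTED | Google API returns a rate limit error | Source error that triggers provider backoff
503 Service Unavailable | Provider temporarily unavailable | Downstream result of prolonged provider backoff
500 Internal Server Error | Gateway error during backoff handling | Unhandled exception in the rate limit middleware
ENOTFOUND | DNS resolution failure for the Google API | Unrelated, but may be misdiagnosed as rate limiting
ETIMEDOUT | Connection to the Google API timed out | Unrelated, but may trigger faulty backoff logic
INVALID_ARGUMENT | Malformed request sent to the Gemini API | May be misrouted as a rate limit during error handling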
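
The distinctions in the table above can be made explicit in the error handler so that only genuine quota errors start a cooldown. This is a hedged sketch: the UpstreamFailure shape and the decideBackoff function are invented for illustration and do not describe OpenClaw's actual middleware.

```typescript
// Hypothetical classifier: only genuine quota errors should start a cooldown.
interface UpstreamFailure {
  httpStatus?: number; // e.g. 429, 500, 503
  apiStatus?: string;  // Google status string, e.g. "RESOURCE_EXHAUSTED"
  errno?: string;      // network-level code, e.g. "ENOTFOUND", "ETIMEDOUT"
}

type BackoffDecision = "rate_limit_backoff" | "no_backoff";

function decideBackoff(failure: UpstreamFailure): BackoffDecision {
  // DNS and timeout failures never reached Google's quota system.
  if (failure.errno !== undefined) return "no_backoff";
  if (failure.httpStatus === 429 || failure.apiStatus === "RESOURCE_EXHAUSTED") {
    return "rate_limit_backoff";
  }
  // INVALID_ARGUMENT, 500 and 503 are real errors but not rate limits.
  return "no_backoff";
}

const quotaDecision = decideBackoff({ httpStatus: 429, apiStatus: "RESOURCE_EXHAUSTED" });
const dnsDecision = decideBackoff({ errno: "ENOTFOUND" });
console.log(quotaDecision, dnsDecision); // prints: rate_limit_backoff no_backoff
```

Whether 503 responses deserve a separate, shorter retry policy is a design choice that this sketch leaves open.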

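Historical Context

This issue reflects broader patterns in multi-tenant API gateway design:

  • Overly broad circuit breaker: the circuit breaker pattern is applied at the provider level when it should operate at the model/deployment level.
  • Shared state conflict: multiple independent resources share a single rate limit counter.
  • Insufficient error context: Google's 429 responses can carry structured error details, such as RetryInfo (a suggested retry delay) and quota failure details that identify which quota was exhausted, but the gateway may not parse them.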
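
Parsing that extra context could look like the sketch below. It assumes the 429 body follows Google's standard error model, in which entries under details are tagged with an @type field and a RetryInfo entry carries retryDelay as a duration string such as "34s". Whether a particular Gemini response includes RetryInfo is not guaranteed, the sample payload is made up, and retryDelaySeconds is a hypothetical helper.

```typescript
// Assumes Google's standard error model; the sample payload below is made up.
interface GoogleErrorDetail {
  "@type"?: string;
  retryDelay?: string; // duration string such as "34s" or "1.5s"
}
interface GoogleErrorBody {
  error?: { code?: number; status?: string; details?: GoogleErrorDetail[] };
}

const RETRY_INFO_TYPE = "type.googleapis.com/google.rpc.RetryInfo";

// Returns the suggested delay in seconds, or null when no usable RetryInfo is present.
function retryDelaySeconds(body: GoogleErrorBody): number | null {
  const details = body.error?.details ?? [];
  for (const detail of details) {
    if (detail["@type"] !== RETRY_INFO_TYPE || detail.retryDelay === undefined) continue;
    const match = /^(\d+(?:\.\d+)?)s$/.exec(detail.retryDelay);
    if (match !== null && match[1] !== undefined) return parseFloat(match[1]);
  }
  return null;
}

const sample429: GoogleErrorBody = {
  error: {
    code: 429,
    status: "RESOURCE_EXHAUSTED",
    details: [{ "@type": RETRY_INFO_TYPE, retryDelay: "34s" }],
  },
};
console.log(retryDelaySeconds(sample429)); // prints: 34
```

Using the server-suggested delay, when present, avoids holding a model in cooldown for the full fixed retry_after window.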

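Evidence and Sources

This troubleshooting guide was automatically synthesized from community discussions by the FixClaw intelligent pipeline.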