April 20, 2026

Google Gemini Provider: 429 Rate Limit Scoped to the Entire Provider Instead of the Specific Model

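When a single Google Gemini model hits a rate limit (429), the OpenClaw gateway applies backoff to the entire 'google' provider. Other, unrelated models that have their own independent quotas then become unreachable as well.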

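🔍 Symptoms

Primary Symptom

When one specific Google Gemini model exhausts its quota, every subsequent request to any model under the google provider fails with a rate limit error, even though those models have independent quotas.

Example Error Output

Direct API response (429 from Google):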

HTTP/1.1 429 Too Many Requests
Content-Type: application/json

{
  "error": {
    "code": 429,
    "message": "Resource has been exhausted (e.g. check quota).",
    "status": "RESOURCE_EXHAUSTED"
  }
}

OpenClaw gateway response once backoff has been triggered:

{
  "error": {
    "type": "rate_limit_exceeded",
    "provider": "google",
    "message": "Provider 'google' is currently in cooldown due to rate limiting. Retry-After: 120s",
    "retry_after": 120
  }
}

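Behavioral Symptoms

No model isolation: switching from gemini-3.1-pro-preview-customtools to gemini-3.0-pro-preview does not restore service.
Sustained unavailability: all requests to the google provider fail until the provider-level cooldown ends.
No fallback path: during a rate limit event, other models under the same provider cannot serve as alternatives.
Gateway-layer rejection: requests may be rejected at the OpenClaw gateway layer before they ever reach the Google API.

Reproduction Scenario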

# Step 1: Request to rate-limited model
curl -X POST https://api.openclaw.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini-3.1-pro-preview-customtools", "messages": [{"role": "user", "content": "test"}]}'
# Response: 429 from Google API

# Step 2: Immediate fallback to another model
curl -X POST https://api.openclaw.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini-3.0-pro-preview", "messages": [{"role": "user", "content": "test"}]}'
# Expected: Request proceeds to Google API
# Actual: 429 or backoff error from OpenClaw gateway

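🧠 Root Cause Analysis

Architecture Analysis

The root cause is that the retry/backoff mechanism in the OpenClaw gateway tracks rate limits at the provider level.

Failure Sequence

Request to gemini-3.1-pro-preview-customtools: the deployment for that specific model receives a 429 RESOURCE_EXHAUSTED response from the Google API.
Gateway intercepts the 429: OpenClaw's error-handling middleware catches the 429 response.
Provider-level backoff activates: instead of recording the rate limit against the specific model/deployment, the gateway sets a cooldown timer on the google provider identifier.
Subsequent request to gemini-3.0-pro-preview: the gateway checks whether the google provider is in cooldown, finds that it is, and preemptively rejects the request with a backoff error.
Other models with independent quotas are blocked: gemini-3.0-pro-preview may have an entirely separate quota, but it cannot be reached.

Code-Level Root Cause

The rate limit tracking likely uses a data structure like this: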

// Simplified representation of current behavior
const providerBackoff = {
  "google": {
    cooldownUntil: 1699999999999,  // Unix epoch time in milliseconds (compared with Date.now())
    reason: "rate_limit",
    retryAfter: 120
  }
};

// Backoff check
function shouldReject(provider) {
  return providerBackoff[provider]?.cooldownUntil > Date.now();
}

The problem: the backoff is keyed on the provider name ("google") rather than on a model or deployment identifier.
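To make the failure concrete, here is a minimal TypeScript reproduction of the provider-keyed pattern sketched above. The function names and the injected `now` timestamp are illustrative assumptions, not OpenClaw's actual code. The point is that once the key ignores the model, a 429 on model A blocks model B for the full cooldown.

```typescript
// Minimal reproduction of provider-keyed backoff (hypothetical names).
type BackoffEntry = { cooldownUntil: number; reason: string };

const providerBackoff: Record<string, BackoffEntry> = {};

// Records a 429 against the provider only; the model name is ignored.
function recordRateLimit(provider: string, _model: string, retryAfterSec: number, now: number): void {
  providerBackoff[provider] = { cooldownUntil: now + retryAfterSec * 1000, reason: "rate_limit" };
}

// The gate only consults the provider entry, so every model under it is affected.
function shouldReject(provider: string, _model: string, now: number): boolean {
  return (providerBackoff[provider]?.cooldownUntil ?? 0) > now;
}

const t0 = 1_700_000_000_000; // arbitrary epoch-ms timestamp
recordRateLimit("google", "gemini-3.1-pro-preview-customtools", 120, t0);

// Model B never returned a 429, yet it is rejected during A's cooldown.
console.log(shouldReject("google", "gemini-3.0-pro-preview", t0 + 1_000)); // true
// It becomes reachable only after the 120 s provider cooldown ends.
console.log(shouldReject("google", "gemini-3.0-pro-preview", t0 + 121_000)); // false
```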

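Google Gemini API Quota Architecture

How Google Gemini API quotas work:

Model-specific quotas: each model (e.g., gemini-3.1-pro-preview-customtools) has its own rate limits.
Project-level quotas: broader limits that affect all models, but are usually much higher.
Regional endpoints: may have separate limits.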
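One way to respect these scopes is to record each 429 under a key that matches the quota it exhausted, and to check every applicable key before forwarding a request. The sketch below makes two assumptions: the gateway already knows the scope of each 429 (for example, from Google's error details), and each request carries a `region`. `QuotaScope`, `keyFor`, and `isBlocked` are hypothetical names.

```typescript
// Hypothetical sketch: one cooldown key per quota scope, checked hierarchically.
type QuotaScope = "model" | "region" | "project";

interface RequestTarget {
  provider: string; // e.g. "google"
  model: string;    // e.g. "gemini-3.0-pro-preview"
  region: string;   // assumed field, e.g. "global"
}

// The key a 429 is recorded under, depending on which quota it exhausted.
function keyFor(scope: QuotaScope, t: RequestTarget): string {
  switch (scope) {
    case "model":
      return `${t.provider}:model:${t.model}`;
    case "region":
      return `${t.provider}:region:${t.region}`;
    case "project":
      return `${t.provider}:project`;
  }
}

// Every key that can block a given request, narrowest first.
function keysToCheck(t: RequestTarget): string[] {
  return [keyFor("model", t), keyFor("region", t), keyFor("project", t)];
}

const cooldowns = new Map<string, number>(); // key -> cooldownUntil (epoch ms)

function isBlocked(t: RequestTarget, now: number): boolean {
  return keysToCheck(t).some((k) => (cooldowns.get(k) ?? 0) > now);
}

const now = 1_700_000_000_000;
const a: RequestTarget = { provider: "google", model: "gemini-3.1-pro-preview-customtools", region: "global" };
const b: RequestTarget = { provider: "google", model: "gemini-3.0-pro-preview", region: "global" };

// A model-scoped 429 on A blocks only A.
cooldowns.set(keyFor("model", a), now + 120_000);
console.log(isBlocked(a, now), isBlocked(b, now)); // true false

// A project-scoped 429 blocks every model in the project.
cooldowns.set(keyFor("project", a), now + 60_000);
console.log(isBlocked(b, now)); // true
```

With this design, a model-scoped 429 stays local to one model, while a genuinely project-wide 429 still blocks everything for its own (typically shorter) window.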

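Code Path Differences

Scenario	Current behavior	Expected behavior
Model A hits a 429	The entire `google` provider is blocked	Only model A is blocked
Model A's quota is exhausted	Model B is unavailable	Model B keeps working while its own quota lasts
Provider backoff is active	The gateway rejects requests at Layer 7	Requests for unaffected models are forwarded to the API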

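🛠️ Step-by-Step Fix

Option 1: Enable Model-Scoped Rate Limiting (Recommended)

If OpenClaw supports per-model rate limit tracking, configure the gateway to use model-level backoff:

Before (openclaw.yaml):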

providers:
  google:
    api_key: "${GOOGLE_API_KEY}"
    rate_limit:
      strategy: "provider"  # Current: blocks entire provider
      retry_after: 120

After:

providers:
  google:
    api_key: "${GOOGLE_API_KEY}"
    rate_limit:
      strategy: "model"  # Changed: per-model tracking
      retry_after: 120
      scope: "deployment"  # Granularity: model/deployment level
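The following sketch shows how `strategy` and `scope` could affect the cooldown key. It models the YAML above as a TypeScript type. The field names come from that config, which is not a confirmed OpenClaw schema, and the `deployment` identifier is an additional assumption.

```typescript
// Hypothetical TypeScript view of the rate_limit block above.
interface RateLimitConfig {
  strategy: "provider" | "model";
  retry_after: number;            // seconds
  scope?: "model" | "deployment"; // only consulted when strategy is "model"
}

interface Target {
  provider: string;
  model: string;
  deployment?: string; // assumed identifier for one deployment of a model
}

// Choose the backoff key according to the configured strategy.
function backoffKey(cfg: RateLimitConfig, t: Target): string {
  if (cfg.strategy === "provider") {
    return t.provider; // old behavior: every model shares one key
  }
  if (cfg.scope === "deployment" && t.deployment) {
    return `${t.provider}:${t.model}:${t.deployment}`;
  }
  return `${t.provider}:${t.model}`;
}

const before: RateLimitConfig = { strategy: "provider", retry_after: 120 };
const after: RateLimitConfig = { strategy: "model", retry_after: 120, scope: "deployment" };
const m1: Target = { provider: "google", model: "gemini-3.1-pro-preview-customtools" };
const m2: Target = { provider: "google", model: "gemini-3.0-pro-preview" };

console.log(backoffKey(before, m1) === backoffKey(before, m2)); // true: shared cooldown
console.log(backoffKey(after, m1) === backoffKey(after, m2));   // false: isolated cooldowns
```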

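Option 2: Configure Model-Specific Fallbacks

Define explicit fallback chains to route around rate-limited models. While backoff is still tracked per provider, fallbacks under the same `google` provider are blocked as well, so this option is most effective when combined with Option 1, 3, or 4.

Before: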

models:
  - name: "gemini-3.1-pro-preview-customtools"
    provider: "google"

After:

models:
  - name: "gemini-3.1-pro-preview-customtools"
    provider: "google"
    fallback_models:
      - "gemini-3.0-pro-preview"
      - "gemini-pro"

  - name: "gemini-3.0-pro-preview"
    provider: "google"
    fallback_models:
      - "gemini-pro"
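A fallback resolver would try the requested model first, then each entry in its `fallback_models` in order, skipping any model that is cooling down. This sketch makes three assumptions. Cooldowns are tracked per model, since a provider-wide cooldown would make every candidate look unavailable. Only the requested model's own list is followed, with no transitive chaining. When every candidate is blocked, the resolver returns all the models it tried, so the caller can report a clear error.

```typescript
// Sketch of fallback-chain resolution (hypothetical; assumes per-model cooldowns).
interface ModelEntry {
  name: string;
  provider: string;
  fallback_models?: string[];
}

// Mirrors the "After" YAML above.
const models: ModelEntry[] = [
  {
    name: "gemini-3.1-pro-preview-customtools",
    provider: "google",
    fallback_models: ["gemini-3.0-pro-preview", "gemini-pro"],
  },
  { name: "gemini-3.0-pro-preview", provider: "google", fallback_models: ["gemini-pro"] },
  { name: "gemini-pro", provider: "google" },
];

type Resolution = { ok: true; model: string } | { ok: false; tried: string[] };

// Try the requested model, then its own fallbacks in order, skipping cooled-down models.
function resolve(requested: string, isCoolingDown: (model: string) => boolean): Resolution {
  const byName = new Map(models.map((m) => [m.name, m] as const));
  const candidates = [requested, ...(byName.get(requested)?.fallback_models ?? [])];
  const tried: string[] = [];
  for (const name of candidates) {
    if (tried.includes(name)) continue; // ignore duplicates in the chain
    tried.push(name);
    if (byName.has(name) && !isCoolingDown(name)) {
      return { ok: true, model: name };
    }
  }
  // Every candidate is unavailable: surface them all instead of failing silently.
  return { ok: false, tried };
}

const cooling = new Set(["gemini-3.1-pro-preview-customtools"]);
const r = resolve("gemini-3.1-pro-preview-customtools", (m) => cooling.has(m));
console.log(r.ok ? r.model : `all blocked: ${r.tried.join(", ")}`); // gemini-3.0-pro-preview
```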

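Option 3: Make Cooldowns More Granular (Code Fix)

If you have access to the OpenClaw source code, change how rate limits are tracked:

Step 1: Find the rate limit handler

Locate the file that handles 429 responses. Likely locations include:

src/gateway/middleware/rate-limit-handler.ts
src/providers/google/error-handler.ts

Step 2: Change the backoff key from provider to model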

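// Shared declarations. provider, model and retryAfter (seconds) come from
// the failed request and its 429 response in the surrounding handler.
type BackoffEntry = { cooldownUntil: number; reason: string; model?: string; provider?: string };
const providerBackoff: Record<string, BackoffEntry> = {};
const modelBackoff: Record<string, BackoffEntry> = {};

// BEFORE (provider-level): one entry for the whole provider
providerBackoff[provider] = {
  cooldownUntil: Date.now() + retryAfter * 1000,
  reason: "rate_limit"
};

// AFTER (model-level): one entry per provider:model pair
const modelKey = `${provider}:${model}`;
modelBackoff[modelKey] = {
  cooldownUntil: Date.now() + retryAfter * 1000,
  reason: "rate_limit",
  model,
  provider
};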

Step 3: Update the rejection check

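// BEFORE
function shouldReject(request: { provider: string }): boolean {
  const provider = request.provider;
  return (providerBackoff[provider]?.cooldownUntil ?? 0) > Date.now();
}

// AFTER
// Note: this returns an object instead of a boolean, so callers must test
// result.rejected; an object is always truthy.
function shouldReject(request: { provider: string; model: string }) {
  const now = Date.now();
  const modelKey = `${request.provider}:${request.model}`;
  const providerKey = request.provider;

  // Check model-specific backoff first
  if ((modelBackoff[modelKey]?.cooldownUntil ?? 0) > now) {
    return { rejected: true, reason: "model_rate_limited" };
  }

  // Provider-level check for shared limits only. providerBackoff must now be
  // written only for limits that apply to every model (e.g. a project-wide quota);
  // otherwise this check blocks all models again.
  if ((providerBackoff[providerKey]?.cooldownUntil ?? 0) > now) {
    return { rejected: true, reason: "provider_rate_limited" };
  }

  return { rejected: false };
}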
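Putting Steps 2 and 3 together, here is a self-contained sketch of a model-level backoff store. It takes an injectable clock and cleans up expired entries lazily. The class and method names are hypothetical; the `provider:model` key format follows the snippets above.

```typescript
// Hypothetical model-level backoff store; not OpenClaw's actual API.
interface BackoffEntry {
  cooldownUntil: number; // epoch ms
  reason: "model_rate_limited" | "provider_rate_limited";
}

type Decision =
  | { rejected: false }
  | { rejected: true; reason: BackoffEntry["reason"]; retryAfterSec: number };

class BackoffStore {
  private readonly byModel = new Map<string, BackoffEntry>();
  private readonly byProvider = new Map<string, BackoffEntry>();
  private readonly clock: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(clock: () => number = () => Date.now()) {
    this.clock = clock;
  }

  recordModel(provider: string, model: string, retryAfterSec: number): void {
    this.byModel.set(`${provider}:${model}`, {
      cooldownUntil: this.clock() + retryAfterSec * 1000,
      reason: "model_rate_limited",
    });
  }

  // Reserve for limits that really apply to every model (e.g. a project-wide quota).
  recordProvider(provider: string, retryAfterSec: number): void {
    this.byProvider.set(provider, {
      cooldownUntil: this.clock() + retryAfterSec * 1000,
      reason: "provider_rate_limited",
    });
  }

  check(provider: string, model: string): Decision {
    const now = this.clock();
    const hit =
      this.active(this.byModel, `${provider}:${model}`, now) ??
      this.active(this.byProvider, provider, now);
    if (!hit) return { rejected: false };
    return { rejected: true, reason: hit.reason, retryAfterSec: Math.ceil((hit.cooldownUntil - now) / 1000) };
  }

  // Returns the entry while it is cooling down; deletes it once expired so the maps stay small.
  private active(map: Map<string, BackoffEntry>, key: string, now: number): BackoffEntry | undefined {
    const entry = map.get(key);
    if (entry !== undefined && entry.cooldownUntil <= now) {
      map.delete(key);
      return undefined;
    }
    return entry;
  }
}

let fakeNow = 0;
const store = new BackoffStore(() => fakeNow);
store.recordModel("google", "gemini-3.1-pro-preview-customtools", 120);
console.log(store.check("google", "gemini-3.1-pro-preview-customtools").rejected); // true
console.log(store.check("google", "gemini-3.0-pro-preview").rejected); // false
fakeNow = 120_000;
console.log(store.check("google", "gemini-3.1-pro-preview-customtools").rejected); // false
```

Because the clock is injected, the expiry behavior exercised in Test 4 can also be checked in a unit test without sleeping for the full cooldown.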

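Option 4: Work Around with Multiple Provider Instances

Create separate provider configurations for models with independent quotas. Because the cooldown is keyed on the provider name, each instance gets its own cooldown timer. All three instances below share one API key, however, so they still share any project-level quota.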

providers:
  google-gemini-31:
    api_key: "${GOOGLE_API_KEY}"
    models:
      - "gemini-3.1-pro-preview-customtools"
    rate_limit:
      retry_after: 60

  google-gemini-30:
    api_key: "${GOOGLE_API_KEY}"
    models:
      - "gemini-3.0-pro-preview"
    rate_limit:
      retry_after: 60

  google-gemini-pro:
    api_key: "${GOOGLE_API_KEY}"
    models:
      - "gemini-pro"
    rate_limit:
      retry_after: 60

🧪 Verification

Test 1: Confirm model-level isolation after the fix

# Step 1: Trigger rate limit on model A
curl -X POST https://api.openclaw.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini-3.1-pro-preview-customtools", "messages": [{"role": "user", "content": "test"}]}'

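# Expected: 429 from Google API (assuming model A's quota is already exhausted)
# Verify with: add -w "%{http_code}" to print the status; curl exits 0 on HTTP errors unless --fail is set (it then exits 22)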

# Step 2: Immediately test model B access
curl -X POST https://api.openclaw.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini-3.0-pro-preview", "messages": [{"role": "user", "content": "test"}]}'

# Expected: 200 OK or valid API response (not gateway backoff error)

Test 2: Verify model-specific backoff state

Inspect the gateway's internal state (if it is exposed through an admin endpoint):

GET /admin/rate-limit-status

# Expected response structure:
{
  "providers": {
    "google": {
      "cooldown": false,
      "models": {
        "gemini-3.1-pro-preview-customtools": {
          "cooldown": true,
          "retry_after": 120,
          "expires_at": "2024-01-15T10:30:00Z"
        },
        "gemini-3.0-pro-preview": {
          "cooldown": false
        }
      }
    }
  }
}
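If no such endpoint exists, a handler could assemble that payload from in-memory cooldown maps, roughly as follows. The function and parameter names are hypothetical. In this sketch, `retry_after` means the seconds remaining, which may differ from what the endpoint above reports. Also note that `toISOString()` includes milliseconds (`2024-01-15T10:30:00.000Z`).

```typescript
// Hypothetical builder for the /admin/rate-limit-status payload shown above.
interface ModelStatus {
  cooldown: boolean;
  retry_after?: number; // seconds remaining (an interpretation)
  expires_at?: string;  // ISO-8601 timestamp
}

interface ProviderStatus {
  cooldown: boolean;
  models: Record<string, ModelStatus>;
}

function buildStatus(
  providerCooldowns: Map<string, number>, // provider -> cooldownUntil (epoch ms)
  modelCooldowns: Map<string, number>,    // "provider:model" -> cooldownUntil (epoch ms)
  configured: Record<string, string[]>,   // provider -> configured model names
  now: number,
): { providers: Record<string, ProviderStatus> } {
  const providers: Record<string, ProviderStatus> = {};
  for (const [provider, names] of Object.entries(configured)) {
    const models: Record<string, ModelStatus> = {};
    for (const name of names) {
      const until = modelCooldowns.get(`${provider}:${name}`) ?? 0;
      models[name] =
        until > now
          ? { cooldown: true, retry_after: Math.ceil((until - now) / 1000), expires_at: new Date(until).toISOString() }
          : { cooldown: false };
    }
    providers[provider] = { cooldown: (providerCooldowns.get(provider) ?? 0) > now, models };
  }
  return { providers };
}

const now = Date.UTC(2024, 0, 15, 10, 28, 0); // 2024-01-15T10:28:00Z
const snapshot = buildStatus(
  new Map(),
  new Map([["google:gemini-3.1-pro-preview-customtools", now + 120_000]]),
  { google: ["gemini-3.1-pro-preview-customtools", "gemini-3.0-pro-preview"] },
  now,
);
console.log(JSON.stringify(snapshot, null, 2));
```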

Test 3: Concurrent model availability test

# Run concurrent requests to different models (each curl runs in the background)
for model in "gemini-3.1-pro-preview-customtools" "gemini-3.0-pro-preview" "gemini-pro"; do
  curl -s -o /dev/null -w "$model: %{http_code}\n" \
    -X POST https://api.openclaw.io/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$model\", \"messages\": [{\"role\": \"user\", \"content\": \"test\"}]}" &
done
wait

# Expected: 
# gemini-3.1-pro-preview-customtools: 429 (rate limited)
# gemini-3.0-pro-preview: 200 (independent quota)
# gemini-pro: 200 (independent quota)

Test 4: Backoff expiration verification

# Wait for cooldown to expire
echo "Waiting for model cooldown expiration..."
sleep 130  # retry_after + buffer

# Verify previously rate-limited model is accessible
curl -X POST https://api.openclaw.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini-3.1-pro-preview-customtools", "messages": [{"role": "user", "content": "test"}]}'

# Expected: 200 OK

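Success Criteria

✅ After gemini-3.1-pro-preview-customtools hits its rate limit, other google models remain accessible.
✅ Model-specific backoff state is tracked correctly and expires independently.
✅ The gateway does not preemptively reject requests to models that are not rate-limited.
✅ Fallback chains work correctly when the primary model is unavailable.

⚠️ Common Pitfalls

Environment-Specific Pitfalls

Docker container caching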

# Pitfall: Container filesystem may cache rate limit state
# Restarting containers does not reset state if it is persisted in a volume

docker-compose down
docker volume rm openclaw-cache  # Clear cached state; Compose usually prefixes the name with the project name, so check `docker volume ls`
docker-compose up -d

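Kubernetes volume mounts

If you use a persistent volume for rate limit tracking:

# Verify the PVC is not stale after config changes
kubectl get pvc | grep openclaw
kubectl describe pvc openclaw-cache

# May need to delete and recreate it if the state schema changed.
# Scale down the workloads that mount it first; a PVC still in use stays in Terminating.
kubectl delete pvc openclaw-cache
# Then recreate the PVC and restart the deployments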

macOS development environment

# Pitfall: Local rate limit state may persist across terminal sessions
# Clear any local state files
rm -rf ~/.openclaw/cache/*
rm -f .openclaw/state.json

Configuration Errors

Incorrect provider name for a model in the fallback chain

# WRONG: Typos in provider name cause silent failures
models:
  - name: "gemini-3.0-pro-preview"
    provider: "googel"  # Typo - will not match actual provider

# CORRECT:
models:
  - name: "gemini-3.0-pro-preview"
    provider: "google"

Duplicate model declarations

# WRONG: Same model declared multiple times
models:
  - name: "gemini-3.0-pro-preview"
    provider: "google"
  - name: "gemini-3.0-pro-preview"  # Duplicate
    provider: "google"
    fallback_models: [...]
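Both mistakes above can be caught before deployment with a small lint pass over the parsed `models` list. The sketch assumes the YAML has already been parsed into plain objects. `lintModels` is a hypothetical helper, and it also flags fallbacks that point to undeclared models.

```typescript
// Hypothetical lint pass over an already-parsed `models` list.
interface ModelDecl {
  name: string;
  provider: string;
  fallback_models?: string[];
}

function lintModels(models: ModelDecl[], knownProviders: string[]): string[] {
  const problems: string[] = [];
  const declared = new Set<string>();
  for (const m of models) {
    if (declared.has(m.name)) problems.push(`duplicate model declaration: ${m.name}`);
    declared.add(m.name);
    if (!knownProviders.includes(m.provider)) {
      problems.push(`unknown provider "${m.provider}" for model ${m.name}`);
    }
  }
  // Second pass so fallbacks may point at models declared later in the file.
  for (const m of models) {
    for (const fb of m.fallback_models ?? []) {
      if (fb === m.name) problems.push(`model ${m.name} lists itself as a fallback`);
      else if (!declared.has(fb)) problems.push(`model ${m.name} falls back to undeclared model ${fb}`);
    }
  }
  return problems;
}

const issues = lintModels(
  [
    { name: "gemini-3.0-pro-preview", provider: "googel" },
    { name: "gemini-3.0-pro-preview", provider: "google", fallback_models: ["gemini-pro"] },
  ],
  ["google"],
);
issues.forEach((issue) => console.log(issue));
```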

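API key scope mismatch

# Pitfall: Quotas are enforced per Google Cloud project, and each API key belongs to a project
# If using separate provider instances, ensure their keys belong to projects with adequate quota

# Verify in Google Cloud Console:
# APIs & Services > Enabled APIs & services > Generative Language API (Gemini API keys) or Vertex AI API > Quotas

Edge Cases to Test

The last available model hits its rate limit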

# Scenario: All models under a provider are rate-limited
# Expected: Should return clear error, not silent success

# Verify error response includes all affected models
curl -X POST https://api.openclaw.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini-3.0-pro-preview", "messages": [{"role": "user", "content": "test"}]}'

# Check response contains actionable information
# Should NOT be empty 200 OK

Rapid model switching

# Pitfall: Race condition during rapid switching may bypass backoff
# Test with concurrent requests

ab -n 100 -c 10 -T 'application/json' \
  -p request.json \
  https://api.openclaw.io/v1/chat/completions

# Verify all requests are properly rate-limited or processed

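🔗 Related Errors

Error code	Description	Relevance
`429 RESOURCE_EXHAUSTED`	Rate limit error returned by the Google API	The source error that triggers provider backoff
`503 Service Unavailable`	Provider temporarily unavailable	Downstream result of prolonged provider backoff
`500 Internal Server Error`	Gateway error during backoff handling	Unhandled exception in the rate limit middleware
`ENOTFOUND`	DNS resolution failure for the Google API	Unrelated, but may be misdiagnosed as rate limiting
`ETIMEDOUT`	Connection to the Google API timed out	Unrelated, but may wrongly trigger backoff logic
`INVALID_ARGUMENT`	Malformed request sent to the Gemini API	May be misrouted as a rate limit error during error handling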
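Several of these errors have nothing to do with quota, so a gateway should decide explicitly which failures start a cooldown. The sketch below is a hypothetical classifier that treats only HTTP 429 / `RESOURCE_EXHAUSTED` as a rate-limit signal. The `UpstreamFailure` shape is an assumption.

```typescript
// Hypothetical classifier deciding whether an upstream failure should start a cooldown.
type UpstreamFailure =
  | { kind: "http"; status: number; googleStatus?: string } // e.g. 429 with "RESOURCE_EXHAUSTED"
  | { kind: "network"; code: string };                       // e.g. "ENOTFOUND", "ETIMEDOUT"

type BackoffAction = "start_cooldown" | "no_cooldown";

function classify(failure: UpstreamFailure): BackoffAction {
  if (failure.kind === "network") {
    // DNS failures and timeouts say nothing about quota; leave them to ordinary retry logic.
    return "no_cooldown";
  }
  if (failure.status === 429 || failure.googleStatus === "RESOURCE_EXHAUSTED") {
    return "start_cooldown";
  }
  // 400 INVALID_ARGUMENT, 500, 503 and similar are not quota signals.
  return "no_cooldown";
}

const samples: UpstreamFailure[] = [
  { kind: "http", status: 429, googleStatus: "RESOURCE_EXHAUSTED" },
  { kind: "http", status: 400, googleStatus: "INVALID_ARGUMENT" },
  { kind: "network", code: "ETIMEDOUT" },
];
samples.forEach((s) => console.log(JSON.stringify(s), "->", classify(s)));
```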

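Historical Context

This issue reflects a broader pattern in multi-tenant API gateway design:

Overly broad circuit breakers: the circuit breaker pattern is applied at the provider level when it should operate at the model/deployment level.
Shared state conflicts: multiple independent resources share a single rate limit counter.
Insufficient error context: Google's 429 responses can include a RetryInfo detail (the suggested retry delay) and QuotaFailure details (which quota was exhausted), but the gateway may not parse them.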
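Here is a sketch of pulling the retry delay and the exhausted quota out of a 429 body. It assumes the body carries standard `google.rpc` error details: a `RetryInfo` entry whose `retryDelay` is a duration string such as `"37s"`, and a `QuotaFailure` entry whose violations include `quotaId` and `quotaDimensions.model`. These fields commonly appear in Gemini API responses but are not guaranteed, so every one is treated as optional. With the model name from the quota dimensions, a gateway could key the cooldown on the model that actually ran out of quota.

```typescript
// Assumed shape of a Gemini API error body with google.rpc details (all detail fields optional).
interface GoogleErrorBody {
  error: {
    code: number;
    message: string;
    status: string;
    details?: Array<Record<string, unknown>>;
  };
}

interface QuotaInfo {
  retryAfterSec?: number; // from RetryInfo.retryDelay, e.g. "37s"
  quotaId?: string;       // from QuotaFailure.violations[0].quotaId
  model?: string;         // from QuotaFailure.violations[0].quotaDimensions.model
}

const RETRY_INFO = "type.googleapis.com/google.rpc.RetryInfo";
const QUOTA_FAILURE = "type.googleapis.com/google.rpc.QuotaFailure";

function parseQuotaInfo(body: GoogleErrorBody): QuotaInfo {
  const info: QuotaInfo = {};
  for (const detail of body.error.details ?? []) {
    const detailType = detail["@type"];
    if (detailType === RETRY_INFO) {
      const delay = detail["retryDelay"];
      // Durations are serialized as decimal seconds with an "s" suffix; parseFloat stops at the "s".
      const secs = typeof delay === "string" ? parseFloat(delay) : NaN;
      if (!Number.isNaN(secs)) info.retryAfterSec = Math.ceil(secs);
    } else if (detailType === QUOTA_FAILURE) {
      const violations = detail["violations"];
      if (Array.isArray(violations) && violations.length > 0) {
        const v = violations[0] as { quotaId?: unknown; quotaDimensions?: { model?: unknown } };
        if (typeof v.quotaId === "string") info.quotaId = v.quotaId;
        const model = v.quotaDimensions?.model;
        if (typeof model === "string") info.model = model;
      }
    }
  }
  return info;
}

// Illustrative sample body; the quotaId value is only an example.
const sample: GoogleErrorBody = {
  error: {
    code: 429,
    message: "Resource has been exhausted (e.g. check quota).",
    status: "RESOURCE_EXHAUSTED",
    details: [
      {
        "@type": QUOTA_FAILURE,
        violations: [{ quotaId: "GenerateRequestsPerMinutePerProjectPerModel", quotaDimensions: { model: "gemini-3.1-pro-preview-customtools", location: "global" } }],
      },
      { "@type": RETRY_INFO, retryDelay: "37s" },
    ],
  },
};

const parsed = parseQuotaInfo(sample);
console.log(parsed.model, parsed.retryAfterSec); // gemini-3.1-pro-preview-customtools 37
```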