[Google Gemini Provider 429 Rate Limit Error] - Google Gemini Provider: 429 Rate Limit Scopes to Entire Provider Instead of Specific Model
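When a single Google Gemini model hits a rate limit (429), the OpenClaw gateway applies backoff to the entire 'google' provider, so unrelated models with independent quotas become unreachable as well.
🔍 Symptoms
Primary Symptom
Once a specific Google Gemini model exhausts its quota, every subsequent request to any model under the google provider fails with a rate-limit error, even if those models have independent quotas.
Example Error Output
Direct API response (429 from Google):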
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
{
  "error": {
    "code": 429,
    "message": "Resource has been exhausted (e.g. check quota).",
    "status": "RESOURCE_EXHAUSTED"
  }
}
OpenClaw gateway response once backoff is active:
{
  "error": {
    "type": "rate_limit_exceeded",
    "provider": "google",
    "message": "Provider 'google' is currently in cooldown due to rate limiting. Retry-After: 120s",
    "retry_after": 120
  }
}
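Behavioral Symptoms
- No model isolation: switching from gemini-3.1-pro-preview-customtools to gemini-3.0-pro-preview does not restore service.
- Sustained unavailability: every request to the google provider fails until the provider-level cooldown expires.
- No fallback path: during a rate-limit event, other models under the same provider cannot serve as alternatives.
- Gateway-level rejection: requests may be rejected at the OpenClaw gateway before they ever reach the Google API.
Reproduction Scenario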
# Step 1: Request to rate-limited model
curl -X POST https://api.openclaw.io/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gemini-3.1-pro-preview-customtools", "messages": [{"role": "user", "content": "test"}]}'
# Response: 429 from Google API
# Step 2: Immediate fallback to another model
curl -X POST https://api.openclaw.io/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gemini-3.0-pro-preview", "messages": [{"role": "user", "content": "test"}]}'
# Expected: Request proceeds to Google API
# Actual: 429 or backoff error from OpenClaw gateway
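🧠 Root Cause Analysis
Architecture Analysis
The root cause lies in how the OpenClaw gateway's retry/backoff mechanism tracks rate limits at the provider level.
Failure Sequence
- Request to gemini-3.1-pro-preview-customtools: the model-specific deployment receives a 429 RESOURCE_EXHAUSTED response from the Google API.
- Gateway intercepts the 429: OpenClaw's error-handling middleware catches the 429 response.
- Provider-level backoff activates: instead of recording the rate limit against the specific model/deployment, the gateway sets a cooldown timer on the google provider identifier.
- Subsequent request to gemini-3.0-pro-preview: the gateway checks whether the google provider is in cooldown, finds that it is, and rejects the request up front with a backoff error.
- Other models with independent quotas are blocked: gemini-3.0-pro-preview may have an entirely separate quota, yet it is unreachable.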
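The failure sequence can be reproduced with a small simulation. This is only a sketch under the assumption that the gateway keeps one cooldown entry per provider name; `handleUpstreamResponse` and `isBlocked` are hypothetical names chosen for illustration, not OpenClaw's actual API.

```javascript
// Hypothetical sketch of provider-keyed backoff (not OpenClaw's actual code).
const providerCooldowns = new Map();

// Record a cooldown when the upstream API answers 429.
function handleUpstreamResponse(provider, model, status, retryAfterSec, nowMs) {
  if (status === 429) {
    // The model name is ignored here, which is the root of the problem.
    providerCooldowns.set(provider, { cooldownUntil: nowMs + retryAfterSec * 1000 });
  }
}

// Pre-flight check run before a request is forwarded upstream.
function isBlocked(provider, model, nowMs) {
  const entry = providerCooldowns.get(provider);
  return entry !== undefined && entry.cooldownUntil > nowMs;
}

const t0 = 1000000; // arbitrary start time in milliseconds
handleUpstreamResponse("google", "gemini-3.1-pro-preview-customtools", 429, 120, t0);

// One second later, a different model is rejected before reaching Google.
const blockedOther = isBlocked("google", "gemini-3.0-pro-preview", t0 + 1000);
// Once the 120 s cooldown has passed, the provider is reachable again.
const blockedLater = isBlocked("google", "gemini-3.0-pro-preview", t0 + 121000);
console.log({ blockedOther, blockedLater });
```

Because the `model` argument never enters the map key, any model under the same provider inherits the cooldown.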
Code-Level Root Cause
The rate-limit tracking likely uses a data structure like this:
// Simplified representation of current behavior
const providerBackoff = {
  "google": {
    cooldownUntil: 1699999999999, // Unix epoch time in milliseconds
    reason: "rate_limit",
    retryAfter: 120 // seconds
  }
};
// Backoff check
function shouldReject(provider) {
  return providerBackoff[provider]?.cooldownUntil > Date.now();
}
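The problem: the backoff mechanism is keyed by provider name ("google") rather than by a model or deployment identifier.
Google Gemini API Quota Architecture
How Google Gemini API quotas work:
- Model-specific quotas: each model (e.g. gemini-3.1-pro-preview-customtools) has its own rate limits.
- Project-level quotas: broader limits that affect all models, but are usually much higher.
- Regional endpoints: may have separate limits.
Code Path Differences
| Scenario | Current behavior | Expected behavior |
|---|---|---|
| Model A hits a 429 | All models under the google provider are blocked | Only model A is blocked |
| Model A's quota is exhausted | Model B is unavailable | Model B keeps working while its own quota lasts |
| Provider backoff is active | Gateway rejects the request at layer 7 | Request is forwarded to the API |
🛠️ Step-by-Step Fix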
Option 1: Enable model-scoped rate limiting (recommended)
If OpenClaw supports per-model rate-limit tracking, configure the gateway to use model-level backoff:
Before (openclaw.yaml):
providers:
  google:
    api_key: "${GOOGLE_API_KEY}"
    rate_limit:
      strategy: "provider" # Current: blocks entire provider
      retry_after: 120
After:
providers:
  google:
    api_key: "${GOOGLE_API_KEY}"
    rate_limit:
      strategy: "model" # Changed: per-model tracking
      retry_after: 120
      scope: "deployment" # Granularity: model/deployment level
Option 2: Configure model-specific fallbacks
Define an explicit fallback chain to route around rate-limited models:
Before:
models:
  - name: "gemini-3.1-pro-preview-customtools"
    provider: "google"
After:
models:
  - name: "gemini-3.1-pro-preview-customtools"
    provider: "google"
    fallback_models:
      - "gemini-3.0-pro-preview"
      - "gemini-pro"
  - name: "gemini-3.0-pro-preview"
    provider: "google"
    fallback_models:
      - "gemini-pro"
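How a gateway might walk such a fallback chain can be sketched as follows. The registry object and the `resolveModel` function are hypothetical and only mirror the YAML above; how OpenClaw actually resolves `fallback_models` is not confirmed here.

```javascript
// Hypothetical in-memory registry mirroring the fallback configuration.
const modelRegistry = {
  "gemini-3.1-pro-preview-customtools": {
    provider: "google",
    fallbacks: ["gemini-3.0-pro-preview", "gemini-pro"],
  },
  "gemini-3.0-pro-preview": { provider: "google", fallbacks: ["gemini-pro"] },
  "gemini-pro": { provider: "google", fallbacks: [] },
};

// Return the first model in [requested, ...fallbacks] that is not cooling
// down, or null when every candidate is rate-limited.
function resolveModel(requested, isCoolingDown) {
  const entry = modelRegistry[requested];
  if (!entry) throw new Error(`Unknown model: ${requested}`);
  const candidates = [requested, ...entry.fallbacks];
  return candidates.find((name) => !isCoolingDown(name)) ?? null;
}

const limitedModels = new Set(["gemini-3.1-pro-preview-customtools"]);
const chosenModel = resolveModel(
  "gemini-3.1-pro-preview-customtools",
  (name) => limitedModels.has(name)
);
console.log(chosenModel);
```

The cooldown predicate has to be evaluated per model. While cooldowns are still tracked per provider, every candidate under google would be reported as cooling down, so a same-provider fallback chain only helps in combination with model-scoped tracking or a fallback on a different provider.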
Option 3: Make provider cooldown more granular (code fix)
If you have access to the OpenClaw source code, modify the rate-limit tracking:
Step 1: Locate the rate-limit handler
Find the file that handles 429 responses. It is typically located at:
src/gateway/middleware/rate-limit-handler.ts
src/providers/google/error-handler.ts
Step 2: Change the backoff key from provider to model
// BEFORE (provider-level)
providerBackoff[provider] = {
  cooldownUntil: Date.now() + retryAfter * 1000,
  reason: "rate_limit"
};
// AFTER (model-level)
const modelKey = `${provider}:${model}`;
modelBackoff[modelKey] = {
  cooldownUntil: Date.now() + retryAfter * 1000,
  reason: "rate_limit",
  model: model,
  provider: provider
};
Step 3: Update the rejection check
// BEFORE
function shouldReject(request) {
  const provider = request.provider;
  return providerBackoff[provider]?.cooldownUntil > Date.now();
}
// AFTER
function shouldReject(request) {
  const modelKey = `${request.provider}:${request.model}`;
  const providerKey = request.provider;
  // Check model-specific backoff first
  if (modelBackoff[modelKey]?.cooldownUntil > Date.now()) {
    return { rejected: true, reason: "model_rate_limited" };
  }
  // Fallback to provider-level for shared limits only
  if (providerBackoff[providerKey]?.cooldownUntil > Date.now()) {
    return { rejected: true, reason: "provider_rate_limited" };
  }
  return { rejected: false };
}
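Steps 2 and 3 can be combined into one small tracker that also drops expired entries and accepts an injected clock, which makes the isolation behavior easy to test. This is a sketch only; the class name `CooldownTracker` and its methods are illustrative assumptions, not part of OpenClaw.

```javascript
// Sketch of a model-scoped cooldown tracker (all names are illustrative).
class CooldownTracker {
  constructor(clock = Date.now) {
    this.clock = clock;       // injectable for deterministic tests
    this.entries = new Map(); // key -> cooldown expiry in epoch milliseconds
  }

  // scope is "model" for per-model quotas, "provider" for shared limits.
  record(provider, model, retryAfterSec, scope = "model") {
    const key = scope === "provider" ? provider : `${provider}:${model}`;
    this.entries.set(key, this.clock() + retryAfterSec * 1000);
  }

  // Returns the rejection reason, or null if the request may proceed.
  check(provider, model) {
    const nowMs = this.clock();
    const lookups = [
      [`${provider}:${model}`, "model_rate_limited"],
      [provider, "provider_rate_limited"],
    ];
    for (const [key, reason] of lookups) {
      const until = this.entries.get(key);
      if (until === undefined) continue;
      if (until > nowMs) return reason;
      this.entries.delete(key); // drop expired entries so the map stays small
    }
    return null;
  }
}

let fakeNowMs = 0;
const tracker = new CooldownTracker(() => fakeNowMs);
tracker.record("google", "gemini-3.1-pro-preview-customtools", 120);
const limitedResult = tracker.check("google", "gemini-3.1-pro-preview-customtools");
const siblingResult = tracker.check("google", "gemini-3.0-pro-preview");
fakeNowMs = 121000; // advance the fake clock past the cooldown
const expiredResult = tracker.check("google", "gemini-3.1-pro-preview-customtools");
console.log(limitedResult, siblingResult, expiredResult);
```

Keeping a separate "provider" scope preserves the ability to back off the whole provider when a shared project-level quota is the one that was exhausted.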
Option 4: Work around it with multiple provider instances
Create separate provider configurations for models with independent quotas:
providers:
  google-gemini-31:
    api_key: "${GOOGLE_API_KEY}"
    models:
      - "gemini-3.1-pro-preview-customtools"
    rate_limit:
      retry_after: 60
  google-gemini-30:
    api_key: "${GOOGLE_API_KEY}"
    models:
      - "gemini-3.0-pro-preview"
    rate_limit:
      retry_after: 60
  google-gemini-pro:
    api_key: "${GOOGLE_API_KEY}"
    models:
      - "gemini-pro"
    rate_limit:
      retry_after: 60
🧪 Verification
Test 1: Confirm model-level isolation after the fix
# Step 1: Trigger rate limit on model A
curl -X POST https://api.openclaw.io/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gemini-3.1-pro-preview-customtools", "messages": [{"role": "user", "content": "test"}]}'
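# Expected: HTTP 429 from Google API
# Note: curl exits 0 on HTTP error statuses unless --fail is given, so echo $?
# is not a reliable check. Add -s -o /dev/null -w "%{http_code}\n" to print the status.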
# Step 2: Immediately test model B access
curl -X POST https://api.openclaw.io/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gemini-3.0-pro-preview", "messages": [{"role": "user", "content": "test"}]}'
# Expected: 200 OK or valid API response (not gateway backoff error)
Test 2: Verify model-specific backoff state
Inspect the gateway's internal state (if it is exposed through an admin endpoint):
GET /admin/rate-limit-status
# Expected response structure:
{
  "providers": {
    "google": {
      "cooldown": false,
      "models": {
        "gemini-3.1-pro-preview-customtools": {
          "cooldown": true,
          "retry_after": 120,
          "expires_at": "2024-01-15T10:30:00Z"
        },
        "gemini-3.0-pro-preview": {
          "cooldown": false
        }
      }
    }
  }
}
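If such a status payload is available, a short helper can list which models are in cooldown and confirm that the provider itself is not. The payload shape is the assumed structure shown above, and `modelsInCooldown` is a hypothetical helper name.

```javascript
// List every "provider/model" pair in cooldown, given a parsed status
// payload of the assumed shape shown above.
function modelsInCooldown(status) {
  const result = [];
  for (const [provider, info] of Object.entries(status.providers ?? {})) {
    for (const [model, state] of Object.entries(info.models ?? {})) {
      if (state.cooldown) result.push(`${provider}/${model}`);
    }
  }
  return result;
}

const sampleStatus = {
  providers: {
    google: {
      cooldown: false,
      models: {
        "gemini-3.1-pro-preview-customtools": { cooldown: true, retry_after: 120 },
        "gemini-3.0-pro-preview": { cooldown: false },
      },
    },
  },
};

const coolingDown = modelsInCooldown(sampleStatus);
const providerFree = sampleStatus.providers.google.cooldown === false;
console.log(coolingDown, providerFree);
```

With model-level isolation working, only the rate-limited model should appear in the list while the provider-level flag stays false.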
Test 3: Concurrent model availability test
# Send requests to different models in parallel
for model in "gemini-3.1-pro-preview-customtools" "gemini-3.0-pro-preview" "gemini-pro"; do
  curl -s -o /dev/null -w "$model: %{http_code}\n" \
    -X POST https://api.openclaw.io/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$model\", \"messages\": [{\"role\": \"user\", \"content\": \"test\"}]}" &
done
wait # Block until all background requests have finished
# Expected:
# gemini-3.1-pro-preview-customtools: 429 (rate limited)
# gemini-3.0-pro-preview: 200 (independent quota)
# gemini-pro: 200 (independent quota)
Test 4: Backoff expiry verification
# Wait for cooldown to expire
echo "Waiting for model cooldown expiration..."
sleep 130 # retry_after + buffer
# Verify previously rate-limited model is accessible
curl -X POST https://api.openclaw.io/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gemini-3.1-pro-preview-customtools", "messages": [{"role": "user", "content": "test"}]}'
# Expected: 200 OK
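Success Criteria
- ✅ After gemini-3.1-pro-preview-customtools hits its rate limit, other google models remain accessible.
- ✅ Model-specific backoff state is tracked correctly and expires independently.
- ✅ The gateway does not preemptively reject requests to models that are not rate-limited.
- ✅ The fallback chain works correctly when the primary model is unavailable.
⚠️ Common Pitfalls
Environment-Specific Pitfalls
Docker Container Caching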
# Pitfall: Container filesystem may cache rate limit state
# Restarting containers may not reset state if persistence is enabled
docker-compose down
docker volume rm openclaw-cache # Clear cached state (docker volume prune does not take a volume name)
docker-compose up -d
Kubernetes Volume Mounts
If a persistent volume is used for rate-limit tracking:
# Verify PVC is not stale after config changes
kubectl get pvc | grep openclaw
kubectl describe pvc openclaw-cache
# May need to delete and recreate if schema changed
kubectl delete pvc openclaw-cache
# Then restart deployments
macOS Development Environment
# Pitfall: Local rate limit state may persist across terminal sessions
# Clear any local state files
rm -rf ~/.openclaw/cache/*
rm -rf .openclaw/state.json
Configuration Mistakes
Incorrect provider name in the fallback chain
# WRONG: Typos in provider name cause silent failures
models:
  - name: "gemini-3.0-pro-preview"
    provider: "googel" # Typo - will not match actual provider
# CORRECT:
models:
  - name: "gemini-3.0-pro-preview"
    provider: "google"
Duplicate model declarations
# WRONG: Same model declared multiple times
models:
  - name: "gemini-3.0-pro-preview"
    provider: "google"
  - name: "gemini-3.0-pro-preview" # Duplicate
    provider: "google"
    fallback_models: [...]
API key scope mismatch
# Pitfall: Google API keys may have different quotas per project
# If using separate provider instances, ensure they use keys with adequate quotas
# Verify in Google Cloud Console:
# APIs & Services > Enabled APIs > Vertex AI API > Quotas
Testing Edge Cases
Last available model hits its rate limit
# Scenario: All models under a provider are rate-limited
# Expected: Should return clear error, not silent success
# Verify error response includes all affected models
curl -X POST https://api.openclaw.io/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gemini-3.0-pro-preview", "messages": [{"role": "user", "content": "test"}]}'
# Check response contains actionable information
# Should NOT be empty 200 OK
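One way to make the "all models exhausted" response actionable is to list every affected model together with the soonest retry time. The sketch below assumes a map of cooldown expiry timestamps per model; the `affected_models` field and the `buildExhaustedError` function are hypothetical, while `type` and `retry_after` follow the gateway error shape shown earlier.

```javascript
// Sketch: build an actionable error when every candidate model is cooling down.
// `cooldowns` maps model name -> cooldown expiry in epoch milliseconds and is
// assumed to contain an entry for every candidate.
function buildExhaustedError(candidates, cooldowns, nowMs) {
  if (candidates.length === 0) throw new Error("candidates must not be empty");
  const affected = candidates.map((model) => ({
    model,
    retry_after: Math.max(0, Math.ceil((cooldowns[model] - nowMs) / 1000)),
  }));
  return {
    error: {
      type: "rate_limit_exceeded",
      message: `All candidate models are rate-limited: ${candidates.join(", ")}`,
      affected_models: affected, // hypothetical field
      // Soonest moment at which any candidate becomes available again.
      retry_after: Math.min(...affected.map((entry) => entry.retry_after)),
    },
  };
}

const exhaustedError = buildExhaustedError(
  ["gemini-3.0-pro-preview", "gemini-pro"],
  { "gemini-3.0-pro-preview": 120000, "gemini-pro": 45000 },
  0
);
console.log(JSON.stringify(exhaustedError, null, 2));
```

Reporting the minimum remaining cooldown lets a client retry as soon as any model in the chain is usable again.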
Rapid model switching
# Pitfall: Race condition during rapid switching may bypass backoff
# Test with concurrent requests
ab -n 100 -c 10 -T 'application/json' \
-p request.json \
https://api.openclaw.io/v1/chat/completions
# Verify all requests are properly rate-limited or processed
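🔗 Related Errors
| Error code | Description | Relevance |
|---|---|---|
| 429 RESOURCE_EXHAUSTED | Rate-limit error returned by the Google API | Source error that triggers provider backoff |
| 503 Service Unavailable | Provider temporarily unavailable | Downstream result of prolonged provider backoff |
| 500 Internal Server Error | Gateway failure during backoff handling | Unhandled exception in the rate-limit middleware |
| ENOTFOUND | DNS resolution failure for the Google API | Unrelated, but can be misdiagnosed as rate limiting |
| ETIMEDOUT | Connection to the Google API timed out | Unrelated, but may trigger faulty backoff logic |
| INVALID_ARGUMENT | Malformed request sent to the Gemini API | May be misrouted as a rate limit by error handling |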
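Since several of these errors can be mistaken for rate limiting, it may help to classify failures explicitly before any cooldown is started. The following is a sketch; the `failure` object shape (`httpStatus`, `googleStatus`, `code`) and the category names are assumptions made for illustration.

```javascript
// Classify an upstream failure so that only genuine quota errors start a
// rate-limit cooldown. The shape of `failure` is an assumption.
function classifyFailure(failure) {
  if (failure.httpStatus === 429 || failure.googleStatus === "RESOURCE_EXHAUSTED") {
    return "rate_limit";
  }
  if (failure.code === "ENOTFOUND" || failure.code === "ETIMEDOUT") {
    return "network"; // DNS or connection problems, not quota related
  }
  if (failure.httpStatus === 400 || failure.googleStatus === "INVALID_ARGUMENT") {
    return "invalid_request"; // caller error, retrying will not help
  }
  return "other";
}

// Only the "rate_limit" category should ever start a cooldown.
function shouldStartCooldown(failure) {
  return classifyFailure(failure) === "rate_limit";
}

console.log(shouldStartCooldown({ httpStatus: 429, googleStatus: "RESOURCE_EXHAUSTED" }));
console.log(shouldStartCooldown({ code: "ETIMEDOUT" }));
```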
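Historical Context
This issue reflects broader patterns in multi-tenant API gateway design:
- Overly broad circuit breaker: the circuit-breaker pattern is applied at the provider level when it should operate at the model/deployment level.
- Shared-state collision: multiple independent resources share a single rate-limit counter.
- Insufficient error context: Google's 429 responses can carry structured error details, such as RetryInfo (a suggested retry delay) and QuotaFailure (which quota was exceeded), but these may not be parsed.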
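As an illustration of using that error context, the sketch below reads the suggested delay from a parsed 429 body. It assumes the body follows Google's standard error model, where a `google.rpc.RetryInfo` detail carries `retryDelay` as a duration string such as "34s"; the function name and the 120-second fallback are illustrative choices.

```javascript
const RETRY_INFO_TYPE = "type.googleapis.com/google.rpc.RetryInfo";

// Extract the retry delay in whole seconds from a parsed 429 body, falling
// back to a default when no RetryInfo detail is present or parseable.
function retryDelaySeconds(body, fallbackSec = 120) {
  const details = body?.error?.details ?? [];
  const info = details.find((detail) => detail["@type"] === RETRY_INFO_TYPE);
  const match = /^(\d+(?:\.\d+)?)s$/.exec(info?.retryDelay ?? "");
  return match ? Math.ceil(Number(match[1])) : fallbackSec;
}

const bodyWithRetryInfo = {
  error: {
    code: 429,
    status: "RESOURCE_EXHAUSTED",
    details: [{ "@type": RETRY_INFO_TYPE, retryDelay: "34s" }],
  },
};
const bodyWithoutDetails = { error: { code: 429, status: "RESOURCE_EXHAUSTED" } };

const parsedDelay = retryDelaySeconds(bodyWithRetryInfo);
const defaultDelay = retryDelaySeconds(bodyWithoutDetails);
console.log(parsedDelay, defaultDelay);
```

Using the server-provided delay when it exists avoids holding a model in cooldown longer than the quota window actually requires.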
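Related GitHub Issues
- Rate limits should be scoped per model, not per provider - feature request for model-level isolation
- Google Gemini provider backoff blocks all models - duplicate tracking issue
- Add retry-after parsing for Google 429 responses - enhancement for accurate cooldown calculation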