Google Gemini Provider: 429 Rate Limit Scopes to Entire Provider Instead of Specific Model
When a single Google Gemini model hits a rate limit (429), the OpenClaw gateway applies backoff to the entire 'google' provider, blocking access to other, unrelated models with independent quotas.
Symptoms
Primary Manifestation
When a specific Google Gemini model exhausts its quota, all subsequent requests to any model under the google provider fail with rate limit errors, even when those models have independent quota allocations.
Error Output Examples
Direct API Response (429 from Google):
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
{
"error": {
"code": 429,
"message": "Resource has been exhausted (e.g. check quota).",
"status": "RESOURCE_EXHAUSTED"
}
}
OpenClaw Gateway Response After Backoff Engages:
{
"error": {
"type": "rate_limit_exceeded",
"provider": "google",
"message": "Provider 'google' is currently in cooldown due to rate limiting. Retry-After: 120s",
"retry_after": 120
}
}
Behavioral Symptoms
- No Model Isolation: Switching from `gemini-3.1-pro-preview-customtools` to `gemini-3.0-pro-preview` does not restore functionality.
- Extended Unavailability: All `google` provider requests fail until the provider-level cooldown expires.
- No Fallback Path: Alternative models under the same provider cannot serve as fallbacks during rate limit events.
- Gateway-Level Rejection: Requests may be rejected at the OpenClaw gateway layer before reaching Google's API.
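Until the gateway itself is fixed, a client can at least avoid wasting calls by treating the provider-scoped cooldown error as poisoning every model on that provider and failing over elsewhere. A minimal sketch, assuming the gateway error shape shown above; `completeWithFailover` and the provider list are illustrative, not OpenClaw APIs:

```typescript
// Client-side failover sketch. `call` is a stand-in for whatever HTTP
// client you use; the error shape matches the gateway response above.
interface GatewayError {
  error: { type: string; provider?: string; retry_after?: number };
}

type CallFn = (model: string) => Promise<string>;

// Try each model in order; once the gateway reports a provider-level
// cooldown, skip sibling models on that provider, since they will all fail.
async function completeWithFailover(
  models: { model: string; provider: string }[],
  call: CallFn
): Promise<string> {
  const blockedProviders = new Set<string>();
  for (const { model, provider } of models) {
    if (blockedProviders.has(provider)) continue; // provider in cooldown
    try {
      return await call(model);
    } catch (e) {
      const err = e as GatewayError;
      if (err.error?.type === "rate_limit_exceeded" && err.error.provider) {
        blockedProviders.add(err.error.provider); // poison the whole provider
      }
    }
  }
  throw new Error("all models unavailable");
}
```

This only papers over the problem (healthy Gemini models are still skipped), but it avoids burning requests into a known provider-wide cooldown.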
Reproduction Scenario
# Step 1: Request to rate-limited model
curl -X POST https://api.openclaw.io/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gemini-3.1-pro-preview-customtools", "messages": [{"role": "user", "content": "test"}]}'
# Response: 429 from Google API
# Step 2: Immediate fallback to another model
curl -X POST https://api.openclaw.io/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gemini-3.0-pro-preview", "messages": [{"role": "user", "content": "test"}]}'
# Expected: Request proceeds to Google API
# Actual: 429 or backoff error from OpenClaw gateway
Root Cause
Architectural Analysis
The root cause lies in the provider-level rate limit tracking implementation within the OpenClaw gateway’s retry/backoff mechanism.
Failure Sequence
- Request to `gemini-3.1-pro-preview-customtools`: The model-specific deployment receives a `429 RESOURCE_EXHAUSTED` from Google's API.
- Gateway Intercepts 429: OpenClaw's error handling middleware catches the 429 response.
- Provider-Level Backoff Activation: Instead of recording the rate limit against the specific model/deployment, the gateway sets a cooldown timer on the `google` provider identifier.
- Subsequent Request to `gemini-3.0-pro-preview`: The gateway checks whether the `google` provider is in cooldown. Finding that it is, it rejects the request preemptively with a backoff error.
- Model with Independent Quota Is Blocked: `gemini-3.0-pro-preview` may have a completely separate quota allocation, but cannot be accessed.
Code-Level Root Cause
The rate limit tracking likely uses a data structure similar to:
// Simplified representation of current behavior
const providerBackoff = {
"google": {
cooldownUntil: 1699999999999, // Unix timestamp
reason: "rate_limit",
retryAfter: 120
}
};
// Backoff check
function shouldReject(provider) {
return providerBackoff[provider]?.cooldownUntil > Date.now();
}
The problem: The backoff is keyed by provider name (“google”) rather than by model or deployment identifier.
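The consequence of keying by provider name can be demonstrated in isolation. The sketch below mirrors the simplified structure above (`recordRateLimit` is a hypothetical helper, not confirmed OpenClaw code):

```typescript
// Demonstrates why a provider-keyed cooldown blocks unrelated models.
const providerBackoff: Record<
  string,
  { cooldownUntil: number; reason: string }
> = {};

// Record a 429 against the provider, as the current gateway does.
function recordRateLimit(provider: string, retryAfterSec: number): void {
  providerBackoff[provider] = {
    cooldownUntil: Date.now() + retryAfterSec * 1000,
    reason: "rate_limit",
  };
}

function shouldReject(provider: string): boolean {
  return (providerBackoff[provider]?.cooldownUntil ?? 0) > Date.now();
}

// A 429 on ONE Gemini model cools down the whole provider...
recordRateLimit("google", 120);
console.log(shouldReject("google")); // true: every Gemini model is now blocked
console.log(shouldReject("openai")); // false: only other providers are unaffected
```

There is no `model` parameter anywhere in this path, so the gateway cannot distinguish an exhausted model from a healthy one.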
Google Gemini API Quota Architecture
Google Gemini API operates with:
- Model-specific quotas: Each model (e.g., `gemini-3.1-pro-preview-customtools`) has independent rate limits.
- Project-level quotas: Broader limits that affect all models, but these are typically much higher.
- Regional endpoints: May have independent limits.
Divergent Code Paths
| Scenario | Current Behavior | Expected Behavior |
|---|---|---|
| Model A hits 429 | All google provider blocked | Only Model A blocked |
| Model A quota exhausted | Model B unusable | Model B continues if quota available |
| Provider backoff active | Gateway rejects at layer 7 | Request proceeds to API |
Step-by-Step Fix
Option 1: Enable Model-Scoped Rate Limiting (Recommended)
If OpenClaw supports per-model rate limit tracking, configure the gateway to use model-level backoff:
Before (openclaw.yaml):
providers:
google:
api_key: "${GOOGLE_API_KEY}"
rate_limit:
strategy: "provider" # Current: blocks entire provider
retry_after: 120
After:
providers:
google:
api_key: "${GOOGLE_API_KEY}"
rate_limit:
strategy: "model" # Changed: per-model tracking
retry_after: 120
scope: "deployment" # Granularity: model/deployment level
Option 2: Configure Model-Specific Fallbacks
Define explicit fallback chains to bypass rate-limited models:
Before:
models:
- name: "gemini-3.1-pro-preview-customtools"
provider: "google"
After:
models:
- name: "gemini-3.1-pro-preview-customtools"
provider: "google"
fallback_models:
- "gemini-3.0-pro-preview"
- "gemini-pro"
- name: "gemini-3.0-pro-preview"
provider: "google"
fallback_models:
- "gemini-pro"
Option 3: Increase Provider Cooldown Granularity (Code Fix)
If you have access to the OpenClaw source code, modify the rate limit tracking:
Step 1: Identify the rate limit handler
Locate the file handling 429 responses. Typically found at:
src/gateway/middleware/rate-limit-handler.ts
src/providers/google/error-handler.ts
Step 2: Modify backoff key from provider to model
// BEFORE (provider-level)
providerBackoff[provider] = {
cooldownUntil: Date.now() + retryAfter * 1000,
reason: "rate_limit"
};
// AFTER (model-level)
const modelKey = `${provider}:${model}`;
modelBackoff[modelKey] = {
cooldownUntil: Date.now() + retryAfter * 1000,
reason: "rate_limit",
model: model,
provider: provider
};
Step 3: Update the rejection check
// BEFORE
function shouldReject(request) {
const provider = request.provider;
return providerBackoff[provider]?.cooldownUntil > Date.now();
}
// AFTER
function shouldReject(request) {
const modelKey = `${request.provider}:${request.model}`;
const providerKey = request.provider;
// Check model-specific backoff first
if (modelBackoff[modelKey]?.cooldownUntil > Date.now()) {
return { rejected: true, reason: "model_rate_limited" };
}
// Fallback to provider-level for shared limits only
if (providerBackoff[providerKey]?.cooldownUntil > Date.now()) {
return { rejected: true, reason: "provider_rate_limited" };
}
return { rejected: false };
}
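With the model-scoped key in place, the isolation property can be checked directly. The following is a self-contained restatement of the Step 2/3 snippets (`recordModelRateLimit` is an assumed helper name):

```typescript
// Self-contained model-scoped backoff, combining Step 2 and Step 3.
interface Backoff {
  cooldownUntil: number;
  reason: string;
}

const modelBackoff: Record<string, Backoff> = {};
const providerBackoff: Record<string, Backoff> = {};

// Record the 429 against `provider:model`, not the provider alone.
function recordModelRateLimit(
  provider: string,
  model: string,
  retryAfterSec: number
): void {
  modelBackoff[`${provider}:${model}`] = {
    cooldownUntil: Date.now() + retryAfterSec * 1000,
    reason: "rate_limit",
  };
}

function shouldReject(req: {
  provider: string;
  model: string;
}): { rejected: boolean; reason?: string } {
  const modelKey = `${req.provider}:${req.model}`;
  if ((modelBackoff[modelKey]?.cooldownUntil ?? 0) > Date.now()) {
    return { rejected: true, reason: "model_rate_limited" };
  }
  if ((providerBackoff[req.provider]?.cooldownUntil ?? 0) > Date.now()) {
    return { rejected: true, reason: "provider_rate_limited" };
  }
  return { rejected: false };
}

// Only the rate-limited model is blocked; its sibling stays available.
recordModelRateLimit("google", "gemini-3.1-pro-preview-customtools", 120);
```

Keeping `providerBackoff` as a second tier is deliberate: genuinely provider-wide events (e.g. project-level quota exhaustion) can still trip it, while per-model 429s no longer do.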
Option 4: Workaround via Multiple Provider Instances
Create separate provider configurations for models with independent quotas:
providers:
google-gemini-31:
api_key: "${GOOGLE_API_KEY}"
models:
- "gemini-3.1-pro-preview-customtools"
rate_limit:
retry_after: 60
google-gemini-30:
api_key: "${GOOGLE_API_KEY}"
models:
- "gemini-3.0-pro-preview"
rate_limit:
retry_after: 60
google-gemini-pro:
api_key: "${GOOGLE_API_KEY}"
models:
- "gemini-pro"
rate_limit:
retry_after: 60
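Under this layout, cooldown state is keyed by provider-instance name, so the existing provider-level backoff effectively becomes model-scoped without any code change. Conceptually (the routing table below is illustrative, not OpenClaw's actual data structure):

```typescript
// With one provider instance per model, instance names never collide,
// so the provider-keyed cooldown naturally isolates models.
const modelToInstance: Record<string, string> = {
  "gemini-3.1-pro-preview-customtools": "google-gemini-31",
  "gemini-3.0-pro-preview": "google-gemini-30",
  "gemini-pro": "google-gemini-pro",
};

const instanceCooldown = new Set<string>();

function isBlocked(model: string): boolean {
  const instance = modelToInstance[model];
  return instance !== undefined && instanceCooldown.has(instance);
}

// A 429 on one instance leaves the others untouched.
instanceCooldown.add("google-gemini-31");
```

The trade-off is configuration duplication (one block per model, each repeating the API key), so treat this as a stopgap until Option 1 or 3 is available.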
Verification
Test 1: Confirm Model-Level Isolation After Fix
# Step 1: Trigger rate limit on model A
curl -X POST https://api.openclaw.io/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gemini-3.1-pro-preview-customtools", "messages": [{"role": "user", "content": "test"}]}'
# Expected: 429 from Google API
# Note: curl exits 0 on HTTP 429 unless -f/--fail is used; check the
# status code with -w "%{http_code}" or inspect the response body instead
# Step 2: Immediately test model B access
curl -X POST https://api.openclaw.io/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gemini-3.0-pro-preview", "messages": [{"role": "user", "content": "test"}]}'
# Expected: 200 OK or valid API response (not gateway backoff error)
Test 2: Verify Model-Specific Backoff State
Check the gateway’s internal state (if exposed via admin endpoint):
GET /admin/rate-limit-status
# Expected response structure:
{
"providers": {
"google": {
"cooldown": false,
"models": {
"gemini-3.1-pro-preview-customtools": {
"cooldown": true,
"retry_after": 120,
"expires_at": "2024-01-15T10:30:00Z"
},
"gemini-3.0-pro-preview": {
"cooldown": false
}
}
}
}
}
Test 3: Concurrent Model Availability Test
# Run concurrent requests to different models
for model in "gemini-3.1-pro-preview-customtools" "gemini-3.0-pro-preview" "gemini-pro"; do
echo "Testing: $model"
curl -s -o /dev/null -w "%{http_code}\n" \
-X POST https://api.openclaw.io/v1/chat/completions \
-H "Content-Type: application/json" \
-d "{\"model\": \"$model\", \"messages\": [{\"role\": \"user\", \"content\": \"test\"}]}"
done
# Expected:
# gemini-3.1-pro-preview-customtools: 429 (rate limited)
# gemini-3.0-pro-preview: 200 (independent quota)
# gemini-pro: 200 (independent quota)
Test 4: Backoff Expiration Verification
# Wait for cooldown to expire
echo "Waiting for model cooldown expiration..."
sleep 130 # retry_after + buffer
# Verify previously rate-limited model is accessible
curl -X POST https://api.openclaw.io/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gemini-3.1-pro-preview-customtools", "messages": [{"role": "user", "content": "test"}]}'
# Expected: 200 OK
Success Criteria
- After a rate limit on `gemini-3.1-pro-preview-customtools`, other `google` models remain accessible.
- Model-specific backoff state is correctly tracked and expires independently.
- The gateway does not preemptively reject requests to non-rate-limited models.
- Fallback chains function correctly when the primary model is unavailable.
Common Pitfalls
Environment-Specific Traps
Docker Container Caching
# Pitfall: Container filesystem may cache rate limit state
# Restarting containers may not reset state if persistence is enabled
docker-compose down
docker volume rm openclaw-cache  # Clear cached state (prune does not take a volume name)
docker-compose up -d
Kubernetes Volume Mounts
If using persistent volumes for rate limit tracking:
# Verify PVC is not stale after config changes
kubectl get pvc | grep openclaw
kubectl describe pvc openclaw-cache
# May need to delete and recreate if schema changed
kubectl delete pvc openclaw-cache
# Then restart deployments
macOS Development Environment
# Pitfall: Local rate limit state may persist across terminal sessions
# Clear any local state files
rm -rf ~/.openclaw/cache/*
rm -rf .openclaw/state.json
Configuration Mistakes
Incorrect Provider Name in Fallback Chain
# WRONG: Typos in provider name cause silent failures
models:
- name: "gemini-3.0-pro-preview"
provider: "googel" # Typo - will not match actual provider
# CORRECT:
models:
- name: "gemini-3.0-pro-preview"
provider: "google"
Overlapping Model Declarations
# WRONG: Same model declared multiple times
models:
- name: "gemini-3.0-pro-preview"
provider: "google"
- name: "gemini-3.0-pro-preview" # Duplicate
provider: "google"
fallback_models: [...]
API Key Scope Mismatch
# Pitfall: Google API keys may have different quotas per project
# If using separate provider instances, ensure they use keys with adequate quotas
# Verify in Google Cloud Console:
# APIs & Services > Enabled APIs > Vertex AI API > Quotas
Testing Edge Cases
Rate Limit on Last Available Model
# Scenario: All models under a provider are rate-limited
# Expected: Should return clear error, not silent success
# Verify error response includes all affected models
curl -X POST https://api.openclaw.io/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gemini-3.0-pro-preview", "messages": [{"role": "user", "content": "test"}]}'
# Check response contains actionable information
# Should NOT be empty 200 OK
Rapid Model Switching
# Pitfall: Race condition during rapid switching may bypass backoff
# Test with concurrent requests
ab -n 100 -c 10 -T 'application/json' \
-p request.json \
https://api.openclaw.io/v1/chat/completions
# Verify all requests are properly rate-limited or processed
Related Errors
| Error Code | Description | Connection |
|---|---|---|
| 429 RESOURCE_EXHAUSTED | Google API returned a rate limit error | Source error triggering provider backoff |
| 503 Service Unavailable | Provider temporarily unavailable | Downstream of prolonged provider backoff |
| 500 Internal Server Error | Gateway error during backoff handling | Unhandled exception in rate limit middleware |
| ENOTFOUND | DNS resolution failure for Google API | Unrelated, but may be misdiagnosed as a rate limit |
| ETIMEDOUT | Connection timeout to Google API | Unrelated, but may trigger incorrect backoff logic |
| INVALID_ARGUMENT | Malformed request to Gemini API | May be misrouted as a rate limit in error handling |
Historical Context
This issue relates to broader patterns in multi-tenant API gateway design:
- Overly Broad Circuit Breakers: Applying circuit breaker patterns at provider level when they should operate at model/deployment level.
- Shared State Collision: Multiple independent resources sharing a single rate limit counter.
- Insufficient Error Context: 429 responses from Google include `retryInfo` that specifies which quota was exhausted, but this may not be parsed.
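Google's JSON error payloads follow the google.rpc error model: when present, a `RetryInfo` entry in `error.details` carries a server-suggested `retryDelay` such as `"30s"`. A parser that prefers this value over a fixed cooldown could look like the sketch below (the payload shape follows the google.rpc conventions; field presence varies by API, and `retryAfterSeconds` is an assumed helper name):

```typescript
// Extract the server-suggested retry delay from a Google 429 payload.
// RetryInfo may be absent, in which case callers fall back to a default.
interface GoogleErrorDetail {
  "@type": string;
  retryDelay?: string; // duration string, e.g. "30s"
}

interface GoogleErrorBody {
  error: { code: number; status: string; details?: GoogleErrorDetail[] };
}

function retryAfterSeconds(body: GoogleErrorBody, fallback: number): number {
  const info = body.error.details?.find(
    (d) => d["@type"] === "type.googleapis.com/google.rpc.RetryInfo"
  );
  const match = info?.retryDelay?.match(/^(\d+(?:\.\d+)?)s$/);
  return match ? parseFloat(match[1]) : fallback;
}
```

Feeding this value into the per-model cooldown would make backoff durations track the actual exhausted quota instead of a static `retry_after` setting.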
Related GitHub Issues
- Rate limiting should be scoped per-model not per-provider - Feature request for model-level isolation
- Google Gemini provider backoff blocks all models - Duplicate tracking issue
- Add retry-after parsing from Google 429 responses - Enhancement for accurate cooldown calculation