Google Gemini Provider: 429 Rate Limit Scopes to Entire Provider Instead of Specific Model
When a single Google Gemini model hits a rate limit (429), the OpenClaw gateway applies backoff to the entire 'google' provider, blocking access to other, unrelated models with independent quotas.
Symptoms
Primary Manifestation
When a specific Google Gemini model exhausts its quota, all subsequent requests to any model under the google provider fail with rate limit errors, even when those models have independent quota allocations.
Error Output Examples
Direct API Response (429 from Google):
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
{
"error": {
"code": 429,
"message": "Resource has been exhausted (e.g. check quota).",
"status": "RESOURCE_EXHAUSTED"
}
}
OpenClaw Gateway Response After Backoff Engages:
{
"error": {
"type": "rate_limit_exceeded",
"provider": "google",
"message": "Provider 'google' is currently in cooldown due to rate limiting. Retry-After: 120s",
"retry_after": 120
}
}
Behavioral Symptoms
- No Model Isolation: Switching from `gemini-3.1-pro-preview-customtools` to `gemini-3.0-pro-preview` does not restore functionality.
- Extended Unavailability: All `google` provider requests fail until the provider-level cooldown expires.
- No Fallback Path: Alternative models under the same provider cannot serve as fallbacks during rate limit events.
- Gateway-Level Rejection: Requests may be rejected at the OpenClaw gateway layer before reaching Google's API.
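Until the gateway itself is fixed, a client can at least avoid wasting calls by treating the provider-scoped cooldown error as poisoning every model on that provider and failing over elsewhere. A minimal sketch, assuming the gateway error shape shown above; `completeWithFailover` and the provider list are illustrative, not OpenClaw APIs:

```typescript
// Client-side failover sketch. `call` is a stand-in for whatever HTTP
// client you use; the error shape matches the gateway response above.
interface GatewayError {
  error: { type: string; provider?: string; retry_after?: number };
}

type CallFn = (model: string) => Promise<string>;

// Try each model in order; once the gateway reports a provider-level
// cooldown, skip sibling models on that provider, since they will all fail.
async function completeWithFailover(
  models: { model: string; provider: string }[],
  call: CallFn
): Promise<string> {
  const blockedProviders = new Set<string>();
  for (const { model, provider } of models) {
    if (blockedProviders.has(provider)) continue; // provider in cooldown
    try {
      return await call(model);
    } catch (e) {
      const err = e as GatewayError;
      if (err.error?.type === "rate_limit_exceeded" && err.error.provider) {
        blockedProviders.add(err.error.provider); // poison the whole provider
      }
    }
  }
  throw new Error("all models unavailable");
}
```

This only papers over the problem (healthy Gemini models are still skipped), but it avoids burning requests into a known provider-wide cooldown.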
Reproduction Scenario
# Step 1: Request to rate-limited model
curl -X POST https://api.openclaw.io/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gemini-3.1-pro-preview-customtools", "messages": [{"role": "user", "content": "test"}]}'
# Response: 429 from Google API
# Step 2: Immediate fallback to another model
curl -X POST https://api.openclaw.io/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gemini-3.0-pro-preview", "messages": [{"role": "user", "content": "test"}]}'
# Expected: Request proceeds to Google API
# Actual: 429 or backoff error from OpenClaw gateway
Root Cause
Architectural Analysis
The root cause lies in the provider-level rate limit tracking implementation within the OpenClaw gateway’s retry/backoff mechanism.
Failure Sequence
- Request to `gemini-3.1-pro-preview-customtools`: The model-specific deployment receives a `429 RESOURCE_EXHAUSTED` from Google's API.
- Gateway Intercepts 429: OpenClaw's error handling middleware catches the 429 response.
- Provider-Level Backoff Activation: Instead of recording the rate limit against the specific model/deployment, the gateway sets a cooldown timer on the `google` provider identifier.
- Subsequent Request to `gemini-3.0-pro-preview`: The gateway checks whether the `google` provider is in cooldown. Finding that it is, it rejects the request preemptively with a backoff error.
- Model with Independent Quota Is Blocked: `gemini-3.0-pro-preview` may have a completely separate quota allocation, but cannot be accessed.
Code-Level Root Cause
The rate limit tracking likely uses a data structure similar to:
// Simplified representation of current behavior
const providerBackoff = {
"google": {
cooldownUntil: 1699999999999, // Unix timestamp
reason: "rate_limit",
retryAfter: 120
}
};
// Backoff check
function shouldReject(provider) {
return providerBackoff[provider]?.cooldownUntil > Date.now();
}
The problem: The backoff is keyed by provider name (“google”) rather than by model or deployment identifier.
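The consequence of keying by provider name can be demonstrated in isolation. The sketch below mirrors the simplified structure above (`recordRateLimit` is a hypothetical helper, not confirmed OpenClaw code):

```typescript
// Demonstrates why a provider-keyed cooldown blocks unrelated models.
const providerBackoff: Record<
  string,
  { cooldownUntil: number; reason: string }
> = {};

// Record a 429 against the provider, as the current gateway does.
function recordRateLimit(provider: string, retryAfterSec: number): void {
  providerBackoff[provider] = {
    cooldownUntil: Date.now() + retryAfterSec * 1000,
    reason: "rate_limit",
  };
}

function shouldReject(provider: string): boolean {
  return (providerBackoff[provider]?.cooldownUntil ?? 0) > Date.now();
}

// A 429 on ONE Gemini model cools down the whole provider...
recordRateLimit("google", 120);
console.log(shouldReject("google")); // true: every Gemini model is now blocked
console.log(shouldReject("openai")); // false: only other providers are unaffected
```

There is no `model` parameter anywhere in this path, so the gateway cannot distinguish an exhausted model from a healthy one.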
Google Gemini API Quota Architecture
Google Gemini API operates with:
- Model-specific quotas: Each model (e.g., `gemini-3.1-pro-preview-customtools`) has independent rate limits.
- Project-level quotas: Broader limits that affect all models, but these are typically much higher.
- Regional endpoints: May have independent limits.
Divergent Code Paths
| Scenario | Current Behavior | Expected Behavior |
|---|---|---|
| Model A hits 429 | All google provider blocked | Only Model A blocked |
| Model A quota exhausted | Model B unusable | Model B continues if quota available |
| Provider backoff active | Gateway rejects at layer 7 | Request proceeds to API |
Step-by-Step Fix
Option 1: Enable Model-Scoped Rate Limiting (Recommended)
If OpenClaw supports per-model rate limit tracking, configure the gateway to use model-level backoff:
Before (openclaw.yaml):
providers:
google:
api_key: "${GOOGLE_API_KEY}"
rate_limit:
strategy: "provider" # Current: blocks entire provider
retry_after: 120
After:
providers:
google:
api_key: "${GOOGLE_API_KEY}"
rate_limit:
strategy: "model" # Changed: per-model tracking
retry_after: 120
scope: "deployment" # Granularity: model/deployment level
Option 2: Configure Model-Specific Fallbacks
Define explicit fallback chains to bypass rate-limited models:
Before:
models:
- name: "gemini-3.1-pro-preview-customtools"
provider: "google"
After:
models:
- name: "gemini-3.1-pro-preview-customtools"
provider: "google"
fallback_models:
- "gemini-3.0-pro-preview"
- "gemini-pro"
- name: "gemini-3.0-pro-preview"
provider: "google"
fallback_models:
- "gemini-pro"
Option 3: Increase Provider Cooldown Granularity (Code Fix)
If you have access to the OpenClaw source code, modify the rate limit tracking:
Step 1: Identify the rate limit handler
Locate the file handling 429 responses. Typically found at:
src/gateway/middleware/rate-limit-handler.ts
src/providers/google/error-handler.ts
Step 2: Modify backoff key from provider to model
// BEFORE (provider-level)
providerBackoff[provider] = {
cooldownUntil: Date.now() + retryAfter * 1000,
reason: "rate_limit"
};
// AFTER (model-level)
const modelKey = `${provider}:${model}`;
modelBackoff[modelKey] = {
cooldownUntil: Date.now() + retryAfter * 1000,
reason: "rate_limit",
model: model,
provider: provider
};
Step 3: Update the rejection check
// BEFORE
function shouldReject(request) {
const provider = request.provider;
return providerBackoff[provider]?.cooldownUntil > Date.now();
}
// AFTER
function shouldReject(request) {
const modelKey = `${request.provider}:${request.model}`;
const providerKey = request.provider;
// Check model-specific backoff first
if (modelBackoff[modelKey]?.cooldownUntil > Date.now()) {
return { rejected: true, reason: "model_rate_limited" };
}
// Fallback to provider-level for shared limits only
if (providerBackoff[providerKey]?.cooldownUntil > Date.now()) {
return { rejected: true, reason: "provider_rate_limited" };
}
return { rejected: false };
}
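With the model-scoped key in place, the isolation property can be checked directly. The following is a self-contained restatement of the Step 2/3 snippets (`recordModelRateLimit` is an assumed helper name):

```typescript
// Self-contained model-scoped backoff, combining Step 2 and Step 3.
interface Backoff {
  cooldownUntil: number;
  reason: string;
}

const modelBackoff: Record<string, Backoff> = {};
const providerBackoff: Record<string, Backoff> = {};

// Record the 429 against `provider:model`, not the provider alone.
function recordModelRateLimit(
  provider: string,
  model: string,
  retryAfterSec: number
): void {
  modelBackoff[`${provider}:${model}`] = {
    cooldownUntil: Date.now() + retryAfterSec * 1000,
    reason: "rate_limit",
  };
}

function shouldReject(req: {
  provider: string;
  model: string;
}): { rejected: boolean; reason?: string } {
  const modelKey = `${req.provider}:${req.model}`;
  if ((modelBackoff[modelKey]?.cooldownUntil ?? 0) > Date.now()) {
    return { rejected: true, reason: "model_rate_limited" };
  }
  if ((providerBackoff[req.provider]?.cooldownUntil ?? 0) > Date.now()) {
    return { rejected: true, reason: "provider_rate_limited" };
  }
  return { rejected: false };
}

// Only the rate-limited model is blocked; its sibling stays available.
recordModelRateLimit("google", "gemini-3.1-pro-preview-customtools", 120);
```

Keeping `providerBackoff` as a second tier is deliberate: genuinely provider-wide events (e.g. project-level quota exhaustion) can still trip it, while per-model 429s no longer do.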
Option 4: Workaround via Multiple Provider Instances
Create separate provider configurations for models with independent quotas:
providers:
google-gemini-31:
api_key: "${GOOGLE_API_KEY}"
models:
- "gemini-3.1-pro-preview-customtools"
rate_limit:
retry_after: 60
google-gemini-30:
api_key: "${GOOGLE_API_KEY}"
models:
- "gemini-3.0-pro-preview"
rate_limit:
retry_after: 60
google-gemini-pro:
api_key: "${GOOGLE_API_KEY}"
models:
- "gemini-pro"
rate_limit:
retry_after: 60
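Under this layout, cooldown state is keyed by provider-instance name, so the existing provider-level backoff effectively becomes model-scoped without any code change. Conceptually (the routing table below is illustrative, not OpenClaw's actual data structure):

```typescript
// With one provider instance per model, instance names never collide,
// so the provider-keyed cooldown naturally isolates models.
const modelToInstance: Record<string, string> = {
  "gemini-3.1-pro-preview-customtools": "google-gemini-31",
  "gemini-3.0-pro-preview": "google-gemini-30",
  "gemini-pro": "google-gemini-pro",
};

const instanceCooldown = new Set<string>();

function isBlocked(model: string): boolean {
  const instance = modelToInstance[model];
  return instance !== undefined && instanceCooldown.has(instance);
}

// A 429 on one instance leaves the others untouched.
instanceCooldown.add("google-gemini-31");
```

The trade-off is configuration duplication (one block per model, each repeating the API key), so treat this as a stopgap until Option 1 or 3 is available.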
Verification
Test 1: Confirm Model-Level Isolation After Fix
# Step 1: Trigger rate limit on model A
curl -X POST https://api.openclaw.io/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gemini-3.1-pro-preview-customtools", "messages": [{"role": "user", "content": "test"}]}'
# Expected: 429 from Google API
# Note: curl exits 0 on HTTP 429 unless -f/--fail is used; check the
# status code with -w "%{http_code}" or inspect the response body instead
# Step 2: Immediately test model B access
curl -X POST https://api.openclaw.io/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gemini-3.0-pro-preview", "messages": [{"role": "user", "content": "test"}]}'
# Expected: 200 OK or valid API response (not gateway backoff error)
Test 2: Verify Model-Specific Backoff State
Check the gateway’s internal state (if exposed via admin endpoint):
GET /admin/rate-limit-status
# Expected response structure:
{
"providers": {
"google": {
"cooldown": false,
"models": {
"gemini-3.1-pro-preview-customtools": {
"cooldown": true,
"retry_after": 120,
"expires_at": "2024-01-15T10:30:00Z"
},
"gemini-3.0-pro-preview": {
"cooldown": false
}
}
}
}
}
Test 3: Concurrent Model Availability Test
# Run concurrent requests to different models
for model in "gemini-3.1-pro-preview-customtools" "gemini-3.0-pro-preview" "gemini-pro"; do
echo "Testing: $model"
curl -s -o /dev/null -w "%{http_code}\n" \
-X POST https://api.openclaw.io/v1/chat/completions \
-H "Content-Type: application/json" \
-d "{\"model\": \"$model\", \"messages\": [{\"role\": \"user\", \"content\": \"test\"}]}"
done
# Expected:
# gemini-3.1-pro-preview-customtools: 429 (rate limited)
# gemini-3.0-pro-preview: 200 (independent quota)
# gemini-pro: 200 (independent quota)
Test 4: Backoff Expiration Verification
# Wait for cooldown to expire
echo "Waiting for model cooldown expiration..."
sleep 130 # retry_after + buffer
# Verify previously rate-limited model is accessible
curl -X POST https://api.openclaw.io/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gemini-3.1-pro-preview-customtools", "messages": [{"role": "user", "content": "test"}]}'
# Expected: 200 OK
Success Criteria
- After a rate limit on `gemini-3.1-pro-preview-customtools`, other `google` models remain accessible.
- Model-specific backoff state is correctly tracked and expires independently.
- The gateway does not preemptively reject requests to non-rate-limited models.
- Fallback chains function correctly when the primary model is unavailable.
Common Pitfalls
Environment-Specific Traps
Docker Container Caching
# Pitfall: Container filesystem may cache rate limit state
# Restarting containers may not reset state if persistence is enabled
docker-compose down
docker volume rm openclaw-cache  # Clear cached state (prune does not take a volume name)
docker-compose up -d
Kubernetes Volume Mounts
If using persistent volumes for rate limit tracking:
# Verify PVC is not stale after config changes
kubectl get pvc | grep openclaw
kubectl describe pvc openclaw-cache
# May need to delete and recreate if schema changed
kubectl delete pvc openclaw-cache
# Then restart deployments
macOS Development Environment
# Pitfall: Local rate limit state may persist across terminal sessions
# Clear any local state files
rm -rf ~/.openclaw/cache/*
rm -rf .openclaw/state.json
Configuration Mistakes
Incorrect Provider Name in Fallback Chain
# WRONG: Typos in provider name cause silent failures
models:
- name: "gemini-3.0-pro-preview"
provider: "googel" # Typo - will not match actual provider
# CORRECT:
models:
- name: "gemini-3.0-pro-preview"
provider: "google"
Overlapping Model Declarations
# WRONG: Same model declared multiple times
models:
- name: "gemini-3.0-pro-preview"
provider: "google"
- name: "gemini-3.0-pro-preview" # Duplicate
provider: "google"
fallback_models: [...]
API Key Scope Mismatch
# Pitfall: Google API keys may have different quotas per project
# If using separate provider instances, ensure they use keys with adequate quotas
# Verify in Google Cloud Console:
# APIs & Services > Enabled APIs > Vertex AI API > Quotas
Testing Edge Cases
Rate Limit on Last Available Model
# Scenario: All models under a provider are rate-limited
# Expected: Should return clear error, not silent success
# Verify error response includes all affected models
curl -X POST https://api.openclaw.io/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gemini-3.0-pro-preview", "messages": [{"role": "user", "content": "test"}]}'
# Check response contains actionable information
# Should NOT be empty 200 OK
Rapid Model Switching
# Pitfall: Race condition during rapid switching may bypass backoff
# Test with concurrent requests
ab -n 100 -c 10 -T 'application/json' \
-p request.json \
https://api.openclaw.io/v1/chat/completions
# Verify all requests are properly rate-limited or processed
Related Errors
| Error Code | Description | Connection |
|---|---|---|
| 429 RESOURCE_EXHAUSTED | Google API returned a rate limit error | Source error triggering provider backoff |
| 503 Service Unavailable | Provider temporarily unavailable | Downstream of prolonged provider backoff |
| 500 Internal Server Error | Gateway error during backoff handling | Unhandled exception in rate limit middleware |
| ENOTFOUND | DNS resolution failure for Google API | Unrelated, but may be misdiagnosed as a rate limit |
| ETIMEDOUT | Connection timeout to Google API | Unrelated, but may trigger incorrect backoff logic |
| INVALID_ARGUMENT | Malformed request to Gemini API | May be misrouted as a rate limit in error handling |
Historical Context
This issue relates to broader patterns in multi-tenant API gateway design:
- Overly Broad Circuit Breakers: Applying circuit breaker patterns at provider level when they should operate at model/deployment level.
- Shared State Collision: Multiple independent resources sharing a single rate limit counter.
- Insufficient Error Context: 429 responses from Google include `retryInfo` that specifies which quota was exhausted, but this may not be parsed.
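Google's JSON error payloads follow the google.rpc error model: when present, a `RetryInfo` entry in `error.details` carries a server-suggested `retryDelay` such as `"30s"`. A parser that prefers this value over a fixed cooldown could look like the sketch below (the payload shape follows the google.rpc conventions; field presence varies by API, and `retryAfterSeconds` is an assumed helper name):

```typescript
// Extract the server-suggested retry delay from a Google 429 payload.
// RetryInfo may be absent, in which case callers fall back to a default.
interface GoogleErrorDetail {
  "@type": string;
  retryDelay?: string; // duration string, e.g. "30s"
}

interface GoogleErrorBody {
  error: { code: number; status: string; details?: GoogleErrorDetail[] };
}

function retryAfterSeconds(body: GoogleErrorBody, fallback: number): number {
  const info = body.error.details?.find(
    (d) => d["@type"] === "type.googleapis.com/google.rpc.RetryInfo"
  );
  const match = info?.retryDelay?.match(/^(\d+(?:\.\d+)?)s$/);
  return match ? parseFloat(match[1]) : fallback;
}
```

Feeding this value into the per-model cooldown would make backoff durations track the actual exhausted quota instead of a static `retry_after` setting.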
Related GitHub Issues
- Rate limiting should be scoped per-model not per-provider - Feature request for model-level isolation
- Google Gemini provider backoff blocks all models - Duplicate tracking issue
- Add retry-after parsing from Google 429 responses - Enhancement for accurate cooldown calculation