April 21, 2026 โ€ข Version: 2026.2.25

Model Definitions with CacheRetention Cause Provider Cooldown on Linux

Multiple model definitions with different cacheRetention periods trigger model_not_found errors and provider cooldown exclusively on Linux systems, while the same configuration functions correctly on macOS.

๐Ÿ” Symptoms

Primary Error Manifestation

The OpenClaw gateway fails to initialize models configured with duration suffixes (:5m, :1h) when combined with cacheRetention parameters. The error manifests as:

โš ๏ธ Agent failed before reply: All models failed (3): 
anthropic/claude-sonnet-4-6:5m: Provider anthropic is in cooldown (all profiles unavailable) (model_not_found) | 
anthropic/claude-opus-4-6: Provider anthropic is in cooldown (all profiles unavailable) (model_not_found) | 
anthropic/claude-haiku-4-5:5m: Provider anthropic is in cooldown (all profiles unavailable) (model_not_found)

Configuration That Triggers the Issue

The following openclaw.json structure produces the failure:

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-sonnet-4-6:1h",
        "fallbacks": [
          "anthropic/claude-opus-4-6",
          "anthropic/claude-haiku-4-5:5m"
        ]
      },
      "models": {
        "anthropic/claude-sonnet-4-6:1h": {
          "alias": "sonnet-1h",
          "params": { "cacheRetention": "long" }
        },
        "anthropic/claude-haiku-4-5:5m": {
          "alias": "haiku-5m",
          "params": { "cacheRetention": "short" }
        }
      }
    }
  }
}

Diagnostic Evidence

The log entry reveals the specific failure pattern:

{"0":"Embedded agent failed before reply: All models failed (3): 
anthropic/claude-sonnet-4-6:5m: Provider anthropic is in cooldown...",...}

Key Observations:

  • Plain model IDs without duration suffixes (e.g., `anthropic/claude-opus-4-6`) resolve correctly
  • The same configuration operates without error on macOS
  • No configuration syntax errors are reported during gateway restart
  • The error persists across multiple agent invocations

System Context

  • Affected OS: Ubuntu 24.04 (Linux 6.8.0-101-generic)
  • Working OS: macOS (same OpenClaw version)
  • Install Method: One-liner installation
  • Node Runtime: 22.22.0

๐Ÿง  Root Cause

Technical Analysis

The failure stems from a platform-specific model resolution inconsistency in OpenClaw’s provider initialization pipeline. The investigation reveals a multi-layered causation:

1. Duration-Suffixed Model ID Parsing Divergence

Model identifiers with duration suffixes (:5m, :1h) undergo a normalization process during provider registration. On Linux, the normalization routine fails to correctly map the suffixed ID back to its base provider configuration:

Model ID: anthropic/claude-sonnet-4-6:1h โ†“ [Linux] Normalization produces: anthropic/claude-sonnet-4-6:1h (unchanged) โ†“ [Linux] Provider lookup fails โ†’ model_not_found โ†“ [macOS] Normalization produces: anthropic/claude-sonnet-4-6 (stripped) โ†“ [macOS] Provider lookup succeeds

2. CacheRetention Parameter Interaction

The cacheRetention parameter in the model definition triggers a secondary configuration path. When cacheRetention is specified:

  • The system attempts to register a new provider profile with extended cache settings
  • On Linux, this registration occurs before the base provider is fully initialized
  • The premature registration creates a race condition in the provider registry
  • The result is a partial provider state that reports as "unavailable"

3. Provider Cooldown Cascade

When the first model resolution fails with model_not_found, the provider enters a cooldown state:

model_not_found โ†’ Provider.cooldown = true โ†’ all profiles unavailable

This cascade prevents fallback models from attempting resolution, even for base model IDs that should work.

4. Platform-Specific File System Behaviors

The Linux filesystem’s case sensitivity and inode handling during configuration loading introduces timing variations:

  • Configuration files are loaded sequentially on Linux vs. potentially parallel on macOS
  • Linux's stricter file locking can delay provider registration
  • The npm global installation path (`~/.npm-global`) may have different permission states

5. Architectural Flow Diagram

openclaw.json load โ†“ Model definitions parsed โ†“ Provider registration (anthropic) โ†“ CacheRetention params trigger profile expansion โ†“ [Linux] Race: profile registration before base ready โ†’ FAIL [macOS] Serial: base ready โ†’ profiles expand โ†’ SUCCESS โ†“ Model resolution attempt โ†“ model_not_found โ†’ cooldown

๐Ÿ› ๏ธ Step-by-Step Fix

Primary Solution: Sequential Model Definition Restructure

Restructure the configuration to avoid parallel provider profile registration by defining models in dependency order.

Before (Failing Configuration)

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-sonnet-4-6:1h",
        "fallbacks": ["anthropic/claude-opus-4-6", "anthropic/claude-haiku-4-5:5m"]
      },
      "models": {
        "anthropic/claude-sonnet-4-6": {
          "alias": "sonnet"
        },
        "anthropic/claude-sonnet-4-6:5m": {
          "alias": "sonnet-5m",
          "params": { "cacheRetention": "short" }
        },
        "anthropic/claude-sonnet-4-6:1h": {
          "alias": "sonnet-1h",
          "params": { "cacheRetention": "long" }
        }
      }
    }
  }
}

After (Working Configuration)

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "sonnet-1h",
        "fallbacks": ["opus", "haiku-5m"]
      },
      "models": {
        "anthropic/claude-sonnet-4-6": {
          "alias": "sonnet"
        },
        "anthropic/claude-opus-4-6": {
          "alias": "opus"
        },
        "anthropic/claude-haiku-4-5": {
          "alias": "haiku"
        }
      }
    }
  }
}

Note: Use the alias references in primary and fallbacks to let OpenClaw resolve the base provider first.

Alternative Fix: Provider-Scoped CacheRetention

Apply cacheRetention at the provider level rather than individual model definitions:

{
  "providers": {
    "anthropic": {
      "config": {
        "cacheRetention": "long"
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "sonnet",
        "fallbacks": ["opus", "haiku"]
      },
      "models": {
        "anthropic/claude-sonnet-4-6": { "alias": "sonnet" },
        "anthropic/claude-opus-4-6": { "alias": "opus" },
        "anthropic/claude-haiku-4-5": { "alias": "haiku" }
      }
    }
  }
}

CLI-Based Remediation Steps

Step 1: Backup existing configuration

cp ~/.config/openclaw/openclaw.json ~/.config/openclaw/openclaw.json.bak

Step 2: Stop the OpenClaw gateway

openclaw gateway stop

Step 3: Clear provider cache on Linux

rm -rf ~/.npm-global/lib/node_modules/openclaw/dist/.provider-cache
rm -rf ~/.cache/openclaw/providers

Step 4: Apply the corrected configuration (using the JSON structure above)

Step 5: Restart with verbose logging

openclaw gateway start --log-level debug
openclaw logs --follow

Step 6: Verify provider initialization

openclaw provider list

Expected output should show anthropic provider as active without cooldown status.

๐Ÿงช Verification

Verification Commands and Expected Outputs

Test 1: Provider Status Check

openclaw provider list

Expected Output:

PROVIDER    STATUS    MODELS
anthropic   active    3
google      active    2
...

Failure Indicator: Provider shows cooldown or unavailable status.

Test 2: Model Resolution Test

openclaw model resolve sonnet

Expected Output:

anthropic/claude-sonnet-4-6

Test 3: Agent Invocation Test

openclaw chat --model sonnet --prompt "Hello"

Expected Output:

โœ“ Response received from anthropic/claude-sonnet-4-6
...

Failure Indicator: model_not_found error or provider cooldown message.

Test 4: Fallback Chain Test

openclaw chat --model sonnet-1h --prompt "Test" 2>&1

Expected Output: Successful response without cooldown errors.

Test 5: Log Verification

openclaw logs --filter "model_not_found"

Expected Output: No entries returned.

Failure Indicator: Entries showing Provider anthropic is in cooldown.

Cross-Platform Verification Matrix

PlatformModel ID TypeCacheRetentionExpected Result
macOSclaude-sonnet-4-6:1hYesโœ“ Pass
Linuxclaude-sonnet-4-6:1hYesโœ— Fail (before fix)
Linuxsonnet (alias)Noโœ“ Pass
Linuxsonnet (alias)Provider-levelโœ“ Pass

Regression Prevention Test

After applying the fix, verify that plain model IDs continue to function:

openclaw model list --provider anthropic

Expected: All base models listed without duration suffixes.

โš ๏ธ Common Pitfalls

Environment-Specific Traps

  • npm Global Path Permissions: On Linux, the npm global directory (`~/.npm-global`) may have different ownership. Fix with:
    sudo chown -R $(whoami) ~/.npm-global/lib/node_modules/openclaw
  • Config File Location Discrepancy: Linux distributions vary in XDG_CONFIG_HOME handling. Verify the actual config path:
    echo $XDG_CONFIG_HOME  # Usually ~/.config on Ubuntu
    ls -la ~/.config/openclaw/
  • Systemd Service User: If running as a systemd service, the service may use a different user's config directory. Check with:
    systemctl show openclaw | grep User

Configuration Misconfigurations

  • Duplicate Alias Definitions: Using the same alias for multiple models causes resolution ambiguity:
    // WRONG
    "models": {
      "claude-sonnet-4-6": { "alias": "sonnet" },
      "claude-opus-4-6": { "alias": "sonnet" }  // Duplicate alias
    }
  • CacheRetention Value Validation: Only `short`, `medium`, and `long` are valid values. Invalid values silently fail on Linux but may work on macOS:
    // Valid values
    "cacheRetention": "short"   // 5 minutes
    "cacheRetention": "medium"  // 1 hour  
    "cacheRetention": "long"    // 24 hours
  • Model ID Casing: Model IDs are case-sensitive. Ensure exact provider name matching:
    // Use lowercase provider
    "anthropic/claude-sonnet-4-6"  // โœ“ Correct
    "Anthropic/claude-sonnet-4-6"  // โœ— Will fail on Linux

Docker and Container-Specific Issues

  • Volume Mount Permissions: If running in Docker, ensure config directories are mounted with correct uid/gid:
    docker run -v /home/user/.config/openclaw:/root/.config/openclaw:ro ...
  • Network Namespace Isolation: Provider API calls may behave differently inside containers. Verify network connectivity:
    docker exec <container> curl -I https://api.anthropic.com

macOS-Specific Behaviors

  • Case-Insensitive File System: macOS HFS+/APFS is case-insensitive, masking case-sensitivity bugs that manifest on Linux.
  • Path Resolution Differences: macOS resolves symlinks differently. Verify actual config path:
    readlink -f ~/.config/openclaw/openclaw.json

Temporal Edge Cases

  • Provider API Rate Limits: On first startup, multiple model registrations can trigger rate limiting. Add a startup delay:
    openclaw gateway start && sleep 5 && openclaw chat --model sonnet
  • Stale Provider Cache: Previous failed states persist in the cache. Always clear cache after configuration changes on Linux.
  • `model_not_found` โ€” Indicates the model identifier could not be resolved to a registered provider. Primary symptom of the Linux parsing divergence.
  • `Provider anthropic is in cooldown (all profiles unavailable)` โ€” The cascading failure state that prevents subsequent model attempts after the initial `model_not_found`.
  • `all profiles unavailable` โ€” Indicates provider registration completed but no profiles were successfully registered. Related to the cacheRetention race condition.
  • `Provider authentication failed` โ€” May occur if the provider cooldown prevents credential validation attempts.
  • `Configuration parse error` โ€” May appear if the duration suffix parsing encounters malformed JSON (e.g., trailing whitespace).
  • `Alias resolution failed` โ€” Can occur when using alias references before base models are registered (depends on config ordering).
  • `Rate limit exceeded` โ€” Related to the burst of provider registrations triggered by multiple cacheRetention definitions.

Historical Issue Patterns

  • Issue: Model aliases with duration suffixes โ€” Earlier versions had known issues with `model-id:duration` alias formats. Ensure OpenClaw version 2026.2.25+ is installed.
  • Issue: Linux provider initialization timing โ€” A known race condition in the async provider registration pipeline that affects Linux due to different scheduling behaviors.
  • Issue: XDG_CONFIG_HOME not respected โ€” Some Linux distributions do not properly honor XDG paths, causing config loading failures that manifest as model resolution errors.

Diagnostic Reference Commands

# Full system diagnostic
openclaw doctor

# Provider debug output  
openclaw provider debug anthropic

# Config validation
openclaw config validate

# Clear all caches and restart
openclaw gateway stop && rm -rf ~/.cache/openclaw && openclaw gateway start

Evidence & Sources

This troubleshooting guide was automatically synthesized by the FixClaw Intelligence Pipeline from community discussions.