Model Definitions with CacheRetention Cause Provider Cooldown on Linux
Multiple model definitions with different cacheRetention periods trigger model_not_found errors and provider cooldown exclusively on Linux systems, while the same configuration functions correctly on macOS.
Symptoms
Primary Error Manifestation
The OpenClaw gateway fails to initialize models configured with duration suffixes (:5m, :1h) when combined with cacheRetention parameters. The error manifests as:
⚠️ Agent failed before reply: All models failed (3):
anthropic/claude-sonnet-4-6:5m: Provider anthropic is in cooldown (all profiles unavailable) (model_not_found) |
anthropic/claude-opus-4-6: Provider anthropic is in cooldown (all profiles unavailable) (model_not_found) |
anthropic/claude-haiku-4-5:5m: Provider anthropic is in cooldown (all profiles unavailable) (model_not_found)
Configuration That Triggers the Issue
The following openclaw.json structure produces the failure:
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-sonnet-4-6:1h",
        "fallbacks": [
          "anthropic/claude-opus-4-6",
          "anthropic/claude-haiku-4-5:5m"
        ]
      },
      "models": {
        "anthropic/claude-sonnet-4-6:1h": {
          "alias": "sonnet-1h",
          "params": { "cacheRetention": "long" }
        },
        "anthropic/claude-haiku-4-5:5m": {
          "alias": "haiku-5m",
          "params": { "cacheRetention": "short" }
        }
      }
    }
  }
}
Diagnostic Evidence
The log entry reveals the specific failure pattern:
{"0":"Embedded agent failed before reply: All models failed (3):
anthropic/claude-sonnet-4-6:5m: Provider anthropic is in cooldown...",...}
Key Observations:
- Plain model IDs without duration suffixes (e.g., `anthropic/claude-opus-4-6`) resolve correctly
- The same configuration operates without error on macOS
- No configuration syntax errors are reported during gateway restart
- The error persists across multiple agent invocations
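To check whether a given config matches the failing pattern, the model definitions can be scanned for IDs that carry a duration suffix and also set cacheRetention. The following is a minimal sketch, not part of OpenClaw: `find_suspect_models` and the suffix pattern are assumptions made for illustration, based only on the `:5m` and `:1h` examples above.

```python
import re

# Assumption: a duration suffix is a colon, digits, and one unit letter (":5m", ":1h").
DURATION_SUFFIX = re.compile(r":\d+[smhd]$")


def find_suspect_models(config: dict) -> list[str]:
    """Return model IDs that have a duration suffix and also set cacheRetention."""
    models = config.get("agents", {}).get("defaults", {}).get("models", {})
    suspects = []
    for model_id, definition in models.items():
        has_suffix = DURATION_SUFFIX.search(model_id) is not None
        has_retention = "cacheRetention" in definition.get("params", {})
        if has_suffix and has_retention:
            suspects.append(model_id)
    return suspects


# Inline example mirroring the structure of the failing configuration.
example = {
    "agents": {
        "defaults": {
            "models": {
                "anthropic/claude-sonnet-4-6:1h": {
                    "alias": "sonnet-1h",
                    "params": {"cacheRetention": "long"},
                },
                "anthropic/claude-opus-4-6": {"alias": "opus"},
                "anthropic/claude-haiku-4-5:5m": {
                    "alias": "haiku-5m",
                    "params": {"cacheRetention": "short"},
                },
            }
        }
    }
}

print(find_suspect_models(example))
```

For a real file, the same function can be applied to the result of `json.load` on the config.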
System Context
- Affected OS: Ubuntu 24.04 (Linux 6.8.0-101-generic)
- Working OS: macOS (same OpenClaw version)
- Install Method: One-liner installation
- Node Runtime: 22.22.0
Root Cause
Technical Analysis
The failure stems from a platform-specific inconsistency in model resolution within OpenClaw's provider initialization pipeline. The investigation points to several interacting causes:
1. Duration-Suffixed Model ID Parsing Divergence
Model identifiers with duration suffixes (:5m, :1h) undergo a normalization process during provider registration. On Linux, the normalization routine fails to correctly map the suffixed ID back to its base provider configuration:
Model ID: anthropic/claude-sonnet-4-6:1h
  [Linux] Normalization produces: anthropic/claude-sonnet-4-6:1h (unchanged)
  [Linux] Provider lookup fails → model_not_found
  [macOS] Normalization produces: anthropic/claude-sonnet-4-6 (stripped)
  [macOS] Provider lookup succeeds
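The divergence described above can be illustrated with a small sketch. The function names and the suffix pattern are hypothetical and do not reflect OpenClaw's actual normalization code; the sketch only shows why a lookup against base IDs fails when the suffix is left in place.

```python
import re

# Assumption: a duration suffix is a colon, digits, and one unit letter (":5m", ":1h").
DURATION_SUFFIX = re.compile(r":\d+[smhd]$")


def strip_duration_suffix(model_id: str) -> str:
    """Return the base model ID with any trailing duration suffix removed."""
    return DURATION_SUFFIX.sub("", model_id)


def lookup(model_id: str, registered: set[str], normalize: bool) -> str:
    """Look up a model ID among registered base IDs, with or without normalization."""
    key = strip_duration_suffix(model_id) if normalize else model_id
    return "ok" if key in registered else "model_not_found"


# Hypothetical registry that only knows base model IDs.
registered = {"anthropic/claude-sonnet-4-6", "anthropic/claude-opus-4-6"}

# Suffix left unchanged (the behavior described for Linux).
print(lookup("anthropic/claude-sonnet-4-6:1h", registered, normalize=False))
# Suffix stripped (the behavior described for macOS).
print(lookup("anthropic/claude-sonnet-4-6:1h", registered, normalize=True))
```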
2. CacheRetention Parameter Interaction
The cacheRetention parameter in the model definition triggers a secondary configuration path. When cacheRetention is specified:
- The system attempts to register a new provider profile with extended cache settings
- On Linux, this registration occurs before the base provider is fully initialized
- The premature registration creates a race condition in the provider registry
- The result is a partial provider state that reports as "unavailable"
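The ordering dependency in the list above can be shown with a toy model. This is a deterministic sketch under the assumption that a profile is only usable when its base provider was ready at registration time; `ProviderRegistry` and its methods are invented for illustration and are not OpenClaw classes.

```python
class ProviderRegistry:
    """Toy registry: a profile is usable only if its base provider was ready first."""

    def __init__(self):
        self.ready_providers = set()
        self.profiles = {}  # (provider, profile) -> "active" or "unavailable"

    def register_base(self, provider):
        self.ready_providers.add(provider)

    def register_profile(self, provider, profile):
        # The state is decided once, at registration time, and never revisited.
        state = "active" if provider in self.ready_providers else "unavailable"
        self.profiles[(provider, profile)] = state
        return state


# Base provider first, then the cacheRetention profile.
serial = ProviderRegistry()
serial.register_base("anthropic")
serial_state = serial.register_profile("anthropic", "long")

# Profile registered before the base provider is ready.
racy = ProviderRegistry()
racy_state = racy.register_profile("anthropic", "long")
racy.register_base("anthropic")

print(serial_state, racy_state)
```

In the second case the base provider does become ready, but the profile keeps the state it was given when it was registered too early.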
3. Provider Cooldown Cascade
When the first model resolution fails with model_not_found, the provider enters a cooldown state:
model_not_found → Provider.cooldown = true → all profiles unavailable
This cascade prevents fallback models from attempting resolution, even for base model IDs that should work.
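A simplified model of this cascade is sketched below. It assumes cooldown is tracked per provider and is set on the first failed resolution; `run_with_fallbacks` and the `resolve` callback are hypothetical names, not OpenClaw internals.

```python
def run_with_fallbacks(models, resolve):
    """Try models in order; one failure puts the whole provider into cooldown."""
    cooldown = set()
    errors = []
    for model_id in models:
        provider = model_id.split("/", 1)[0]
        if provider in cooldown:
            # The model is never attempted because its provider is cooling down.
            errors.append((model_id, "cooldown"))
            continue
        if resolve(model_id):
            return model_id, errors
        cooldown.add(provider)
        errors.append((model_id, "model_not_found"))
    return None, errors


attempted = []


def resolve(model_id):
    # Assumption for the demo: IDs with a duration suffix fail to resolve.
    attempted.append(model_id)
    return ":" not in model_id


chain = [
    "anthropic/claude-sonnet-4-6:1h",
    "anthropic/claude-opus-4-6",
    "anthropic/claude-haiku-4-5:5m",
]
chosen, errors = run_with_fallbacks(chain, resolve)
print(chosen, errors, attempted)
```

Only the first model is ever attempted. The plain `anthropic/claude-opus-4-6` ID would have resolved, but it is skipped because it shares a provider with the model that failed.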
4. Platform-Specific File System Behaviors
Linux filesystem case sensitivity and inode handling during configuration loading introduce timing variations:
- Configuration files are loaded sequentially on Linux vs. potentially parallel on macOS
- Linux's stricter file locking can delay provider registration
- The npm global installation path (`~/.npm-global`) may have different permission states
5. Architectural Flow Diagram
openclaw.json load
  → Model definitions parsed
  → Provider registration (anthropic)
  → CacheRetention params trigger profile expansion
      [Linux] Race: profile registration before base ready → FAIL
      [macOS] Serial: base ready → profiles expand → SUCCESS
  → Model resolution attempt
  → model_not_found → cooldown
Step-by-Step Fix
Primary Solution: Sequential Model Definition Restructure
Restructure the configuration to avoid parallel provider profile registration by defining models in dependency order.
Before (Failing Configuration)
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-sonnet-4-6:1h",
        "fallbacks": ["anthropic/claude-opus-4-6", "anthropic/claude-haiku-4-5:5m"]
      },
      "models": {
        "anthropic/claude-sonnet-4-6": {
          "alias": "sonnet"
        },
        "anthropic/claude-sonnet-4-6:5m": {
          "alias": "sonnet-5m",
          "params": { "cacheRetention": "short" }
        },
        "anthropic/claude-sonnet-4-6:1h": {
          "alias": "sonnet-1h",
          "params": { "cacheRetention": "long" }
        }
      }
    }
  }
}
After (Working Configuration)
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "sonnet",
        "fallbacks": ["opus", "haiku"]
      },
      "models": {
        "anthropic/claude-sonnet-4-6": {
          "alias": "sonnet"
        },
        "anthropic/claude-opus-4-6": {
          "alias": "opus"
        },
        "anthropic/claude-haiku-4-5": {
          "alias": "haiku"
        }
      }
    }
  }
}
Note: Reference models by alias in primary and fallbacks so that OpenClaw resolves the base provider first. Every alias used there must be defined under models.
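When switching primary and fallbacks to aliases, it is easy to leave behind a name that no model definition declares. The sketch below checks for that. It assumes that any reference without a `/` is an alias and must match an `alias` value under models, while references containing `/` are full model IDs and are left alone; the function name is hypothetical.

```python
def undefined_aliases(config: dict) -> list[str]:
    """Return alias-style references in primary/fallbacks that no model defines."""
    defaults = config.get("agents", {}).get("defaults", {})
    models = defaults.get("models", {})
    aliases = {d["alias"] for d in models.values() if "alias" in d}

    model = defaults.get("model", {})
    references = [model.get("primary"), *model.get("fallbacks", [])]

    missing = []
    for ref in references:
        if ref is None or "/" in ref:
            continue  # Full model IDs are not checked here.
        if ref not in aliases:
            missing.append(ref)
    return missing


# Inline example: "haiku-5m" is referenced but only "haiku" is defined.
example = {
    "agents": {
        "defaults": {
            "model": {"primary": "sonnet", "fallbacks": ["opus", "haiku-5m"]},
            "models": {
                "anthropic/claude-sonnet-4-6": {"alias": "sonnet"},
                "anthropic/claude-opus-4-6": {"alias": "opus"},
                "anthropic/claude-haiku-4-5": {"alias": "haiku"},
            },
        }
    }
}

print(undefined_aliases(example))
```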
Alternative Fix: Provider-Scoped CacheRetention
Apply cacheRetention at the provider level rather than individual model definitions:
{
  "providers": {
    "anthropic": {
      "config": {
        "cacheRetention": "long"
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "sonnet",
        "fallbacks": ["opus", "haiku"]
      },
      "models": {
        "anthropic/claude-sonnet-4-6": { "alias": "sonnet" },
        "anthropic/claude-opus-4-6": { "alias": "opus" },
        "anthropic/claude-haiku-4-5": { "alias": "haiku" }
      }
    }
  }
}
CLI-Based Remediation Steps
Step 1: Back up the existing configuration
cp ~/.config/openclaw/openclaw.json ~/.config/openclaw/openclaw.json.bak
Step 2: Stop the OpenClaw gateway
openclaw gateway stop
Step 3: Clear the provider cache on Linux
rm -rf ~/.npm-global/lib/node_modules/openclaw/dist/.provider-cache
rm -rf ~/.cache/openclaw/providers
Step 4: Apply the corrected configuration (using the JSON structure above)
Step 5: Restart with verbose logging
openclaw gateway start --log-level debug
openclaw logs --follow
Step 6: Verify provider initialization
openclaw provider list
The output should show the anthropic provider as active, with no cooldown status.
Verification
Verification Commands and Expected Outputs
Test 1: Provider Status Check
openclaw provider list
Expected Output:
PROVIDER    STATUS    MODELS
anthropic   active    3
google      active    2
...
Failure Indicator: Provider shows cooldown or unavailable status.
Test 2: Model Resolution Test
openclaw model resolve sonnet
Expected Output:
anthropic/claude-sonnet-4-6
Test 3: Agent Invocation Test
openclaw chat --model sonnet --prompt "Hello"
Expected Output:
Response received from anthropic/claude-sonnet-4-6
...
Failure Indicator: model_not_found error or provider cooldown message.
Test 4: Fallback Chain Test
openclaw chat --model sonnet-1h --prompt "Test" 2>&1
Expected Output: Successful response without cooldown errors.
Test 5: Log Verification
openclaw logs --filter "model_not_found"
Expected Output: No entries returned.
Failure Indicator: Entries showing Provider anthropic is in cooldown.
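If the logs have been saved to a file, the same check can be done offline. The sketch below assumes the log text contains the error format shown in the Symptoms section (`<model>: Provider <name> is in cooldown`); the regular expression and function name are illustrative, not an OpenClaw log schema.

```python
import re

# Assumption: failures appear as "<provider>/<model>: Provider <provider> is in cooldown".
COOLDOWN = re.compile(
    r"(?P<model>[\w.-]+/[\w.:-]+): Provider (?P<provider>[\w-]+) is in cooldown"
)


def cooldown_models(log_text: str) -> list[str]:
    """Return the model IDs that were reported as blocked by provider cooldown."""
    return [match.group("model") for match in COOLDOWN.finditer(log_text)]


sample = (
    "Agent failed before reply: All models failed (3):\n"
    "anthropic/claude-sonnet-4-6:5m: Provider anthropic is in cooldown "
    "(all profiles unavailable) (model_not_found) |\n"
    "anthropic/claude-opus-4-6: Provider anthropic is in cooldown "
    "(all profiles unavailable) (model_not_found) |\n"
    "anthropic/claude-haiku-4-5:5m: Provider anthropic is in cooldown "
    "(all profiles unavailable) (model_not_found)"
)

print(cooldown_models(sample))
```

An empty list after the fix corresponds to the "No entries returned" outcome of Test 5.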
Cross-Platform Verification Matrix
| Platform | Model ID Type | CacheRetention | Expected Result |
|---|---|---|---|
| macOS | claude-sonnet-4-6:1h | Yes | Pass |
| Linux | claude-sonnet-4-6:1h | Yes | Fail (before fix) |
| Linux | sonnet (alias) | No | Pass |
| Linux | sonnet (alias) | Provider-level | Pass |
Regression Prevention Test
After applying the fix, verify that plain model IDs continue to function:
openclaw model list --provider anthropic
Expected: All base models listed without duration suffixes.
Common Pitfalls
Environment-Specific Traps
- npm Global Path Permissions: On Linux, the npm global directory (`~/.npm-global`) may have different ownership. Fix with:
sudo chown -R "$(whoami)" ~/.npm-global/lib/node_modules/openclaw
- Config File Location Discrepancy: Linux distributions vary in XDG_CONFIG_HOME handling. Verify the actual config path:
echo "$XDG_CONFIG_HOME" # Often unset, in which case the default is ~/.config
ls -la ~/.config/openclaw/
- Systemd Service User: If running as a systemd service, the service may run as a different user and read that user's config directory. Check with:
systemctl show openclaw | grep User
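The config path check above can also be scripted. The sketch below follows the XDG Base Directory convention (use XDG_CONFIG_HOME when set and non-empty, otherwise `$HOME/.config`) and assumes the `openclaw/openclaw.json` location used throughout this guide; whether OpenClaw itself resolves the path this way is not confirmed here.

```python
import os
from pathlib import Path


def config_path(env=None) -> Path:
    """Return the expected openclaw.json path for the given environment mapping."""
    env = os.environ if env is None else env
    xdg = env.get("XDG_CONFIG_HOME")
    if xdg:
        base = Path(xdg)
    else:
        # Unset or empty XDG_CONFIG_HOME falls back to $HOME/.config.
        home = env.get("HOME") or str(Path.home())
        base = Path(home) / ".config"
    return base / "openclaw" / "openclaw.json"


print(config_path({"HOME": "/home/user"}))
print(config_path({"HOME": "/home/user", "XDG_CONFIG_HOME": "/srv/config"}))
```

Running it as the service user shows which file that user would read, which helps when a systemd unit runs under a different account.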
Configuration Misconfigurations
- Duplicate Alias Definitions: Using the same alias for multiple models causes resolution ambiguity:
// WRONG
"models": {
  "claude-sonnet-4-6": { "alias": "sonnet" },
  "claude-opus-4-6": { "alias": "sonnet" } // Duplicate alias
}
- CacheRetention Value Validation: Only `none`, `short`, and `long` are valid values, matching the `:5m` and `:1h` pairings used in the configurations above. Invalid values silently fail on Linux but may work on macOS:
// Valid values
"cacheRetention": "none" // caching disabled
"cacheRetention": "short" // 5 minutes
"cacheRetention": "long" // 1 hour
- Model ID Casing: Model IDs are case-sensitive. Ensure exact provider name matching:
// Use lowercase provider
"anthropic/claude-sonnet-4-6" // Correct
"Anthropic/claude-sonnet-4-6" // Will fail on Linux
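The three checks in this list can be combined into a small lint pass over the models section. This is a sketch with a hypothetical function name; the set of accepted cacheRetention values is passed in by the caller rather than hard-coded, so it can be adjusted to whatever the installed OpenClaw version accepts.

```python
from collections import defaultdict


def lint_models(models: dict, valid_retention: set[str]) -> list[str]:
    """Report duplicate aliases, unknown cacheRetention values, and non-lowercase providers."""
    problems = []
    ids_by_alias = defaultdict(list)

    for model_id, definition in models.items():
        alias = definition.get("alias")
        if alias is not None:
            ids_by_alias[alias].append(model_id)

        retention = definition.get("params", {}).get("cacheRetention")
        if retention is not None and retention not in valid_retention:
            problems.append(f"{model_id}: invalid cacheRetention {retention!r}")

        provider = model_id.split("/", 1)[0]
        if provider != provider.lower():
            problems.append(f"{model_id}: provider name is not lowercase")

    for alias, ids in ids_by_alias.items():
        if len(ids) > 1:
            problems.append(f"alias {alias!r} is shared by {', '.join(ids)}")
    return problems


# Inline example containing one instance of each pitfall.
models = {
    "anthropic/claude-sonnet-4-6": {"alias": "sonnet"},
    "anthropic/claude-opus-4-6": {"alias": "sonnet"},
    "Anthropic/claude-haiku-4-5": {
        "alias": "haiku",
        "params": {"cacheRetention": "forever"},
    },
}

for problem in lint_models(models, valid_retention={"short", "long"}):
    print(problem)
```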
Docker and Container-Specific Issues
- Volume Mount Permissions: If running in Docker, ensure config directories are mounted with correct uid/gid:
docker run -v /home/user/.config/openclaw:/root/.config/openclaw:ro ...
- Network Namespace Isolation: Provider API calls may behave differently inside containers. Verify network connectivity:
docker exec <container> curl -I https://api.anthropic.com
macOS-Specific Behaviors
- Case-Insensitive File System: macOS volumes (HFS+ and APFS) are case-insensitive by default, although case-preserving, which masks case-sensitivity bugs that manifest on Linux.
- Path Resolution Differences: macOS resolves symlinks differently. Verify actual config path:
readlink -f ~/.config/openclaw/openclaw.json
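For a check that behaves the same on both platforms, the path can be resolved with Python's standard library instead of relying on the local `readlink` implementation. The helper name below is illustrative.

```python
import os


def real_config_path(path: str) -> str:
    """Expand ~ and follow all symlinks to the canonical file path."""
    return os.path.realpath(os.path.expanduser(path))


print(real_config_path("~/.config/openclaw/openclaw.json"))
```

`os.path.realpath` does not require the file to exist, so a result that differs from the expected location points to a symlink somewhere along the path.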
Temporal Edge Cases
- Provider API Rate Limits: On first startup, multiple model registrations can trigger rate limiting. Add a startup delay:
openclaw gateway start && sleep 5 && openclaw chat --model sonnet
- Stale Provider Cache: Previous failed states persist in the cache. Always clear the cache after configuration changes on Linux.
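A fixed `sleep 5` either waits too long or not long enough. One alternative is to retry the first request with exponential backoff. The helper below is a generic sketch, not an OpenClaw feature; the action it wraps and the delay values are assumptions.

```python
import time


def retry_with_backoff(action, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * 2 ** attempt)


# Demo with a stand-in action that fails twice, then succeeds.
# The recorded delays replace real sleeping so the demo runs instantly.
calls = {"count": 0}
recorded_delays = []


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("provider not ready")
    return "ok"


result = retry_with_backoff(flaky_request, sleep=recorded_delays.append)
print(result, recorded_delays)
```

In practice the action could wrap the first chat invocation, for example through `subprocess.run(..., check=True)`, so that a non-zero exit status triggers the next attempt.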
Related Errors
Directly Related Errors
- `model_not_found`: Indicates the model identifier could not be resolved to a registered provider. Primary symptom of the Linux parsing divergence.
- `Provider anthropic is in cooldown (all profiles unavailable)`: The cascading failure state that prevents subsequent model attempts after the initial `model_not_found`.
- `all profiles unavailable`: Indicates provider registration completed but no profiles were successfully registered. Related to the cacheRetention race condition.
Contextually Related Errors
- `Provider authentication failed`: May occur if the provider cooldown prevents credential validation attempts.
- `Configuration parse error`: May appear if the duration suffix parsing encounters malformed JSON (e.g., trailing whitespace).
- `Alias resolution failed`: Can occur when using alias references before base models are registered (depends on config ordering).
- `Rate limit exceeded`: Related to the burst of provider registrations triggered by multiple cacheRetention definitions.
Historical Issue Patterns
- Issue: Model aliases with duration suffixes. Earlier versions had known issues with `model-id:duration` alias formats. Ensure OpenClaw version 2026.2.25+ is installed.
- Issue: Linux provider initialization timing. A known race condition in the async provider registration pipeline that affects Linux due to different scheduling behaviors.
- Issue: XDG_CONFIG_HOME not respected. Some Linux distributions do not properly honor XDG paths, causing config loading failures that manifest as model resolution errors.
Diagnostic Reference Commands
# Full system diagnostic
openclaw doctor
# Provider debug output
openclaw provider debug anthropic
# Config validation
openclaw config validate
# Clear all caches and restart
openclaw gateway stop && rm -rf ~/.cache/openclaw && openclaw gateway start