Hermes Agent Fallback Chains¶
Overview¶
Hermes (NousResearch) supports automatic model fallback when the primary provider fails. This enables resilience across multiple providers (cloud APIs, local models).
Configuration¶
The documented format uses `fallback_model:` (a single mapping, not a list):

```yaml
model:
  default: gpt-5.3-codex
  provider: openai-codex
  fallback_model:
    provider: custom
    model: some-local-model
```
The codebase also supports `fallback_providers:` (a list format), but the documented approach is `fallback_model`.
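The list form might look like the sketch below; the nesting and per-entry keys are assumptions modeled on the documented `fallback_model` mapping, not confirmed Hermes configuration:

```yaml
model:
  default: gpt-5.3-codex
  provider: openai-codex
  fallback_providers:          # assumed shape: a list of provider/model entries
    - provider: custom
      model: some-local-model
    - provider: another-provider
      model: another-model
```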
Trigger Conditions¶
Fallback activates on:

- HTTP errors: 429 (rate limit), 500, 502, 503 (after retries are exhausted)
- Auth errors: 401, 403, 404 (immediately, no retries)
- Connection failures: httpx `ReadTimeout`, `ConnectTimeout`, `PoolTimeout`, `ConnectError`, `RemoteProtocolError`
- Empty/malformed responses: eager fallback (skips retry backoff)
- SSE stream drops: detected via phrase matching in `APIError` messages
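The trigger conditions above can be sketched as a classifier. All names here are illustrative, not Hermes internals; real code would match httpx exception types rather than class-name strings, and the retry behavior for connection errors is an assumption (the docs only say they trigger fallback):

```python
# Statuses that fall back only after retries are exhausted.
RETRY_THEN_FALLBACK_STATUSES = {429, 500, 502, 503}
# Auth/routing errors that fall back immediately, with no retries.
IMMEDIATE_FALLBACK_STATUSES = {401, 403, 404}
# Transport failures that also route into the fallback path.
CONNECTION_ERROR_NAMES = {
    "ReadTimeout", "ConnectTimeout", "PoolTimeout",
    "ConnectError", "RemoteProtocolError",
}

def classify_failure(status_code=None, exc_name=None):
    """Map an HTTP status or transport error to a fallback action."""
    if exc_name in CONNECTION_ERROR_NAMES:
        return "retry_then_fallback"      # assumed: retried before fallback
    if status_code in IMMEDIATE_FALLBACK_STATUSES:
        return "fallback_immediately"     # auth errors skip retries
    if status_code in RETRY_THEN_FALLBACK_STATUSES:
        return "retry_then_fallback"
    return "no_fallback"                  # e.g. slowness or bad answers
```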
Fallback does NOT trigger on: - Slowness alone (no timeout-based fallback in vanilla Hermes) - Model quality issues (wrong/bad answers)
Important Limitations¶
- One-shot only: fires at most once per session
- Works in: CLI and gateway modes
- Does NOT work in: subagents or cron
- No skip-current logic: if the fallback list includes the same model as primary, it won't skip it
Timeout Configuration¶
`HERMES_API_TIMEOUT` env var controls the LLM API call timeout (default: 900 seconds / 15 minutes). An httpx `ReadTimeout` at this boundary will trigger retries, then fallback.

`HERMES_STREAM_READ_TIMEOUT` controls the per-chunk stream read timeout (default: 60 seconds).
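The env var names and defaults above resolve as sketched below; the helper function itself is hypothetical, not part of Hermes:

```python
def resolve_timeouts(env):
    """Resolve the two timeout knobs from an env-var mapping."""
    api_timeout = float(env.get("HERMES_API_TIMEOUT", "900"))         # whole-call cap
    stream_read = float(env.get("HERMES_STREAM_READ_TIMEOUT", "60"))  # per-chunk cap
    return api_timeout, stream_read
```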
Anthropic Provider Quirk¶
Hermes `_is_oauth_token()` classifies all keys not prefixed with `sk-ant-api` as OAuth and sends them as Bearer tokens. The Anthropic API now rejects Bearer auth ("OAuth authentication is currently not supported"), so keys with the prefix `sk-ant-oat01-` (which are valid API keys) get misrouted. The priority chain for Anthropic token resolution is:

1. `ANTHROPIC_TOKEN` env var
2. `CLAUDE_CODE_OAUTH_TOKEN` env var
3. Claude Code credentials files (`~/.claude.json` or `~/.claude/.credentials.json`) with auto-refresh
4. `ANTHROPIC_API_KEY` env var

Expired OAuth tokens from priorities 2-3 can shadow a valid API key at priority 4.
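The misclassification boils down to a prefix check. The sketch below reproduces only the behavior described above; the actual Hermes implementation may differ:

```python
def is_oauth_token(key: str) -> bool:
    """Sketch of the quirk: anything not prefixed sk-ant-api is
    treated as OAuth and sent as a Bearer token."""
    return not key.startswith("sk-ant-api")
```

Under this check, a valid `sk-ant-oat01-` API key is classified as OAuth, sent as a Bearer token, and rejected by the Anthropic API.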
Practical Guidance¶
- Use cloud models as primary, local as fallback (not the reverse) — local models choke on session compression at startup
- Always route local model traffic through a serializing proxy (see Local MLX Inference Patterns) — raw MLX servers have no memory safeguards and concurrent requests cause OOM kills
- Set `HERMES_API_TIMEOUT` to something reasonable (60-120s) if using local models as fallback, to avoid 15-minute hangs
- Don't put the same model in both the primary and fallback positions
- For Anthropic: ensure no stale Claude Code credentials exist, or set `ANTHROPIC_TOKEN` explicitly (priority 1)
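Putting the guidance together, a setup along these lines would apply; the values are illustrative, not recommendations from the Hermes docs beyond the 60-120s range stated above:

```shell
# Cap the API timeout so a local fallback cannot hang for 15 minutes.
export HERMES_API_TIMEOUT=120
# Per-chunk stream read timeout (this is already the default).
export HERMES_STREAM_READ_TIMEOUT=60
# Pin the Anthropic token at priority 1 so stale Claude Code
# credentials cannot shadow it. Placeholder value.
export ANTHROPIC_TOKEN="example-token"
```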