Hermes Agent Fallback Chains¶
Overview¶
Hermes (NousResearch) supports automatic model fallback when the primary provider fails. This enables resilience across multiple providers (cloud APIs, local models).
Configuration¶
The documented format uses `fallback_model:` (a single mapping, not a list):

```yaml
model:
  default: gpt-5.3-codex
  provider: openai-codex
  fallback_model:
    provider: custom
    model: some-local-model
```
The codebase also supports `fallback_providers:` (a list format), but the documented approach is `fallback_model`.
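The list form might look like the sketch below; the nesting and per-entry keys are assumptions modeled on the documented `fallback_model` mapping, not confirmed Hermes configuration:

```yaml
model:
  default: gpt-5.3-codex
  provider: openai-codex
  fallback_providers:          # assumed shape: a list of provider/model entries
    - provider: custom
      model: some-local-model
    - provider: another-provider
      model: another-model
```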
Trigger Conditions¶
Fallback activates on:

- HTTP errors: 429 (rate limit), 500, 502, 503 (after retries are exhausted)
- Auth errors: 401, 403, 404 (immediately, no retries)
- Connection failures: httpx `ReadTimeout`, `ConnectTimeout`, `PoolTimeout`, `ConnectError`, `RemoteProtocolError`
- Empty/malformed responses: eager fallback (skips retry backoff)
- SSE stream drops: detected via phrase matching in `APIError` messages
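The trigger conditions above can be sketched as a classifier. All names here are illustrative, not Hermes internals; real code would match httpx exception types rather than class-name strings, and the retry behavior for connection errors is an assumption (the docs only say they trigger fallback):

```python
# Statuses that fall back only after retries are exhausted.
RETRY_THEN_FALLBACK_STATUSES = {429, 500, 502, 503}
# Auth/routing errors that fall back immediately, with no retries.
IMMEDIATE_FALLBACK_STATUSES = {401, 403, 404}
# Transport failures that also route into the fallback path.
CONNECTION_ERROR_NAMES = {
    "ReadTimeout", "ConnectTimeout", "PoolTimeout",
    "ConnectError", "RemoteProtocolError",
}

def classify_failure(status_code=None, exc_name=None):
    """Map an HTTP status or transport error to a fallback action."""
    if exc_name in CONNECTION_ERROR_NAMES:
        return "retry_then_fallback"      # assumed: retried before fallback
    if status_code in IMMEDIATE_FALLBACK_STATUSES:
        return "fallback_immediately"     # auth errors skip retries
    if status_code in RETRY_THEN_FALLBACK_STATUSES:
        return "retry_then_fallback"
    return "no_fallback"                  # e.g. slowness or bad answers
```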
Fallback does NOT trigger on: - Slowness alone (no timeout-based fallback in vanilla Hermes) - Model quality issues (wrong/bad answers)
Important Limitations¶
- One-shot only: fires at most once per session
- Works in: CLI and gateway modes
- Does NOT work in: subagents or cron
- No skip-current logic: if the fallback list includes the same model as primary, it won't skip it
Timeout Configuration¶
`HERMES_API_TIMEOUT` env var controls the LLM API call timeout (default: 900 seconds / 15 minutes). An httpx `ReadTimeout` at this boundary will trigger retries, then fallback.

`HERMES_STREAM_READ_TIMEOUT` controls the per-chunk stream read timeout (default: 60 seconds).
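The env var names and defaults above resolve as sketched below; the helper function itself is hypothetical, not part of Hermes:

```python
def resolve_timeouts(env):
    """Resolve the two timeout knobs from an env-var mapping."""
    api_timeout = float(env.get("HERMES_API_TIMEOUT", "900"))         # whole-call cap
    stream_read = float(env.get("HERMES_STREAM_READ_TIMEOUT", "60"))  # per-chunk cap
    return api_timeout, stream_read
```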
Anthropic Provider Quirk¶
Hermes `_is_oauth_token()` classifies all keys not prefixed with `sk-ant-api` as OAuth and sends them as Bearer tokens. The Anthropic API now rejects Bearer auth ("OAuth authentication is currently not supported"), so keys with the prefix `sk-ant-oat01-` (which are valid API keys) get misrouted. The priority chain for Anthropic token resolution is:

1. `ANTHROPIC_TOKEN` env var
2. `CLAUDE_CODE_OAUTH_TOKEN` env var
3. Claude Code credentials files (`~/.claude.json` or `~/.claude/.credentials.json`) with auto-refresh
4. `ANTHROPIC_API_KEY` env var

Expired OAuth tokens from priorities 2-3 can shadow a valid API key at priority 4.
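The misclassification boils down to a prefix check. The sketch below reproduces only the behavior described above; the actual Hermes implementation may differ:

```python
def is_oauth_token(key: str) -> bool:
    """Sketch of the quirk: anything not prefixed sk-ant-api is
    treated as OAuth and sent as a Bearer token."""
    return not key.startswith("sk-ant-api")
```

Under this check, a valid `sk-ant-oat01-` API key is classified as OAuth, sent as a Bearer token, and rejected by the Anthropic API.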
Practical Guidance¶
- Use cloud models as primary, local as fallback (not the reverse) — local models choke on session compression at startup
- Always route local model traffic through a serializing proxy (see Local MLX Inference Patterns) — raw MLX servers have no memory safeguards and concurrent requests cause OOM kills
- Set `HERMES_API_TIMEOUT` to something reasonable (60-120s) if using local models as fallback, to avoid 15-minute hangs
- Don't put the same model in both the primary and fallback positions
- For Anthropic: ensure no stale Claude Code credentials exist, or set `ANTHROPIC_TOKEN` explicitly (priority 1)
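Putting the guidance together, a setup along these lines would apply; the values are illustrative, not recommendations from the Hermes docs beyond the 60-120s range stated above:

```shell
# Cap the API timeout so a local fallback cannot hang for 15 minutes.
export HERMES_API_TIMEOUT=120
# Per-chunk stream read timeout (this is already the default).
export HERMES_STREAM_READ_TIMEOUT=60
# Pin the Anthropic token at priority 1 so stale Claude Code
# credentials cannot shadow it. Placeholder value.
export ANTHROPIC_TOKEN="example-token"
```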