Hermes Agent Fallback Chains

Overview

Hermes (NousResearch) supports automatic model fallback when the primary provider fails. This enables resilience across multiple providers (cloud APIs, local models).

Configuration

The documented format uses the fallback_model: key, which takes a single mapping rather than a list:

model:
  default: gpt-5.3-codex
  provider: openai-codex

fallback_model:
  provider: custom
  model: some-local-model

The codebase also supports fallback_providers: (list format), but the documented approach is fallback_model.

Trigger Conditions

Fallback activates on:

  • HTTP errors: 429 (rate limit), 500, 502, 503 (after retries are exhausted)
  • Auth errors: 401, 403, 404 (immediately, no retries)
  • Connection failures: httpx ReadTimeout, ConnectTimeout, PoolTimeout, ConnectError, RemoteProtocolError
  • Empty/malformed responses: eager fallback (skips retry backoff)
  • SSE stream drops: detected via phrase matching in APIError messages

Fallback does NOT trigger on:

  • Slowness alone (no timeout-based fallback in vanilla Hermes)
  • Model quality issues (wrong/bad answers)
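The trigger rules above can be summarized as a small decision function. This is a minimal sketch, not Hermes's actual code: the function name classify_failure and the three category strings are hypothetical, and it assumes every listed connection failure is retried before fallback (the source only states this explicitly for ReadTimeout).

```python
from typing import Optional

# Status codes and exception names come from the lists above;
# the function name and category strings are hypothetical.
RETRY_THEN_FALLBACK_STATUS = {429, 500, 502, 503}
IMMEDIATE_FALLBACK_STATUS = {401, 403, 404}
CONNECTION_ERRORS = {
    "ReadTimeout", "ConnectTimeout", "PoolTimeout",
    "ConnectError", "RemoteProtocolError",
}


def classify_failure(status: Optional[int] = None,
                     exc_name: Optional[str] = None,
                     empty_response: bool = False) -> str:
    """Map a failure to 'fallback_now', 'retry_then_fallback' or 'no_fallback'."""
    if status in IMMEDIATE_FALLBACK_STATUS:
        return "fallback_now"          # auth errors: no retries
    if empty_response:
        return "fallback_now"          # eager fallback, skips retry backoff
    if status in RETRY_THEN_FALLBACK_STATUS:
        return "retry_then_fallback"   # only after retries are exhausted
    if exc_name in CONNECTION_ERRORS:
        return "retry_then_fallback"   # assumption: retried like ReadTimeout
    return "no_fallback"               # e.g. slow or low-quality answers


print(classify_failure(status=429))
print(classify_failure(status=401))
print(classify_failure())
```

A slow but eventually successful call never reaches this function with an error, which is why slowness alone does not trigger fallback.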

Important Limitations

  • One-shot only: fires at most once per session
  • Works in: CLI and gateway modes
  • Does NOT work in: subagents or cron
  • No skip-current logic: if the fallback list includes the same model as primary, it won't skip it
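The one-shot and no-skip-current limitations are easier to see in code. The following is an illustrative sketch only; the FallbackState class and its fields are hypothetical and do not mirror Hermes internals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FallbackState:
    """Hypothetical per-session state for a one-shot fallback."""
    primary: str
    fallback: Optional[str]
    used: bool = False  # flips to True the first time fallback fires

    def next_model(self) -> Optional[str]:
        """Return the fallback model once; None on any later failure."""
        if self.used or self.fallback is None:
            return None
        self.used = True
        # No skip-current check: the fallback is returned even when it
        # equals the primary, mirroring the limitation described above.
        return self.fallback


session = FallbackState(primary="gpt-5.3-codex", fallback="some-local-model")
print(session.next_model())  # first failure: switches to the fallback
print(session.next_model())  # second failure: nothing left to try
```

Because the state is per session, a second provider outage in the same session surfaces as an error instead of another switch.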

Timeout Configuration

The HERMES_API_TIMEOUT env var controls the LLM API call timeout (default: 900 seconds, i.e. 15 minutes). An httpx ReadTimeout at this boundary triggers retries first, then fallback.

HERMES_STREAM_READ_TIMEOUT controls per-chunk stream read timeout (default: 60 seconds).
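As a rough illustration of how these two settings resolve, the sketch below reads both variables with the defaults stated above. The helper name read_timeouts and the float parsing are assumptions; Hermes may parse or validate the values differently.

```python
import os
from typing import Mapping


def read_timeouts(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve both timeouts in seconds, using the documented defaults."""
    return {
        # Whole-call timeout: 900 s (15 minutes) unless overridden.
        "api_timeout": float(env.get("HERMES_API_TIMEOUT", "900")),
        # Per-chunk stream read timeout: 60 s unless overridden.
        "stream_read_timeout": float(env.get("HERMES_STREAM_READ_TIMEOUT", "60")),
    }


print(read_timeouts({}))
print(read_timeouts({"HERMES_API_TIMEOUT": "90"}))
```

Passing a plain dict instead of os.environ makes it easy to check what a given shell configuration would produce before launching the agent.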

Anthropic Provider Quirk

The Hermes function _is_oauth_token() classifies every key that lacks the sk-ant-api prefix as an OAuth token and sends it as a Bearer token. The Anthropic API now rejects Bearer auth ("OAuth authentication is currently not supported"). As a result, keys with the prefix sk-ant-oat01- (which are valid API keys) get misrouted. The priority chain for Anthropic token resolution is:

  1. ANTHROPIC_TOKEN env var
  2. CLAUDE_CODE_OAUTH_TOKEN env var
  3. Claude Code credentials files (~/.claude.json or ~/.claude/.credentials.json) with auto-refresh
  4. ANTHROPIC_API_KEY env var

Expired OAuth tokens from priority 2-3 can shadow a valid API key at priority 4.
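The shadowing problem can be reproduced with a small model of the priority chain. This is a sketch of the behavior described above, not the Hermes source: resolve_anthropic_token and its credentials_file_token parameter are hypothetical, and is_oauth_token only approximates what _is_oauth_token() is described as doing.

```python
from typing import Mapping, Optional, Tuple


def is_oauth_token(key: str) -> bool:
    # Described behavior: anything without the sk-ant-api prefix counts as OAuth.
    return not key.startswith("sk-ant-api")


def resolve_anthropic_token(
    env: Mapping[str, str],
    credentials_file_token: Optional[str] = None,  # stand-in for the Claude Code files
) -> Optional[Tuple[str, str]]:
    """Return (source, token) for the first non-empty entry in priority order."""
    candidates = [
        ("ANTHROPIC_TOKEN", env.get("ANTHROPIC_TOKEN")),
        ("CLAUDE_CODE_OAUTH_TOKEN", env.get("CLAUDE_CODE_OAUTH_TOKEN")),
        ("claude_code_credentials_file", credentials_file_token),
        ("ANTHROPIC_API_KEY", env.get("ANTHROPIC_API_KEY")),
    ]
    for source, value in candidates:
        if value:
            return source, value
    return None


# A stale priority-2 token wins over a valid priority-4 API key.
env = {"CLAUDE_CODE_OAUTH_TOKEN": "stale-token", "ANTHROPIC_API_KEY": "sk-ant-api03-example"}
print(resolve_anthropic_token(env))
print(is_oauth_token("sk-ant-oat01-example"))
```

The resolver stops at the first non-empty value without checking whether it still works, which is why a stale entry higher in the list hides a good key lower down.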

Practical Guidance

  • Use cloud models as primary, local as fallback (not the reverse) — local models choke on session compression at startup
  • Always route local model traffic through a serializing proxy (see Local MLX Inference Patterns) — raw MLX servers have no memory safeguards and concurrent requests cause OOM kills
  • Set HERMES_API_TIMEOUT to something reasonable (60-120s) if using local models as fallback to avoid 15-minute hangs
  • Don't put the same model in both primary and fallback positions
  • For Anthropic: ensure no stale Claude Code credentials exist, or set ANTHROPIC_TOKEN explicitly (priority 1)
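Two of the points above can be checked mechanically before starting a session. The sketch below is a hypothetical lint helper, not part of Hermes; it assumes the config has already been loaded into a dict with the keys shown under Configuration, and that provider: custom indicates a local model, as in that example.

```python
def lint_fallback_config(config: dict, api_timeout: float = 900.0) -> list:
    """Return warnings for a parsed config dict (hypothetical helper)."""
    warnings = []
    primary = config.get("model") or {}
    fallback = config.get("fallback_model") or {}
    if not fallback:
        return warnings  # nothing to check without a fallback

    # Same provider and model in both positions defeats the fallback.
    same = (primary.get("provider") == fallback.get("provider")
            and primary.get("default") == fallback.get("model"))
    if same:
        warnings.append("fallback_model repeats the primary model")

    # Assumption: provider "custom" means a local model.
    if fallback.get("provider") == "custom" and api_timeout > 120:
        warnings.append("HERMES_API_TIMEOUT above 120s with a local fallback")
    return warnings


config = {
    "model": {"default": "gpt-5.3-codex", "provider": "openai-codex"},
    "fallback_model": {"provider": "custom", "model": "some-local-model"},
}
print(lint_fallback_config(config, api_timeout=900.0))
print(lint_fallback_config(config, api_timeout=90.0))
```

The 120-second threshold is the upper end of the range suggested above; adjust it to whatever your local model needs for its slowest expected response.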