# Wiki Index
Content catalog for the LLM Wiki. Updated on every ingest.
## Concepts
- Prefix Cache Behavior in Hybrid Attention Models — why hybrid sliding-window attention models break prefix caching in mlx-lm
- Local MLX Inference Patterns — running LLMs locally on Apple Silicon via MLX; production architecture with serializing proxy to prevent OOM kills
- KV Cache Resumption and Context-Length Scaling on Apple Silicon — empirical benchmarks on Gemma 4 26B: context degradation curve, cache speedup (up to 15.8×), OOM limits, defense-in-depth memory safety, TurboQuant
- Hermes Agent Fallback Chains — model failover configuration, triggers, and quirks
- Karpathy LLM Wiki Pattern — the architecture behind this wiki: episodic vs. semantic memory for AI agents
- Vapi Voice Agent Architecture — voice pipeline, production stack, optimization vectors
- OpenClaw Agent Architecture — personal AI agent framework: gateway, sessions, memory, multi-agent
- VNC Control — AI Desktop Bridge — visual control channel for AI agents: coordinate translation, macOS ARD quirks, vision backends, the tool-call loop bug
## Source Summaries
No source summaries yet — raw sources pending detailed ingest.
## Entities
No entity pages yet.
## Syntheses
No synthesis pages yet.
## Comparisons
No comparison pages yet.