# Wiki Index
Content catalog for the LLM Wiki. Updated on every ingest.
## Concepts
- Prefix Cache Behavior in Hybrid Attention Models — why hybrid sliding-window attention models break prefix caching in mlx-lm
- Local MLX Inference Patterns — running LLMs locally on Apple Silicon via MLX; production architecture with serializing proxy to prevent OOM kills
- KV Cache Resumption and Context-Length Scaling on Apple Silicon — empirical benchmarks on Gemma 4 26B: context degradation curve, cache speedup (up to 15.8×), OOM limits, defense-in-depth memory safety, TurboQuant
- Hermes Agent Fallback Chains — model failover configuration, triggers, and quirks
- Karpathy LLM Wiki Pattern — the architecture behind this wiki: episodic vs. semantic memory for AI agents
- Vapi Voice Agent Architecture — voice pipeline, production stack, optimization vectors
- OpenClaw Agent Architecture — personal AI agent framework: gateway, sessions, memory, multi-agent
- VNC Control — AI Desktop Bridge — visual control channel for AI agents: coordinate translation, macOS ARD quirks, vision backends, the tool-call loop bug
## Source Summaries
No source summaries yet — raw sources pending detailed ingest.
## Entities
No entity pages yet.
## Syntheses
No synthesis pages yet.
## Comparisons
No comparison pages yet.