Vapi Voice Agent Architecture

Overview

Vapi is a voice AI platform for building and deploying conversational voice agents. It handles the full voice pipeline: STT → LLM → TTS, with tool calling, call transfers, and telephony integration.
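To make the pipeline concrete, here is a minimal sketch of one conversational turn through STT → LLM → TTS. All three stages are placeholder functions, not real Vapi APIs; in production each stage is a streaming network call.

```python
# Minimal sketch of the STT -> LLM -> TTS turn loop a platform like Vapi
# orchestrates. The three stage functions are placeholders for illustration.

def stt(audio: bytes) -> str:
    """Placeholder speech-to-text: pretend the audio decodes to a string."""
    return audio.decode("utf-8")

def llm(transcript: str, history: list[str]) -> str:
    """Placeholder language model: record the turn and return a canned reply."""
    history.append(transcript)
    return f"You said: {transcript}"

def tts(text: str) -> bytes:
    """Placeholder text-to-speech: pretend we synthesize audio."""
    return text.encode("utf-8")

def handle_turn(audio_in: bytes, history: list[str]) -> bytes:
    """One conversational turn through the full pipeline."""
    transcript = stt(audio_in)
    reply = llm(transcript, history)
    return tts(reply)

history: list[str] = []
audio_out = handle_turn(b"book an oil change", history)
print(audio_out.decode("utf-8"))  # -> You said: book an oil change
```

The real loop is streaming and interruptible, but the data flow per turn is exactly this chain.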

Production Stack (Voice Controller AI)

Voice Controller AI runs ~50,000 calls/month on Vapi, primarily serving automotive shops. The stack:

  • Vapi: Voice pipeline orchestration
  • n8n: Workflow automation (webhooks, tool execution, CRM integration)
  • Vercel: Frontend hosting
  • Supabase: Database and auth

Key Optimization Vectors

The full-stack optimization space for voice agents includes:

  1. STT (Speech-to-Text): Accuracy, latency, language support. Affects downstream LLM quality.
  2. LLM (Language Model): Prompt engineering, model selection, response latency. The intelligence layer.
  3. TTS (Text-to-Speech): Naturalness, latency, voice selection. The user-facing quality layer.
  4. Prompt Design: System prompts, tool descriptions, conversation flow. Highest-leverage optimization.
  5. Tool Calling: Latency, reliability, error handling. Determines agent capability.
  6. Timing: Turn-taking, interruption handling, silence detection. Determines conversational feel.
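One reason these vectors interact: the stage latencies add, so STT, LLM, and TTS all compete for the same response-time budget a caller will tolerate. The numbers below are assumptions for the sketch, not measured Vapi figures.

```python
# Illustrative end-to-end latency budget for one turn. All figures are
# assumed for the example; the point is that the stages sum against one
# shared budget, so optimizing any single stage buys headroom for the rest.

BUDGET_MS = 1000  # rough tolerable response latency on a phone call (assumed)

stage_latency_ms = {
    "stt_final_transcript": 300,  # assumed
    "llm_first_token": 400,       # assumed
    "tts_first_audio": 200,       # assumed
}

total = sum(stage_latency_ms.values())
headroom = BUDGET_MS - total
print(f"total={total}ms headroom={headroom}ms")  # -> total=900ms headroom=100ms
```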

Autoresearch Approach

The Voice Prompt AutoResearch (VPAR) project applies Karpathy's autoresearch loop to voice prompt optimization:

  • Automated prompt variant generation
  • Real Vapi call execution for evaluation
  • LLM-judge scoring of call quality
  • Iterative improvement based on results

Budget constraint: ~$3/cycle, with ~$20 total Vapi credits (roughly six cycles). At 50k calls/month, a 1% improvement = ~500 better conversations per month.
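The loop above can be sketched as follows. The generate/call/judge functions are hypothetical stubs standing in for the actual VPAR code; the per-call cost is an assumption. The key structural point is the hard per-cycle budget cap inside the loop.

```python
# Sketch of an autoresearch cycle under the stated constraints: generate
# prompt variants, run calls, score with an LLM judge, keep the best, and
# stop when the cycle budget is spent. All stage functions are stubs.

CYCLE_BUDGET_USD = 3.00
COST_PER_CALL_USD = 0.15  # assumed average cost per test call

def generate_variants(base_prompt: str, n: int) -> list[str]:
    # Stand-in for automated prompt variant generation.
    return [f"{base_prompt} [variant {i}]" for i in range(n)]

def run_call(prompt: str) -> str:
    # Stand-in for a real Vapi call execution returning a transcript.
    return f"transcript for: {prompt}"

def judge_score(transcript: str) -> float:
    # Stand-in for LLM-judge scoring of call quality.
    return float(len(transcript) % 10)

def autoresearch_cycle(base_prompt: str) -> str:
    spent, best, best_score = 0.0, base_prompt, -1.0
    for variant in generate_variants(base_prompt, n=20):
        if spent + COST_PER_CALL_USD > CYCLE_BUDGET_USD:
            break  # hard stop: never exceed the per-cycle budget
        spent += COST_PER_CALL_USD
        score = judge_score(run_call(variant))
        if score > best_score:
            best, best_score = variant, score
    return best
```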

Status (2026-04-04): VPAR is paused due to runaway charges (~$90 over 2 days). A pause-enforcement gap was identified: individual experiment scripts bypassed the pause toggle. A fix is required before re-enabling.
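One way to close the enforcement gap is to route every experiment's call execution through a single gate that checks the pause toggle, so individual scripts cannot bypass it. This is a sketch, not the actual fix; the toggle here is an environment variable, whereas in the real stack it would likely live in a shared store such as Supabase.

```python
# Hypothetical centralized pause gate: all call placement goes through
# execute_call(), which refuses to run while the pause toggle is set.

import os

class ExperimentsPausedError(RuntimeError):
    """Raised when a script attempts a call while VPAR is paused."""

def is_paused() -> bool:
    # Assumed toggle location; real enforcement should read a shared store
    # so every script sees the same state.
    return os.environ.get("VPAR_PAUSED", "0") == "1"

def execute_call(prompt: str) -> str:
    if is_paused():
        raise ExperimentsPausedError("VPAR is paused; refusing to place call")
    return f"placed call with prompt: {prompt}"  # stand-in for the Vapi call
```

The point of the design is that there is exactly one code path that can spend money, and the pause check lives on that path.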