Voice AI Agent Platform
Building a production voice agent means solving dozens of problems beyond the demo: prompt reliability, tool orchestration, context management, testing, and deployment. Most teams discover this after they've already started.
The Challenge
Getting a voice AI demo working takes an afternoon. Getting a voice AI agent into production (one that handles real calls, uses real tools, and doesn’t hallucinate) takes months of engineering that most teams don’t see coming.
I learned this firsthand building a facilities voice agent that handles calls across 400+ locations. The demo worked in a week. The next three months were spent solving the problems that only surface in production: the agent calling the wrong tool, hallucinating repair procedures, fumbling handoffs between conversation phases, and drowning in context when calls got complex.
I took everything I learned and built it into a reusable platform.
The Approach
The platform is built on Vapi for voice infrastructure, with a custom layer on top that handles everything Vapi doesn’t: agent definition, prompt management, tool orchestration, context management, and observability.
Agent as Code
Every aspect of a voice agent lives in version control:
- Agent configuration — prompt, voice selection, model settings, and Vapi parameters stored in config files that sync to Vapi on deploy
- Tool definitions — MCP server connections and tool schemas tracked alongside the agent config
- Prompt versioning — every prompt change is a commit, with the ability to diff, revert, and trace which version was live for any given call
This means deploying a new agent or updating an existing one is a git push, not a manual edit in a dashboard.
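To make the agent-as-code idea concrete, here is a minimal sketch of what a versioned agent config might look like. All names and fields are illustrative assumptions, not Vapi's actual API schema or the platform's real config format:

```typescript
// Hypothetical agent config; field names are illustrative, not Vapi's schema.
interface AgentConfig {
  name: string;
  promptVersion: string; // git SHA of the prompt file live for this deploy
  voice: { provider: string; voiceId: string };
  model: { provider: string; model: string; temperature: number };
  tools: string[]; // MCP tool names the agent is allowed to call
}

const facilitiesAgent: AgentConfig = {
  name: "facilities-intake",
  promptVersion: "a1b2c3d",
  voice: { provider: "elevenlabs", voiceId: "rachel" },
  model: { provider: "openai", model: "gpt-4o", temperature: 0.3 },
  tools: ["lookup_equipment", "search_procedures", "create_work_order"],
};

// On deploy, a config like this would be diffed against the live assistant
// and pushed to Vapi only when something changed.
```

Because the prompt version is pinned to a commit, any call log can be traced back to the exact prompt and settings that produced it.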
Context Management
A voice call isn’t a chat conversation. The caller is impatient, the agent has limited time to think, and every unnecessary token in the context window increases latency and reduces accuracy.
The platform manages what the agent knows and when:
- Staged context loading — the agent starts with minimal context (caller identity, location), and pulls in additional data (equipment details, repair procedures, work order history) only when the conversation needs it
- Tool result summarization — raw API responses are condensed before entering the conversation context, keeping the agent focused on what matters
- Conversation phase tracking — the agent knows whether it’s identifying equipment, troubleshooting, or creating a work order, and its available tools and context shift accordingly
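The three patterns above can be sketched together. This is a simplified illustration under assumed names (the phase labels, tool names, and work-order shape are all hypothetical), not the platform's actual implementation:

```typescript
// Phase-scoped tools: only the current phase's tools are exposed to the model.
type Phase = "identify_equipment" | "troubleshoot" | "create_work_order";

const PHASE_TOOLS: Record<Phase, string[]> = {
  identify_equipment: ["lookup_equipment"],
  troubleshoot: ["search_procedures", "get_work_order_history"],
  create_work_order: ["create_work_order"],
};

interface CallContext {
  callerId: string;
  locationId: string;
  phase: Phase;
  loaded: Record<string, string>; // context blocks pulled in on demand
}

function availableTools(ctx: CallContext): string[] {
  return PHASE_TOOLS[ctx.phase];
}

// Staged loading: fetch a context block the first time a phase needs it,
// so early turns stay small and fast.
async function ensureContext(
  ctx: CallContext,
  key: string,
  fetch: () => Promise<string>
): Promise<string> {
  if (!(key in ctx.loaded)) {
    ctx.loaded[key] = await fetch();
  }
  return ctx.loaded[key];
}

// Tool result summarization: condense a raw API payload to the fields the
// conversation actually needs before it enters the context window.
function summarizeWorkOrders(
  raw: { id: string; status: string; summary: string; details: string }[]
): string {
  return raw
    .slice(0, 3) // cap how much history the model ever sees
    .map((wo) => `#${wo.id} (${wo.status}): ${wo.summary}`)
    .join("\n");
}
```

The key design choice is that the model never sees the full tool surface or the full data payload at once; both are gated by where the conversation currently is.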
Prompt Engineering for Voice
Writing a prompt for a voice agent is fundamentally different from writing one for a chatbot. The agent can’t show a bulleted list or ask the caller to “see below.” It needs to be concise, conversational, and decisive.
The hardest problems were:
- Tool selection reliability — getting the agent to consistently use the right tool at the right time, rather than guessing at an answer
- Hallucination prevention — the agent must say “I don’t know” rather than invent a repair procedure
- Conversation flow — natural transitions between phases without awkward pauses or abrupt topic changes
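To show how those constraints translate into prompt text, here is an illustrative fragment. It is not the production prompt, just a hedged sketch of the kinds of rules that address each of the three problems:

```typescript
// Illustrative system-prompt fragment for a voice agent (not the real prompt).
// Each rule maps to one of the failure modes: tool selection, hallucination,
// and conversation flow.
const SYSTEM_PROMPT = `
You are a facilities support agent on a phone call.

Rules:
- Speak in short sentences. Never read out lists or say "see below".
- Before answering any question about equipment or procedures, call the
  matching tool. Never answer from memory.
- If no tool returns an answer, say "I don't know" and offer to create
  a work order instead of guessing.
- Announce phase changes naturally ("Okay, let's get that logged for you")
  rather than switching topics abruptly.
`.trim();
```

Rules like these are what prompt versioning tracks: each wording change is a commit that can be diffed against call outcomes.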
Early iterations used a multi-agent architecture, but improvements in model capability made a single-agent approach both more reliable and lower in latency.
Observability
When a call goes wrong, you need to know why. The platform integrates Vapi call logs, Vercel server traces, and Datadog monitoring through MCP servers, so I can use AI tooling to investigate call failures across the entire stack: from voice transcription through tool execution to API responses.
The Results
The platform reduces the time to deploy a new voice agent from months to days, assuming the underlying data integrations (MCP servers, APIs) are in place. The agent-as-code approach means every change is tracked, testable, and reversible.
The context management and prompt engineering patterns are the real value. They’re the difference between a voice agent that works in a demo and one that handles real calls reliably: staying focused, using the right tools, and knowing when to say “I don’t know.”
Services: AI Agent Development · MCP Server & Integration Development