The Voice AI Platform Landscape — What a Builder Actually Compared

I recently watched an experienced voice AI builder walk through the current platform landscape. Not a marketing video. Not a sales pitch. A builder sitting down with four platforms and asking the only question that matters: which one works?

He compared Vapi, Bland, Dogora, and Retell. Built a working agent on each. Tested the developer experience. Logged what broke.

Here's what I learned — and why I chose Vapi for my own deployment stack.

The Four Contenders

| Platform | Strengths | Weaknesses | Best For |
| --- | --- | --- | --- |
| Vapi | Fastest to deploy. Docker support. Clean API. Deep tool integration. OAuth login (no API key copy-paste). | No HIPAA BAA yet. Call minutes add up at scale. Transcripts need HITL review for medical terminology. | Builders who want speed over customization. My pick for Module 2. |
| Bland | Mature product. Good documentation. Solid marketing. | Closed ecosystem. Less developer control. Pricing less transparent. | Teams that want a managed solution, not a builder's toolkit. |
| Dogora | Open-source. Full code control. Docker-native. Can inspect and modify everything. | Newer. Smaller community. Less polished. You own the infrastructure. | Compliance-heavy deployments. Healthcare. Legal. Anywhere "we must own the code" applies. |
| Retell | Enterprise voice quality. Advanced speech models. | More expensive. Slower onboarding. Enterprise sales motion, not self-serve. | Call centers that need premium voice quality and have budget. |

Why I Picked Vapi

Three reasons, none of them about voice quality.

1. MCP-native. Vapi ships a Model Context Protocol (MCP) server as an npm package. I installed it on my VPS, configured it in Hermes' config.yaml, and had 13 tools available, including create_assistant, create_call, list_calls, and list_phone_numbers, without writing a single API wrapper. The integration took nine minutes.

2. Lightweight deployment. The Vapi MCP server runs as an npx command. No containers to manage. No infrastructure to provision. The voice infrastructure is Vapi's problem. The orchestration is Hermes' problem. I own exactly what I need to own and nothing else.

3. Dynamic variables. Vapi supports {{customerName}}, {{appointmentDate}}, {{doctorName}} in assistant prompts — passed per call via Hermes. A single assistant template serves 10,000 patients. Each call feels personal because Hermes injects the context.
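To make the single-template idea concrete, here is a minimal Python sketch of per-call variable substitution. It only illustrates the pattern locally: `render_prompt`, the template wording, and the sample patient records are hypothetical, and in a real deployment the substitution happens on Vapi's side from the values passed with each call.

```python
import re

# Hypothetical assistant prompt using the placeholder names from the text.
TEMPLATE = (
    "Hi {{customerName}}, this is a reminder of your appointment "
    "on {{appointmentDate}} with {{doctorName}}."
)

# Matches {{name}} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, variables: dict) -> str:
    """Replace each {{name}} placeholder; fail loudly if a value is missing."""

    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing value for placeholder: {name}")
        return variables[name]

    return PLACEHOLDER.sub(substitute, template)


# One template, many calls: only the per-call variables change.
patients = [
    {"customerName": "Thandi", "appointmentDate": "3 March", "doctorName": "Dr. Naidoo"},
    {"customerName": "Pieter", "appointmentDate": "4 March", "doctorName": "Dr. Khumalo"},
]
prompts = [render_prompt(TEMPLATE, p) for p in patients]
```

Raising on a missing value is a deliberate choice in this sketch: a reminder call that greets a patient with an unfilled placeholder is worse than a call that never goes out.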

Where Vapi Is Weak

Every platform has gaps. Here's what's real, not what the landing page says:

No HIPAA BAA (business associate agreement). This is the hard constraint. For hospital deployments handling PHI, I cannot use Vapi for calls where protected health information is exchanged. Appointment reminders with name + date + time? Borderline, because many hospitals consider appointment details PHI. My solution: Vapi for general outreach (community health events, satisfaction surveys, non-clinical follow-ups). Twilio SMS for PHI-safe channels. And if the client needs voice + PHI? I fall back to Dogora, which runs on the client's own infrastructure under the client's BAA.
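The routing rule described above can be sketched as a small decision function. This is an illustration under stated assumptions, not the production logic: the `Channel` and `Outreach` names are hypothetical, and the sketch assumes all non-PHI outreach goes to Vapi voice.

```python
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    VAPI_VOICE = "vapi_voice"      # hosted voice, no BAA: non-PHI outreach only
    TWILIO_SMS = "twilio_sms"      # the PHI-safe text channel named in the text
    DOGORA_VOICE = "dogora_voice"  # self-hosted voice on the client's infrastructure


@dataclass(frozen=True)
class Outreach:
    contains_phi: bool  # be conservative: appointment details may count as PHI
    needs_voice: bool


def choose_channel(outreach: Outreach) -> Channel:
    """Pick a channel so that PHI never reaches a platform without a BAA."""
    if not outreach.contains_phi:
        return Channel.VAPI_VOICE
    if outreach.needs_voice:
        return Channel.DOGORA_VOICE
    return Channel.TWILIO_SMS
```

Because appointment details are borderline, the safer default is to set `contains_phi=True` whenever there is doubt.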

Transcript accuracy. Vapi uses standard speech-to-text. Medical terminology, legal jargon, pharmacy names — these will not transcribe perfectly. My fix: every call transcript gets a HITL review flag in the audit trail. The AI treats its own transcript as a draft, not a document of record.
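One way to model "draft, not a document of record" is a transcript record that is flagged for review by default and only changes status when a named human signs off. The `TranscriptRecord` class and its field names below are hypothetical, a sketch of the idea rather than the actual audit-trail schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TranscriptRecord:
    call_id: str
    text: str
    status: str = "draft"            # stays a draft until a human reviews it
    needs_human_review: bool = True  # every transcript is flagged by default
    reviewed_by: Optional[str] = None
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def approve(self, reviewer: str, corrected_text: Optional[str] = None) -> None:
        """Record a human sign-off, optionally replacing the machine transcript."""
        if corrected_text is not None:
            self.text = corrected_text
        self.status = "reviewed"
        self.needs_human_review = False
        self.reviewed_by = reviewer


record = TranscriptRecord(call_id="call-1", text="patient takes met forman")
record.approve(reviewer="nurse-1", corrected_text="Patient takes metformin.")
```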

Cost at scale. At ~$0.07/minute, 1,000 monthly reminder calls (3 min each) run ~$210/month. Not ruinous, but not free. The cost must be tracked against ROI: how many no-shows did those calls prevent? I wire Vapi call costs into my observability layer so the client knows exactly what they're paying for.
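The arithmetic behind that figure, plus the ROI question, fits in a few lines of Python. The rate, call count, and call length come from the text; the function names and the count of 60 prevented no-shows are hypothetical, chosen only to show the calculation.

```python
def monthly_call_cost(calls: int, minutes_per_call: float, rate_per_minute: float) -> float:
    """Total monthly spend on call minutes, rounded to cents."""
    return round(calls * minutes_per_call * rate_per_minute, 2)


def cost_per_prevented_no_show(monthly_cost: float, no_shows_prevented: int) -> float:
    """What each avoided no-show cost in call minutes."""
    if no_shows_prevented <= 0:
        raise ValueError("no_shows_prevented must be positive")
    return round(monthly_cost / no_shows_prevented, 2)


# Figures from the text: 1,000 calls, 3 minutes each, about $0.07 per minute.
cost = monthly_call_cost(calls=1000, minutes_per_call=3, rate_per_minute=0.07)

# Hypothetical outcome: suppose the reminders prevented 60 no-shows this month.
unit_cost = cost_per_prevented_no_show(cost, no_shows_prevented=60)

print(f"monthly cost: ${cost:.2f}, per prevented no-show: ${unit_cost:.2f}")
```

The per-no-show figure is the number to compare against the revenue of a kept appointment.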

What I'd Switch To

If Vapi's weaknesses ever become deal-breakers for a specific client:

For code control: Dogora. Open-source. Self-hosted. Docker-native. The client owns the voice pipeline end-to-end. Compliance teams love this. The trade-off: I manage the infrastructure instead of leaving it to Vapi. That cost gets priced into the deployment.

For enterprise voice quality: Retell. Better speech models. Premium voice quality. But it's an enterprise sales motion — contracts, onboarding calls, minimum commitments. I'd only recommend this for call centers with volume and budget.

For simplicity: Bland. It's fine. It works. If a client just wants a voice bot and doesn't care about how it's built, Bland is the safe choice. But "safe" is not the same as "best for integration." I build systems that talk to each other. Vapi talks to Hermes natively. Bland would need middleware.

The Principle

The video's real lesson isn't "Vapi is the best platform." It's that the platform decision is secondary to the integration decision.

A voice agent that can't read the CRM. That can't log to the audit trail. That can't pass context between SMS, email, and phone — that agent is a toy, regardless of which platform hosts it. The platform is the voice layer. The orchestration is the value.

I don't pick a voice platform from a landing page. I pick it by asking: can Hermes control it? Can the audit trail see it? Can a human stop it? Vapi answers all three. If next year's Vapi answers none, I'll switch. The orchestration stays. The platform is replaceable.

That's the kind of research I do before deploying Module 2. You don't pick a platform. You pick an architecture. The platform is just a component.
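The "platform is just a component" idea can be sketched as an interface the orchestrator depends on, with each voice platform as a swappable implementation. Everything here is hypothetical: `VoicePlatform`, `FakePlatform`, and the helper functions are illustrative names, not Hermes or Vapi APIs. The three methods mirror the three questions: can the orchestrator control it, can the audit trail see it, can a human stop it.

```python
from typing import Protocol


class VoicePlatform(Protocol):
    """What the orchestrator needs from any voice platform."""

    def start_call(self, phone_number: str, variables: dict) -> str: ...
    def fetch_transcript(self, call_id: str) -> str: ...
    def stop_call(self, call_id: str) -> None: ...


class FakePlatform:
    """In-memory stand-in, used only to exercise the interface."""

    def __init__(self) -> None:
        self.active = set()
        self.count = 0

    def start_call(self, phone_number: str, variables: dict) -> str:
        self.count += 1
        call_id = f"call-{self.count}"
        self.active.add(call_id)
        return call_id

    def fetch_transcript(self, call_id: str) -> str:
        return f"(draft transcript for {call_id})"

    def stop_call(self, call_id: str) -> None:
        self.active.discard(call_id)


def place_call(platform: VoicePlatform, audit_log: list, phone_number: str, variables: dict) -> str:
    """Start a call and make it visible to the audit trail."""
    call_id = platform.start_call(phone_number, variables)
    audit_log.append({"event": "call_started", "call_id": call_id})
    return call_id


def human_stop(platform: VoicePlatform, audit_log: list, call_id: str, operator: str) -> None:
    """Let a human end a call, and record who did it."""
    platform.stop_call(call_id)
    audit_log.append({"event": "call_stopped", "call_id": call_id, "by": operator})
```

Swapping platforms then means writing one new class that satisfies `VoicePlatform`; `place_call`, `human_stop`, and the audit log do not change.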


John Bianchina builds AI implementation systems for hospitality, healthcare, and professional services. His current stack includes Hermes (concierge orchestration), Paperclip (multi-agent management), and Agent Zero (autonomous research). He operates from South Africa and serves clients internationally.