Recently, I built a real-time voice Q&A assistant for a working spa business — designed to answer calls, provide instant information, and send helpful follow-ups via SMS. It uses OpenAI’s GPT-4o mini Realtime API for language and voice, and integrates with Twilio for telephony and messaging.
It’s already running in production, handling real customers with real questions — and doing so in a calm, human-like tone that fits the wellness environment perfectly.
🧠 Why I Built It
I wanted to build something practical, fast, and voice-first — something that uses LLMs not just to answer questions, but to replace a real service role.
The goal was to create an assistant that can:
- Understand natural speech without button menus or scripts
- Provide accurate, domain-specific answers (e.g., types of massages, locations)
- Send helpful SMS links for booking and contact info
- Run reliably in the real world, not just a demo or a prototype
Because the assistant only retrieves and communicates information — it doesn’t perform actions like booking or rescheduling — it was surprisingly simple to build.
⏱️ Built in a Week
Using GPT-4o mini’s built-in speech-to-speech capabilities and Twilio’s Media Streams, I had a working version live in about a week. The dynamic prompts, SMS responses, and fallback handling were all implemented in Python with FastAPI.
By keeping the system stateless and focused on answering questions (not making decisions), I was able to build something reliable, responsive, and ready to ship — fast.
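The entry point of a setup like this is a Twilio voice webhook that answers the call and hands the audio to a WebSocket endpoint. Below is a minimal sketch of the TwiML such a webhook could return; the `<Connect><Stream>` verbs are standard Twilio TwiML, but the `/media-stream` path is a hypothetical name for illustration — in my system, that socket is the FastAPI route relaying audio to the Realtime API.

```python
# Sketch: TwiML a Twilio voice webhook could return to bridge the
# caller's audio to a WebSocket endpoint (e.g. a FastAPI route that
# relays audio to the OpenAI Realtime API).
# <Connect><Stream> are real Twilio TwiML verbs; the /media-stream
# path is a placeholder name, not the actual production route.

def connect_stream_twiml(host: str) -> str:
    """Build TwiML that opens a bidirectional media stream to our server."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response>"
        "<Connect>"
        f'<Stream url="wss://{host}/media-stream" />'
        "</Connect>"
        "</Response>"
    )

print(connect_stream_twiml("example.com"))
```

Because Twilio just POSTs to the webhook and reads back XML, this piece stays stateless: no session objects, no database, just a string response per call.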
🛠️ System Overview
Incoming Call (Twilio)
↓
Voice Stream → GPT-4o Mini (handles input/output natively)
↓
Prompt + Context Injection (spa services, branches)
↓
LLM-generated speech streamed back to caller
↓
SMS follow-up with links or contact info (via Twilio)
↓
Parallel: GPT-4o mini transcription used for call logging
There’s no separate transcription or TTS step during the call, since the Realtime model handles speech input and output natively. For internal logging and quality review, I use OpenAI’s GPT-4o mini transcription model to keep a record of what callers said.
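The “prompt + context injection” step above can be as simple as interpolating business data into the session instructions. Here is a sketch under stated assumptions: the spa data and voice name are hypothetical, and while `session.update` with an `instructions` field matches the OpenAI Realtime API event shape, you should verify the exact schema against the current docs.

```python
import json

# Hypothetical domain data for illustration; in production this could
# come from a config file or a small database.
SPA_CONTEXT = {
    "services": ["Swedish massage", "deep tissue massage", "hot stone massage"],
    "branches": ["Downtown", "Riverside"],
}

def build_session_update(language: str = "English") -> dict:
    """Build a Realtime API session.update event whose instructions embed
    the spa's services and branches (lightweight context injection).
    Event/field names follow the OpenAI Realtime API; verify against docs."""
    instructions = (
        f"You are a calm, friendly assistant for a spa. Respond in {language}. "
        f"Services offered: {', '.join(SPA_CONTEXT['services'])}. "
        f"Branches: {', '.join(SPA_CONTEXT['branches'])}. "
        "Only answer questions; never claim to book or change appointments."
    )
    return {
        "type": "session.update",
        "session": {
            "voice": "alloy",  # assumed voice name, for illustration
            "instructions": instructions,
        },
    }

print(json.dumps(build_session_update(), indent=2))
```

Swapping the `language` argument is also how the multilingual behavior mentioned below works — the prompt changes, not the pipeline.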
⚙️ What It Can Do
- Understand and answer service-related questions
- Send booking links via SMS
- Share contact info (address, phone, email) for any branch
- Adapt to natural human conversation and interruption
- Respond in multiple languages (via system prompt switching)
All of this without needing RAG, vector databases, or custom NLU pipelines — just intelligent prompt design and lightweight context injection.
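The SMS follow-up is similarly lightweight. This sketch composes the outbound message body from branch data; the directory, phone numbers, and booking URL are placeholders, and the actual send (via Twilio’s Messages API, e.g. `client.messages.create(...)` in the Twilio Python client) is omitted to keep the example self-contained.

```python
# Hypothetical branch directory and booking URL, for illustration only.
BRANCHES = {
    "downtown": {"phone": "+1-555-0100", "address": "123 Main St"},
    "riverside": {"phone": "+1-555-0101", "address": "45 River Rd"},
}
BOOKING_URL = "https://example.com/book"  # placeholder

def follow_up_sms(branch: str) -> str:
    """Compose the follow-up SMS body for a given branch.
    In production this string would be sent with the Twilio REST API:
    client.messages.create(to=..., from_=..., body=...)."""
    info = BRANCHES[branch.lower()]
    return (
        f"Thanks for calling! Book online: {BOOKING_URL}\n"
        f"{branch.title()} branch: {info['address']}, {info['phone']}"
    )

print(follow_up_sms("downtown"))
```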
🧪 What's Next
I'm particularly excited about the next phase of this assistant — integrating it with a live booking system. That’s where things get truly interactive: letting users ask “Do you have something open tomorrow afternoon?” and getting real answers, not just links.
Upcoming features include:
- 📆 Real-time booking availability
- 🧠 User memory & personalization
- 🔑 Voice-based user ID
- 📞 Sentiment-based human fallback
🔮 Reflections
It’s a fascinating time to be building with LLMs. What used to require a full engineering team can now be prototyped and deployed by a solo developer — in a week. And thanks to models like GPT-4o, voice interaction is finally smooth enough to feel natural, not awkward.
I’m proud that this assistant is already helping real people — answering their calls, pointing them in the right direction, and doing it all with clarity and calmness.
🧖‍♀️ Try it for yourself on the Thy Spa website.
—
If you're building in this space or just curious about real-time AI interfaces, I’d love to connect and exchange ideas.