Recently, I built a real-time voice Q&A assistant for a working spa business — designed to answer calls, provide instant information, and send helpful follow-ups via SMS. It uses OpenAI’s GPT-4o mini Realtime API for language and voice, and integrates with Twilio for telephony and messaging.
It’s already running in production, handling real customers with real questions — and doing so in a calm, human-like tone that fits the wellness environment perfectly.
🧠 Why I Built It
I wanted to build something practical, fast, and voice-first — something that uses LLMs not just to answer questions, but to replace a real service role.
The goal was to create an assistant that can:
- Understand natural speech without button menus or scripts
- Provide accurate, domain-specific answers (e.g., types of massages, locations)
- Send helpful SMS links for booking and contact info
- Run reliably in the real world, not just a demo or a prototype
Because the assistant only retrieves and communicates information — it doesn’t perform actions like booking or rescheduling — it was surprisingly simple to build.
⏱️ Built in a Week
Using GPT-4o mini’s built-in speech-to-speech capabilities and Twilio’s Media Streams, I had a working version live in about a week. The dynamic prompts, SMS responses, and fallback handling were all implemented in Python with FastAPI.
By keeping the system stateless and focused on answering questions (not making decisions), I was able to build something reliable, responsive, and ready to ship — fast.
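The entry point of a setup like this is a Twilio voice webhook that answers the call and hands the audio to a WebSocket endpoint. Below is a minimal sketch of the TwiML such a webhook could return; the `<Connect><Stream>` verbs are standard Twilio TwiML, but the `/media-stream` path is a hypothetical name for illustration — in my system, that socket is the FastAPI route relaying audio to the Realtime API.

```python
# Sketch: TwiML a Twilio voice webhook could return to bridge the
# caller's audio to a WebSocket endpoint (e.g. a FastAPI route that
# relays audio to the OpenAI Realtime API).
# <Connect><Stream> are real Twilio TwiML verbs; the /media-stream
# path is a placeholder name, not the actual production route.

def connect_stream_twiml(host: str) -> str:
    """Build TwiML that opens a bidirectional media stream to our server."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response>"
        "<Connect>"
        f'<Stream url="wss://{host}/media-stream" />'
        "</Connect>"
        "</Response>"
    )

print(connect_stream_twiml("example.com"))
```

Because Twilio just POSTs to the webhook and reads back XML, this piece stays stateless: no session objects, no database, just a string response per call.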
🛠️ System Overview
Incoming Call (Twilio)
↓
Voice Stream → GPT-4o Mini (handles input/output natively)
↓
Prompt + Context Injection (spa services, branches)
↓
LLM-generated speech streamed back to caller
↓
SMS follow-up with links or contact info (via Twilio)
↓
Parallel: GPT-4o mini transcription used for call logging
There’s no separate transcription or TTS step during the call, since the Realtime model handles speech input and output natively. For internal logging and quality review, I use OpenAI’s GPT-4o mini transcription model to keep a record of what callers said.
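The “prompt + context injection” step above can be as simple as interpolating business data into the session instructions. Here is a sketch under stated assumptions: the spa data and voice name are hypothetical, and while `session.update` with an `instructions` field matches the OpenAI Realtime API event shape, you should verify the exact schema against the current docs.

```python
import json

# Hypothetical domain data for illustration; in production this could
# come from a config file or a small database.
SPA_CONTEXT = {
    "services": ["Swedish massage", "deep tissue massage", "hot stone massage"],
    "branches": ["Downtown", "Riverside"],
}

def build_session_update(language: str = "English") -> dict:
    """Build a Realtime API session.update event whose instructions embed
    the spa's services and branches (lightweight context injection).
    Event/field names follow the OpenAI Realtime API; verify against docs."""
    instructions = (
        f"You are a calm, friendly assistant for a spa. Respond in {language}. "
        f"Services offered: {', '.join(SPA_CONTEXT['services'])}. "
        f"Branches: {', '.join(SPA_CONTEXT['branches'])}. "
        "Only answer questions; never claim to book or change appointments."
    )
    return {
        "type": "session.update",
        "session": {
            "voice": "alloy",  # assumed voice name, for illustration
            "instructions": instructions,
        },
    }

print(json.dumps(build_session_update(), indent=2))
```

Swapping the `language` argument is also how the multilingual behavior mentioned below works — the prompt changes, not the pipeline.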
⚙️ What It Can Do
- Understand and answer service-related questions
- Send booking links via SMS
- Share contact info (address, phone, email) for any branch
- Adapt to natural human conversation and interruption
- Respond in multiple languages (via system prompt switching)
All of this without needing RAG, vector databases, or custom NLU pipelines — just intelligent prompt design and lightweight context injection.
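The SMS follow-up is similarly lightweight. This sketch composes the outbound message body from branch data; the directory, phone numbers, and booking URL are placeholders, and the actual send (via Twilio’s Messages API, e.g. `client.messages.create(...)` in the Twilio Python client) is omitted to keep the example self-contained.

```python
# Hypothetical branch directory and booking URL, for illustration only.
BRANCHES = {
    "downtown": {"phone": "+1-555-0100", "address": "123 Main St"},
    "riverside": {"phone": "+1-555-0101", "address": "45 River Rd"},
}
BOOKING_URL = "https://example.com/book"  # placeholder

def follow_up_sms(branch: str) -> str:
    """Compose the follow-up SMS body for a given branch.
    In production this string would be sent with the Twilio REST API:
    client.messages.create(to=..., from_=..., body=...)."""
    info = BRANCHES[branch.lower()]
    return (
        f"Thanks for calling! Book online: {BOOKING_URL}\n"
        f"{branch.title()} branch: {info['address']}, {info['phone']}"
    )

print(follow_up_sms("downtown"))
```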
🧪 What's Next
I'm particularly excited about the next phase of this assistant — integrating it with a live booking system. That’s where things get truly interactive: letting users ask “Do you have something open tomorrow afternoon?” and getting real answers, not just links.
Upcoming features include:
- 📆 Real-time booking availability
- 🧠 User memory & personalization
- 🔑 Voice-based user ID
- 📞 Sentiment-based human fallback
🔮 Reflections
It’s a fascinating time to be building with LLMs. What used to require a full engineering team can now be prototyped and deployed by a solo developer — in a week. And thanks to models like GPT-4o, voice interaction is finally smooth enough to feel natural, not awkward.
I’m proud that this assistant is already helping real people — answering their calls, pointing them in the right direction, and doing it all with clarity and calmness.
🧖‍♀️ Try it for yourself on the Thy Spa website.
—
If you're building in this space or just curious about real-time AI interfaces, I’d love to connect and exchange ideas.