Voice Runtime
Whether a customer dials in over Twilio, a trainee runs a practice call, or you click "test" in the Studio, the voice path is the same. Three components: the browser (or telephony bridge), Omniflow's Supabase backend, and a Railway-hosted runtime that owns the WebSocket.
Architecture
┌──────────────────┐                  ┌────────────────────┐
│ Browser /        │  1. session      │ Supabase Edge      │
│ Twilio bridge    │─────request─────▶│ voice-runtime-     │
│                  │                  │ session            │
└────────┬─────────┘                  └─────────┬──────────┘
         │ 3. WSS + signed token                │
         │                                      │ 2. signs JWT
         │                                      │
         ▼                                      ▼
┌──────────────────────────────────────────────────────────┐
│ Railway Voice Runtime                                    │
│ - verifies JWT (HMAC + exp)                              │
│ - calls /runtime-config to fetch agent config            │
│ - streams audio bidirectionally                          │
│ - logs events to Supabase                                │
└────────────────────────────┬─────────────────────────────┘
                             │
                             │ 4. config + tool calls
                             ▼
                    ┌──────────────────┐
                    │ Supabase DB      │
                    │ + edge fns       │
                    └──────────────────┘

Lifecycle of a call
- Session request. The browser (or telephony bridge) calls POST /functions/v1/voice-runtime-session with a Supabase JWT. Body: { agent_id, conversation_id?, test_mode }.
- Edge function signs a runtime token. It loads the agent config, creates a voice_sessions row, and returns a signed JWT (5-minute expiry) plus the Railway WebSocket URL.
- Browser connects to Railway at the returned URL with the signed token in the Sec-WebSocket-Protocol header.
- Railway verifies the token via HMAC (VOICE_RUNTIME_SIGNING_SECRET, shared between Supabase and Railway). Rejects if expired or signature invalid.
- Runtime calls /runtime-config to fetch the agent's prompt, model, voice, knowledge, and tools.
- Audio streams bidirectionally over the WebSocket. The runtime calls model providers (OpenAI Realtime, Gemini Live), executes tools, and streams audio back to the client.
- Events are logged to call_events and call_logs in Supabase as the call progresses.
- On hangup, the session is finalized, transcript is saved, post-call hooks fire (analyze, score, generate insights).
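
The token-signing step in the lifecycle above can be sketched with the Python standard library. This is an illustration only, not the edge function's actual code: the function name `sign_runtime_token`, its parameters, and the `b64url` helper are hypothetical, and the real edge function may be written in a different language. The only details taken from this page are HS256 signing and the 5-minute expiry.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as compact JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_runtime_token(
    claims: dict,
    secret: str,
    ttl_seconds: int = 300,  # 5-minute expiry, per the lifecycle description
    now: Optional[int] = None,
) -> str:
    """Return a compact HS256 JWT whose exp is ttl_seconds after iat.

    Hypothetical helper: the name and signature are assumptions.
    """
    issued_at = int(time.time()) if now is None else now
    payload = {**claims, "iat": issued_at, "exp": issued_at + ttl_seconds}
    header = {"alg": "HS256", "typ": "JWT"}
    # The signing input is base64url(header) + "." + base64url(payload).
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{b64url(signature)}"
```

In this sketch the caller would pass the value of VOICE_RUNTIME_SIGNING_SECRET as `secret` and return the resulting token to the client together with the WebSocket URL.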
Modes
| Mode | What it is |
|---|---|
| openai-realtime | Native OpenAI Realtime API. Lowest latency for English. |
| gemini-live | Google Gemini Live. Strong multilingual support. |
| cascaded | STT → LLM → TTS pipeline. Deterministic, slower, fully customizable. |
Modes are picked per-agent and can be overridden at the conversation level (e.g. test calls always use cascaded for predictability).
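
The override rule can be pictured as a small resolution function. This is a sketch under assumptions: `resolve_mode` and its parameter names are hypothetical, and the only behavior taken from this page is that a conversation-level override wins over the agent's own mode.

```python
from typing import Optional

# The three modes listed in the table above.
VALID_MODES = ("openai-realtime", "gemini-live", "cascaded")


def resolve_mode(agent_mode: str, conversation_override: Optional[str] = None) -> str:
    """Pick the mode for one conversation (hypothetical helper).

    A conversation-level override, when present, takes precedence over
    the mode configured on the agent.
    """
    mode = conversation_override if conversation_override is not None else agent_mode
    if mode not in VALID_MODES:
        raise ValueError(f"unknown voice mode: {mode!r}")
    return mode
```

For example, `resolve_mode("gemini-live", conversation_override="cascaded")` selects cascaded, which is how a test call could be pinned to the cascaded pipeline.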
JWT contract
The signed token contains:
{
"iss": "omniflow-voice-runtime",
"sub": "session_8a2f",
"agent_id": "ag_42",
"tenant_id": "ws_demo",
"conversation_id": "c_8a2f",
"mode": "openai-realtime",
"test_mode": false,
"exp": 1730000000,
"iat": 1729999700
}
Signed with HS256 using VOICE_RUNTIME_SIGNING_SECRET. The runtime rejects any token whose:
- Signature doesn't verify.
- iss is wrong.
- exp is in the past.
- tenant_id doesn't match the workspace it was deployed for.
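
The four rejection rules above can be sketched as a verifier built on the Python standard library. This is not the runtime's actual code: `verify_runtime_token`, `TokenRejected`, and the `deployed_tenant_id` parameter are hypothetical names, and the extra check on the header's `alg` field is an assumption added because the page says tokens are signed with HS256.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

EXPECTED_ISSUER = "omniflow-voice-runtime"  # the iss value shown in the token above


class TokenRejected(Exception):
    """Raised when a runtime token fails any check (hypothetical type)."""


def _b64url_decode(segment: str) -> bytes:
    # Restore the padding that compact JWTs strip.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def verify_runtime_token(
    token: str, secret: str, deployed_tenant_id: str, now: Optional[int] = None
) -> dict:
    """Return the claims if the token passes every check, else raise."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
        provided = _b64url_decode(signature_b64)
    except ValueError as exc:
        raise TokenRejected("malformed token") from exc

    # 1. Signature: recompute the HMAC and compare in constant time.
    expected = hmac.new(secret.encode("utf-8"), signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, provided):
        raise TokenRejected("signature does not verify")

    try:
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
    except ValueError as exc:
        raise TokenRejected("malformed token") from exc
    if header.get("alg") != "HS256":  # assumption: pin the algorithm
        raise TokenRejected("unexpected algorithm")

    # 2. Issuer, 3. expiry, 4. tenant.
    if claims.get("iss") != EXPECTED_ISSUER:
        raise TokenRejected("wrong issuer")
    current = int(time.time()) if now is None else now
    if claims.get("exp", 0) <= current:
        raise TokenRejected("token expired")
    if claims.get("tenant_id") != deployed_tenant_id:
        raise TokenRejected("tenant mismatch")
    return claims
```

The signature is checked before the payload is parsed so that no claim is trusted until the HMAC has been verified.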
Rotate VOICE_RUNTIME_SIGNING_SECRET quarterly. Rotation is a coordinated swap between Supabase and Railway: both ends must update simultaneously to avoid mid-call rejections.
Tool calls
Tools fire from the runtime to your business logic via:
- Internal Omniflow actions: handled in the runtime itself.
- Custom webhooks: runtime POSTs to your URL.
- Supabase edge functions: runtime calls a workspace-scoped edge fn.
Tool latency budget is ~2s before the customer notices a hang. Long-running tools should return an acknowledgment immediately and emit a follow-up event.
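
The acknowledge-then-follow-up pattern for long-running tools might look like the following asyncio sketch. Everything named here is an assumption made for illustration: the function `acknowledge_then_follow_up`, the `emit_event` callback, the `tool.follow_up` event type, and the payload shapes are hypothetical and are not part of a documented Omniflow API.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# Keep references to running tasks so they are not garbage-collected early.
_background_tasks: set = set()


async def acknowledge_then_follow_up(
    job: Callable[[], Awaitable[Any]],
    emit_event: Callable[[Dict[str, Any]], None],
) -> Dict[str, str]:
    """Start a slow job in the background and return an ack right away.

    When the job finishes (or fails), a follow-up event is emitted via
    emit_event. Event names and shapes are hypothetical.
    """

    async def run_and_emit() -> None:
        try:
            result = await job()
        except Exception as exc:  # report the failure instead of losing it
            emit_event({"type": "tool.follow_up", "status": "error", "error": str(exc)})
        else:
            emit_event({"type": "tool.follow_up", "status": "done", "result": result})

    task = asyncio.create_task(run_and_emit())
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    # The acknowledgment goes back immediately, well inside the ~2s budget.
    return {"status": "accepted"}
```

The handler returns before the job has started running, so the caller hears an acknowledgment while the slow work continues in the background.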
Failover
If the primary provider returns 5xx or stalls, the runtime walks the configured fallback chain (see Reliability & Failover). Switching mid-call is avoided once 3+ turns have happened to preserve coherence.
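
One way to picture the fallback walk is the sketch below. It rests on several assumptions: `choose_after_failure` and the `is_healthy` callback are hypothetical, the chain is modeled as an ordered list of provider names, and "avoided" is interpreted here as "stay on the current provider and let the caller retry". The actual policy is described in Reliability & Failover.

```python
from typing import Callable, Sequence

# From the text: switching is avoided once 3 or more turns have happened.
SWITCH_TURN_LIMIT = 3


def choose_after_failure(
    chain: Sequence[str],
    current: str,
    turns_completed: int,
    is_healthy: Callable[[str], bool],
) -> str:
    """Return the provider to use after `current` returned 5xx or stalled.

    Hypothetical helper. `current` must be an entry of `chain`.
    """
    if turns_completed >= SWITCH_TURN_LIMIT:
        # Late in the call: keep the same provider to preserve coherence.
        return current
    # Early in the call: walk the rest of the chain in order.
    for candidate in chain[chain.index(current) + 1:]:
        if is_healthy(candidate):
            return candidate
    # Nothing further down the chain is usable, so stay put.
    return current
```

With a chain of openai-realtime, gemini-live, cascaded and a failure on the first turn, the function returns the first healthy provider after openai-realtime; after three turns it returns openai-realtime again.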
Recording and consent
When recording is enabled, the runtime saves audio to Supabase storage (encrypted at rest, signed URLs for playback). Consent prompts can be prepended to the agent's first turn; see Telephony.
Observability
Every call has a trace in Activity Logs & Traces. Events include:
- session.start / session.end
- turn:user / turn:agent (transcript)
- tool:call / tool:result
- retrieval
- transfer
- disconnect
- error
Use these to debug behavior, audit compliance, and feed QA scoring.
Configuration reference
| Variable | Where | Purpose |
|---|---|---|
| VOICE_RUNTIME_WS_BASE_URL | Supabase + Railway | WebSocket endpoint for clients. |
| VOICE_RUNTIME_SIGNING_SECRET | Supabase + Railway | HMAC secret for JWTs. |
| VOICE_RUNTIME_BASE_URL | Supabase | Fallback HTTP base URL. |
| VOICE_RUNTIME_DEFAULT_MODE | Workspace setting | Default mode for new agents. |
Related
| If you want to… | Go to |
|---|---|
| Pick a voice | Voice Models |
| Configure failover | Reliability & Failover |
| Connect a phone | Telephony |
| Run training calls | Practice Calls |