
Voice Runtime

Whether a customer dials in over Twilio, a trainee runs a practice call, or you click “test” in the Studio, the voice path is the same. It has three components: the browser (or telephony bridge), Omniflow’s Supabase backend, and a Railway-hosted runtime that owns the WebSocket.

Architecture

┌──────────────────┐                ┌────────────────────┐
│ Browser /        │  1. session    │ Supabase Edge      │
│ Twilio bridge    │───request─────▶│ voice-runtime-     │
│                  │                │ session            │
└────────┬─────────┘                └─────────┬──────────┘
         │ 3. WSS + signed token              │
         │                                    │ 2. signs JWT
         │                                    │
         ▼                                    ▼
┌────────────────────────────────────────────────────────┐
│ Railway Voice Runtime                                  │
│   - verifies JWT (HMAC + exp)                          │
│   - calls /runtime-config to fetch agent config        │
│   - streams audio bidirectionally                      │
│   - logs events to Supabase                            │
└───────────────────────────┬────────────────────────────┘
                            │
                            │ 4. config + tool calls
                            ▼
                   ┌──────────────────┐
                   │ Supabase DB      │
                   │ + edge fns       │
                   └──────────────────┘

Lifecycle of a call

  1. Session request. The browser (or telephony bridge) calls POST /functions/v1/voice-runtime-session with a Supabase JWT. Body: { agent_id, conversation_id?, test_mode }.
  2. Edge function signs a runtime token. It loads the agent config, creates a voice_sessions row, and returns a signed JWT (5-minute expiry) plus the Railway WebSocket URL.
  3. Browser connects to Railway at the returned URL with the signed token in the Sec-WebSocket-Protocol header.
  4. Railway verifies the token via HMAC using VOICE_RUNTIME_SIGNING_SECRET (shared between Supabase and Railway) and rejects it if it has expired or its signature is invalid.
  5. The runtime calls /runtime-config to fetch the agent’s prompt, model, voice, knowledge, and tools.
  6. Audio streams bidirectionally over the WebSocket. The runtime calls model providers (OpenAI Realtime, Gemini Live), executes tools, and streams audio back to the client.
  7. Events are logged to call_events and call_logs in Supabase as the call progresses.
  8. On hangup, the session is finalized, the transcript is saved, and post-call hooks fire (analyze, score, generate insights).
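Steps 1 and 3 can be sketched from the client's side. This is a minimal Python illustration, not Omniflow's client code: the helper names are hypothetical, sending the Supabase JWT as an `Authorization: Bearer` header is an assumption, and the session response is not parsed because its field names are not documented on this page.

```python
import json
from typing import Dict, Optional

# Path taken from step 1 of the lifecycle above.
SESSION_PATH = "/functions/v1/voice-runtime-session"


def build_session_request(
    supabase_url: str,
    user_jwt: str,
    agent_id: str,
    test_mode: bool,
    conversation_id: Optional[str] = None,
) -> Dict[str, object]:
    """Describe the step-1 POST (hypothetical helper, not an Omniflow API)."""
    body: Dict[str, object] = {"agent_id": agent_id, "test_mode": test_mode}
    if conversation_id is not None:
        # conversation_id is optional in the documented request body.
        body["conversation_id"] = conversation_id
    return {
        "method": "POST",
        "url": supabase_url.rstrip("/") + SESSION_PATH,
        "headers": {
            # Assumption: the Supabase JWT travels as a bearer token.
            "Authorization": f"Bearer {user_jwt}",
            "Content-Type": "application/json",
        },
        "body": json.dumps(body),
    }


def websocket_handshake_headers(runtime_token: str) -> Dict[str, str]:
    """Step 3: the signed runtime token rides in Sec-WebSocket-Protocol."""
    return {"Sec-WebSocket-Protocol": runtime_token}
```

In a browser, the same header is produced by passing the token through the WebSocket constructor's protocols argument.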

Modes

Mode | What it is
openai-realtime | Native OpenAI Realtime API. Lowest latency for English.
gemini-live | Google Gemini Live. Strong multilingual support.
cascaded | STT → LLM → TTS pipeline. Deterministic, slower, fully customizable.

Modes are picked per-agent and can be overridden at the conversation level (e.g. test calls always use cascaded for predictability).
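The precedence described above can be sketched as a small resolver. The function name and the ordering (test call first, then a conversation-level override, then the agent's own mode) are assumptions read from this paragraph, not Omniflow's actual code. VOICE_RUNTIME_DEFAULT_MODE is deliberately not consulted, because the configuration reference describes it as the default for new agents only.

```python
from typing import Optional

# The three modes listed in the table above.
VALID_MODES = {"openai-realtime", "gemini-live", "cascaded"}


def resolve_mode(
    agent_mode: str,
    conversation_override: Optional[str] = None,
    is_test_call: bool = False,
) -> str:
    """Pick the mode for one conversation (hypothetical precedence rules)."""
    if is_test_call:
        # Per the text above, test calls always use cascaded.
        chosen = "cascaded"
    elif conversation_override is not None:
        chosen = conversation_override
    else:
        chosen = agent_mode
    if chosen not in VALID_MODES:
        raise ValueError(f"unknown voice mode: {chosen!r}")
    return chosen
```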

JWT contract

The signed token contains:

{
  "iss": "omniflow-voice-runtime",
  "sub": "session_8a2f",
  "agent_id": "ag_42",
  "tenant_id": "ws_demo",
  "conversation_id": "c_8a2f",
  "mode": "openai-realtime",
  "test_mode": false,
  "exp": 1730000000,
  "iat": 1729999700
}

Signed with HS256 using VOICE_RUNTIME_SIGNING_SECRET. The runtime rejects any token whose:

  • Signature doesn’t verify.
  • iss doesn’t match the expected issuer (omniflow-voice-runtime in the example above).
  • exp is in the past.
  • tenant_id doesn’t match the workspace the runtime was deployed for.

Rotate VOICE_RUNTIME_SIGNING_SECRET quarterly. Rotation is a coordinated swap between Supabase and Railway: both ends must update simultaneously to avoid mid-call rejections.

Tool calls

Tools fire from the runtime to your business logic via:

  • Internal Omniflow actions: handled in the runtime itself.
  • Custom webhooks: the runtime POSTs to your URL.
  • Supabase edge functions: the runtime calls a workspace-scoped edge function.
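The three paths can be pictured as a single dispatcher. This is a hypothetical sketch rather than the runtime's code: `ToolSpec`, its `kind` values, and the JSON payloads are invented for illustration, and the HTTP client is injected as `http_post` so the routing logic stays testable. The edge-function URL reuses the `/functions/v1/<name>` pattern that the session function uses in step 1.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Mapping

JsonDict = Dict[str, Any]


@dataclass(frozen=True)
class ToolSpec:
    name: str
    kind: str  # hypothetical values: "internal", "webhook", "edge_function"
    target: str = ""  # webhook URL or edge function name, depending on kind


def route_tool_call(
    spec: ToolSpec,
    arguments: JsonDict,
    internal_actions: Mapping[str, Callable[[JsonDict], Any]],
    http_post: Callable[[str, JsonDict], Any],
    supabase_url: str,
    workspace_id: str,
) -> Any:
    """Send one tool call down the matching path (hypothetical sketch)."""
    if spec.kind == "internal":
        # Internal Omniflow actions run inside the runtime process.
        return internal_actions[spec.name](arguments)
    if spec.kind == "webhook":
        # Custom webhooks: the runtime POSTs to the customer's URL.
        return http_post(spec.target, {"tool": spec.name, "arguments": arguments})
    if spec.kind == "edge_function":
        # Workspace-scoped edge function, addressed like the session function.
        url = f"{supabase_url.rstrip('/')}/functions/v1/{spec.target}"
        payload = {"tool": spec.name, "workspace_id": workspace_id, "arguments": arguments}
        return http_post(url, payload)
    raise ValueError(f"unknown tool kind: {spec.kind!r}")
```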

The tool latency budget is about 2 seconds; beyond that, the customer notices a hang. Long-running tools should return an acknowledgment immediately and emit a follow-up event when they finish.
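One way to honor the budget is to race the tool against a timer: if the tool wins, return its result; if the timer wins, answer with an acknowledgment and report the real result later as an event. A minimal asyncio sketch, assuming a hypothetical `call_within_budget` helper and dictionary-shaped events; the event type names reuse those listed under Observability, and the acknowledgment shape is invented.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Set

TOOL_LATENCY_BUDGET_S = 2.0  # "about 2 seconds" from the text above

# Keep references to still-running tools so they are not garbage-collected.
_pending_tools: Set["asyncio.Future[Any]"] = set()


async def call_within_budget(
    run_tool: Callable[[], Awaitable[Any]],
    emit_event: Callable[[Dict[str, Any]], None],
    budget_s: float = TOOL_LATENCY_BUDGET_S,
) -> Dict[str, Any]:
    """Return the tool result if it beats the budget, else an acknowledgment."""
    task = asyncio.ensure_future(run_tool())
    # asyncio.wait does not cancel the task on timeout; it keeps running.
    done, _ = await asyncio.wait({task}, timeout=budget_s)
    if task in done:
        return {"status": "done", "result": task.result()}

    def _follow_up(finished: "asyncio.Future[Any]") -> None:
        # Report the late outcome as a separate event once the tool finishes.
        if finished.cancelled():
            return
        error = finished.exception()
        if error is None:
            emit_event({"type": "tool:result", "result": finished.result()})
        else:
            emit_event({"type": "error", "error": repr(error)})

    _pending_tools.add(task)
    task.add_done_callback(_pending_tools.discard)
    task.add_done_callback(_follow_up)
    return {"status": "acknowledged"}
```

The agent can speak a holding line ("let me check that for you") when it receives the acknowledgment, then pick up the follow-up event on a later turn.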

Failover

If the primary provider returns a 5xx error or stalls, the runtime walks the configured fallback chain (see Reliability & Failover). To preserve conversational coherence, the runtime avoids switching providers mid-call once three or more turns have taken place.
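The fallback rule can be sketched as a loop over an ordered chain that stops switching once the call is three turns in. This is an assumption-laden illustration: `ProviderError` stands in for however a 5xx or stall is detected, the chain entries are plain strings (mode names in the test), and surfacing an error instead of switching late is one reading of "avoided", not confirmed behavior.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")
NO_SWITCH_AFTER_TURNS = 3  # "three or more turns" from the text above


class ProviderError(Exception):
    """Hypothetical: raised by a provider call on a 5xx response or a stall."""


def call_with_failover(
    chain: Sequence[str],
    call_provider: Callable[[str], T],
    turns_completed: int,
) -> Tuple[str, T]:
    """Try providers in order, but stop switching once the call is 3+ turns in."""
    last_error: Optional[ProviderError] = None
    attempted: List[str] = []
    for position, provider in enumerate(chain):
        if position > 0 and turns_completed >= NO_SWITCH_AFTER_TURNS:
            # Switching now would hurt coherence, so surface the failure instead.
            break
        attempted.append(provider)
        try:
            return provider, call_provider(provider)
        except ProviderError as error:
            last_error = error
    raise RuntimeError(
        f"no usable provider (tried {attempted}, turns={turns_completed})"
    ) from last_error
```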

When recording is enabled, the runtime saves audio to Supabase storage (encrypted at rest, signed URLs for playback). Consent prompts can be prepended to the agent’s first turn; see Telephony.

Observability

Every call has a trace in Activity Logs & Traces. Events include:

  • session.start / session.end
  • turn:user / turn:agent (transcript)
  • tool:call / tool:result
  • retrieval
  • transfer
  • disconnect
  • error

Use these to debug behavior, audit compliance, and feed QA scoring.
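As a small example of using a trace, the events can be reduced to a few counters for debugging or QA. This sketch assumes each event is available as a dictionary whose `type` field holds one of the names listed above; the export format is not specified on this page, and `summarize_trace` is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def summarize_trace(events: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Count the event types listed above for a quick QA or debugging view."""
    counts = Counter(event["type"] for event in events)
    return {
        "user_turns": counts["turn:user"],
        "agent_turns": counts["turn:agent"],
        "tool_calls": counts["tool:call"],
        # A tool:call with no matching tool:result often points at a hung tool.
        "unanswered_tool_calls": max(0, counts["tool:call"] - counts["tool:result"]),
        "retrievals": counts["retrieval"],
        "transfers": counts["transfer"],
        "errors": counts["error"],
    }
```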

Configuration reference

Variable | Where | Purpose
VOICE_RUNTIME_WS_BASE_URL | Supabase + Railway | WebSocket endpoint for clients.
VOICE_RUNTIME_SIGNING_SECRET | Supabase + Railway | HMAC secret for JWTs.
VOICE_RUNTIME_BASE_URL | Supabase | Fallback HTTP base URL.
VOICE_RUNTIME_DEFAULT_MODE | Workspace setting | Default mode for new agents.
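A minimal sketch of reading the Supabase-side variables at startup and failing fast when a required one is missing. The dataclass and loader names are hypothetical, requiring a `wss://` URL is an assumption based on the "WSS" label in the architecture diagram, and VOICE_RUNTIME_DEFAULT_MODE is left out because it is a workspace setting rather than an environment variable.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SupabaseVoiceConfig:
    ws_base_url: str
    signing_secret: str
    http_base_url: Optional[str]  # VOICE_RUNTIME_BASE_URL, the HTTP fallback


def load_supabase_voice_config(env: Mapping[str, str] = os.environ) -> SupabaseVoiceConfig:
    """Read the Supabase-side variables; fail fast if required ones are missing."""
    required = ("VOICE_RUNTIME_WS_BASE_URL", "VOICE_RUNTIME_SIGNING_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")
    ws_url = env["VOICE_RUNTIME_WS_BASE_URL"]
    # Assumption: clients should only ever be handed a TLS (wss://) endpoint.
    if not ws_url.startswith("wss://"):
        raise RuntimeError("VOICE_RUNTIME_WS_BASE_URL should start with wss://")
    return SupabaseVoiceConfig(
        ws_base_url=ws_url,
        signing_secret=env["VOICE_RUNTIME_SIGNING_SECRET"],
        http_base_url=env.get("VOICE_RUNTIME_BASE_URL") or None,
    )
```

The Railway side would need only the first two variables, since the table lists VOICE_RUNTIME_BASE_URL for Supabase alone.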

Open in Omniflow

If you want to… | Go to
Pick a voice | Voice Models
Configure failover | Reliability & Failover
Connect a phone | Telephony
Run training calls | Practice Calls