AI Infrastructure

The AI Infrastructure tab is where you set the workspace’s default model choices, voice providers, and failover behavior. Per-agent settings can override these defaults — but a clean workspace-level config means you don’t have to repeat yourself for every agent.

Defaults you set here

Setting	Notes
Default chat model	The LLM new Standard agents start with.
Default realtime voice provider	OpenAI Realtime, Gemini Live, or Cascaded.
Default voice	The TTS voice for new Realtime agents.
Default fallback chain	Provider order for failover.
Default temperature	0.0–1.0; affects creativity vs consistency.
Default thinking budget	For models that support extended thinking.

Choosing a model

Model	Strengths	Best for
GPT-4o (standard)	Strong reasoning, broad tool use	General-purpose chat agents
GPT-4o Realtime	Low-latency voice, good prosody	English voice support
Gemini Live	Multilingual, fast, cheap	High-volume multilingual voice
Claude Sonnet 4.6	Long context, excellent following instructions	Complex chat with long histories
Claude Haiku 4.5	Cheap, fast, surprisingly capable	High-volume chat triage

You can configure per-task model routing — e.g. Haiku for first-turn triage, escalate to Sonnet if the conversation gets complex.

Pricing varies by model and provider. The AI Infrastructure dashboard shows cost-per-conversation by model so you can compare.

Failover chain

Picture the chain as an ordered list:


Primary: GPT-4o Realtime (OpenAI)
Fallback 1: Gemini Live (Google)
Fallback 2: Cascaded (GPT-4o + Azure TTS)

If the primary returns a 5xx or times out, Omniflow walks down the chain. Within a single conversation, it sticks with the current provider once 3+ turns have happened — switching mid-call hurts coherence.

See Reliability & Failover for the per-agent override.

Voice provider configuration

Each voice provider has its own credentials and quirks:

Provider	Auth
OpenAI Realtime	OpenAI API key with realtime access.
Gemini Live	Google Cloud project + Vertex AI access.
Cascaded TTS	API keys for STT (Whisper / Deepgram) and TTS (Azure / AWS Polly / ElevenLabs).

Add credentials under Settings → Secrets (encrypted at rest) and reference them in the AI Infrastructure config.

Custom (BYO) models

For self-hosted models or third-party providers Omniflow doesn’t natively support:

Configure an OpenAI-compatible endpoint URL.
Provide auth (header or query param).
Pick a “compatibility mode” — most providers ship an OpenAI-compatible API.
Test with a sample conversation.

Custom models work with Standard mode; Realtime requires explicit support.

Cost controls

Workspace-level guardrails:

Control	What it does
Per-conversation token cap	Hard limit on tokens per conversation.
Daily spend cap	Pause AI on overrun.
Per-tier budgets	Different caps for free vs paid customers.
Routing to cheaper models	Auto-route routine conversations to Haiku-class.

Don’t set hard daily caps without alerts. Hitting the cap mid-shift means agents stop working until the cap resets — pair caps with notifications so someone knows before it bites.

Health and incidents

The Health panel shows real-time provider status:

Uptime over last 24h.
p50 / p95 / p99 latency.
Tool error rate.
Failover trigger count.

Subscribe to a Slack channel for sustained-failure alerts. Provider incidents show up here before they affect your customers — most failures are caught and routed via failover before agents notice.

Open in Omniflow

Open AI Infrastructure

If you want to…	Go to
Tune voice per agent	Voice Models
Configure failover per agent	Reliability & Failover
Read the runtime architecture	Voice Runtime