AI Infrastructure
The AI Infrastructure tab is where you set the workspace’s default model choices, voice providers, and failover behavior. Per-agent settings can override these defaults — but a clean workspace-level config means you don’t have to repeat yourself for every agent.
Defaults you set here
| Setting | Notes |
|---|---|
| Default chat model | The LLM new Standard agents start with. |
| Default realtime voice provider | OpenAI Realtime, Gemini Live, or Cascaded. |
| Default voice | The TTS voice for new Realtime agents. |
| Default fallback chain | Provider order for failover. |
| Default temperature | 0.0–1.0; affects creativity vs consistency. |
| Default thinking budget | For models that support extended thinking. |
Choosing a model
| Model | Strengths | Best for |
|---|---|---|
| GPT-4o (standard) | Strong reasoning, broad tool use | General-purpose chat agents |
| GPT-4o Realtime | Low-latency voice, good prosody | English voice support |
| Gemini Live | Multilingual, fast, cheap | High-volume multilingual voice |
| Claude Sonnet 4.6 | Long context, excellent following instructions | Complex chat with long histories |
| Claude Haiku 4.5 | Cheap, fast, surprisingly capable | High-volume chat triage |
You can configure per-task model routing — e.g. Haiku for first-turn triage, escalate to Sonnet if the conversation gets complex.
Pricing varies by model and provider. The AI Infrastructure dashboard shows cost-per-conversation by model so you can compare.
Failover chain
Picture the chain as an ordered list:
Primary: GPT-4o Realtime (OpenAI)
Fallback 1: Gemini Live (Google)
Fallback 2: Cascaded (GPT-4o + Azure TTS)If the primary returns a 5xx or times out, Omniflow walks down the chain. Within a single conversation, it sticks with the current provider once 3+ turns have happened — switching mid-call hurts coherence.
See Reliability & Failover for the per-agent override.
Voice provider configuration
Each voice provider has its own credentials and quirks:
| Provider | Auth |
|---|---|
| OpenAI Realtime | OpenAI API key with realtime access. |
| Gemini Live | Google Cloud project + Vertex AI access. |
| Cascaded TTS | API keys for STT (Whisper / Deepgram) and TTS (Azure / AWS Polly / ElevenLabs). |
Add credentials under Settings → Secrets (encrypted at rest) and reference them in the AI Infrastructure config.
Custom (BYO) models
For self-hosted models or third-party providers Omniflow doesn’t natively support:
- Configure an OpenAI-compatible endpoint URL.
- Provide auth (header or query param).
- Pick a “compatibility mode” — most providers ship an OpenAI-compatible API.
- Test with a sample conversation.
Custom models work with Standard mode; Realtime requires explicit support.
Cost controls
Workspace-level guardrails:
| Control | What it does |
|---|---|
| Per-conversation token cap | Hard limit on tokens per conversation. |
| Daily spend cap | Pause AI on overrun. |
| Per-tier budgets | Different caps for free vs paid customers. |
| Routing to cheaper models | Auto-route routine conversations to Haiku-class. |
Don’t set hard daily caps without alerts. Hitting the cap mid-shift means agents stop working until the cap resets — pair caps with notifications so someone knows before it bites.
Health and incidents
The Health panel shows real-time provider status:
- Uptime over last 24h.
- p50 / p95 / p99 latency.
- Tool error rate.
- Failover trigger count.
Subscribe to a Slack channel for sustained-failure alerts. Provider incidents show up here before they affect your customers — most failures are caught and routed via failover before agents notice.
Open in Omniflow
Related
| If you want to… | Go to |
|---|---|
| Tune voice per agent | Voice Models |
| Configure failover per agent | Reliability & Failover |
| Read the runtime architecture | Voice Runtime |