Skip to Content
Settings & AdminAI Infrastructure

AI Infrastructure

The AI Infrastructure tab is where you set the workspace’s default model choices, voice providers, and failover behavior. Per-agent settings can override these defaults — but a clean workspace-level config means you don’t have to repeat yourself for every agent.

Defaults you set here

SettingNotes
Default chat modelThe LLM new Standard agents start with.
Default realtime voice providerOpenAI Realtime, Gemini Live, or Cascaded.
Default voiceThe TTS voice for new Realtime agents.
Default fallback chainProvider order for failover.
Default temperature0.0–1.0; affects creativity vs consistency.
Default thinking budgetFor models that support extended thinking.

Choosing a model

ModelStrengthsBest for
GPT-4o (standard)Strong reasoning, broad tool useGeneral-purpose chat agents
GPT-4o RealtimeLow-latency voice, good prosodyEnglish voice support
Gemini LiveMultilingual, fast, cheapHigh-volume multilingual voice
Claude Sonnet 4.6Long context, excellent following instructionsComplex chat with long histories
Claude Haiku 4.5Cheap, fast, surprisingly capableHigh-volume chat triage

You can configure per-task model routing — e.g. Haiku for first-turn triage, escalate to Sonnet if the conversation gets complex.

Pricing varies by model and provider. The AI Infrastructure dashboard shows cost-per-conversation by model so you can compare.

Failover chain

Picture the chain as an ordered list:

Primary: GPT-4o Realtime (OpenAI) Fallback 1: Gemini Live (Google) Fallback 2: Cascaded (GPT-4o + Azure TTS)

If the primary returns a 5xx or times out, Omniflow walks down the chain. Within a single conversation, it sticks with the current provider once 3+ turns have happened — switching mid-call hurts coherence.

See Reliability & Failover for the per-agent override.

Voice provider configuration

Each voice provider has its own credentials and quirks:

ProviderAuth
OpenAI RealtimeOpenAI API key with realtime access.
Gemini LiveGoogle Cloud project + Vertex AI access.
Cascaded TTSAPI keys for STT (Whisper / Deepgram) and TTS (Azure / AWS Polly / ElevenLabs).

Add credentials under Settings → Secrets (encrypted at rest) and reference them in the AI Infrastructure config.

Custom (BYO) models

For self-hosted models or third-party providers Omniflow doesn’t natively support:

  1. Configure an OpenAI-compatible endpoint URL.
  2. Provide auth (header or query param).
  3. Pick a “compatibility mode” — most providers ship an OpenAI-compatible API.
  4. Test with a sample conversation.

Custom models work with Standard mode; Realtime requires explicit support.

Cost controls

Workspace-level guardrails:

ControlWhat it does
Per-conversation token capHard limit on tokens per conversation.
Daily spend capPause AI on overrun.
Per-tier budgetsDifferent caps for free vs paid customers.
Routing to cheaper modelsAuto-route routine conversations to Haiku-class.

Don’t set hard daily caps without alerts. Hitting the cap mid-shift means agents stop working until the cap resets — pair caps with notifications so someone knows before it bites.

Health and incidents

The Health panel shows real-time provider status:

  • Uptime over last 24h.
  • p50 / p95 / p99 latency.
  • Tool error rate.
  • Failover trigger count.

Subscribe to a Slack channel for sustained-failure alerts. Provider incidents show up here before they affect your customers — most failures are caught and routed via failover before agents notice.

Open in Omniflow

If you want to…Go to
Tune voice per agentVoice Models
Configure failover per agentReliability & Failover
Read the runtime architectureVoice Runtime