What happens to a live conversation if a provider degrades?

The fallback chain retries the next provider transparently. A 5xx, timeout, or circuit-break on the primary triggers the next link mid-conversation — the turn still completes and the caller does not hear an error.

Can I see latency for each turn of a voice conversation?

Yes. Every turn lands in the request log with the model, latency, token counts, and cost. Latency metrics let you watch p50 and p99 per model so you can spot a slow turn before it affects callers.

Does Nemo Router handle speech-to-text and text-to-speech?

Nemo Router is the LLM hop in a voice pipeline — the reasoning turn between transcription and synthesis. Your speech-to-text and text-to-speech layers stay yours; the gateway gives that LLM turn low-latency routing, failover, and observability.

Use Case · Voice & Realtime

Voice assistants that answer fast — every turn.

A voice app lives or dies on latency. Nemo Router steers each conversational turn to the quickest healthy endpoint, keeps the line alive with transparent failover, and logs every turn for observability.

voice-turn · conversation 4c1d

One turn of a live conversation

Route strategy: latency-based
Endpoint chosen: quickest healthy
Streaming: token-by-token
Gateway overhead: ~95 ms p50
Mid-call failover: transparent
Turn logged: p50 / p99

low-latency · failover-safe · observable

Gateway overhead: 95 ms
Each turn: Latency-routed
Mid-call failure: Failed over
Turn-level: Observable

Why Nemo for voice

Speed, resilience, and a turn-level trail

Voice is the least forgiving LLM workload — every turn is on a clock and a caller is listening. Nemo Router gives that turn fast routing, transparent failover, and observability.

Latency-based routing

A voice app lives or dies on response time — silence on the line is the failure mode. Latency-based routing steers each conversational turn to the quickest healthy endpoint using live signal.

Routing decisions add ~95 ms p50 — LLM time dominates
Latency-based and least-busy strategies for the hot path
Streaming proxied transparently to your speech layer
No hot-path buffering — tokens flow as they are generated

Failover mid-conversation

A provider blip during a live call must not turn into dead air. The fallback chain retries the next link transparently, so the turn still completes and the caller hears a response, not an error.

Ordered fallback chain per model group
Timeouts, 5xx, and circuit-breaks trigger the next link
Cross-provider failover keeps a conversation alive
Each fallback logged without interrupting the call
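
The ordered fallback described above can be pictured with a minimal sketch. This is an illustration only, not Nemo Router's implementation: `Link`, `ProviderError`, and `complete_turn` are hypothetical names, and the two providers are stand-in functions rather than real endpoints.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


class ProviderError(Exception):
    """Stands in for a 5xx response or an open circuit breaker (hypothetical)."""


@dataclass
class Link:
    name: str
    call: Callable[[str], str]  # takes the transcript, returns the reply


def complete_turn(chain: List[Link], transcript: str) -> Tuple[str, List[str]]:
    """Try each link in order; return the reply and the names of links that failed."""
    failed: List[str] = []
    for link in chain:
        try:
            return link.call(transcript), failed
        except (ProviderError, TimeoutError):
            failed.append(link.name)  # record the fallback, then try the next link
    raise ProviderError(f"all links failed: {failed}")


def flaky(_: str) -> str:
    raise TimeoutError("primary timed out")


def healthy(transcript: str) -> str:
    return f"reply to: {transcript}"


reply, failed = complete_turn(
    [Link("primary", flaky), Link("secondary", healthy)],
    "what's my balance?",
)
print(reply)   # reply to: what's my balance?
print(failed)  # ['primary']
```

The caller only ever sees `reply`; the `failed` list is what a gateway would write to its log for the turn.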

Turn-level observability

When a call feels slow, you need to find the turn that caused it. Every turn lands in the request log with the model, latency, token counts, and cost, and latency metrics surface p50 and p99 per model.

Request log records model, latency, tokens, and cost per turn
p50 / p99 latency metrics per model to catch a slow tail
Export to Langfuse, Datadog, or S3 via a logging callback
Alerts fire on latency or error-rate thresholds
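
To make the p50 / p99 idea concrete, here is a small sketch that computes both percentiles per model from logged turn latencies and flags a slow tail. It assumes the log can be reduced to `(model, latency_ms)` pairs; the nearest-rank percentile, the model names, the sample data, and the 1000 ms threshold are all illustrative choices, not the gateway's actual method or defaults.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_by_model(turns: Iterable[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Group turn latencies by model and compute p50 and p99 for each group."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for model, latency_ms in turns:
        grouped[model].append(latency_ms)
    return {
        model: {"p50": percentile(samples, 50), "p99": percentile(samples, 99)}
        for model, samples in grouped.items()
    }


# Hypothetical log: model-a is usually fast but has two very slow turns.
turns = (
    [("model-a", 400.0)] * 98
    + [("model-a", 2500.0)] * 2
    + [("model-b", 300.0), ("model-b", 500.0)]
)
stats = latency_by_model(turns)

P99_THRESHOLD_MS = 1000.0  # illustrative alert threshold
slow = [model for model, s in stats.items() if s["p99"] > P99_THRESHOLD_MS]
print(stats["model-a"])  # {'p50': 400.0, 'p99': 2500.0}
print(slow)              # ['model-a']
```

The p50 for `model-a` looks healthy on its own; only the p99 exposes the two slow turns, which is why the tail metric is the one worth alerting on.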

Model catalog for realtime

Realtime assistants want a fast, capable model — and the freedom to switch as faster models ship. The catalog exposes every model behind one key; pick the one with the latency profile you need.

Choose any catalog model per turn
Tag-filtered routing keeps function-calling turns on capable models
Swap models as the catalog grows — no SDK change
20+ models live, more shipping

How it works

A conversational turn, end to end

Nemo Router is the LLM hop between transcription and synthesis. The turn routes for latency, streams back token-by-token, and lands in the log, where it feeds the p50 / p99 latency metrics.

Voice turn flow

Caller speaks (speech-to-text upstream): Your speech layer transcribes the turn to text.
Turn request (POST /v1/chat/completions): The transcript becomes a streaming chat request.
Latency route (quickest healthy endpoint): Live latency signal picks the fastest model deployment.
Stream the reply (token-by-token): Tokens flow to your text-to-speech with no buffering.
Turn logged (latency metrics): Model, latency, cost per turn; p50 / p99 tracked.

Your speech-to-text and text-to-speech layers stay yours. Nemo Router gives the reasoning turn in between low-latency routing, failover, and a logged record.

Latency

The quickest healthy endpoint, every turn

Latency-based routing

Live signal picks the fastest deployment — turn by turn

A model deployment that was fastest a minute ago may not be now. Latency-based routing uses live latency signal to pick the quickest healthy endpoint for each turn. The gateway itself adds about 95 ms at p50, so LLM inference, not the proxy, is the dominant share of turn latency.

Latency-based and least-busy strategies for the hot path
Live signal — the choice updates as deployments speed up or slow
Streaming proxied with no hot-path buffering
Latency metrics surface p50 / p99 so a slow tail is visible
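
One common way to turn "live latency signal" into a routing choice is an exponentially weighted moving average (EWMA) per endpoint. The sketch below assumes that approach purely for illustration; the source does not say which signal Nemo Router uses, and `LatencyRouter`, `Endpoint`, and the smoothing factor are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Endpoint:
    name: str
    healthy: bool = True
    ewma_ms: Optional[float] = None  # None until the first sample arrives


class LatencyRouter:
    def __init__(self, names: Iterable[str], alpha: float = 0.5):
        self.alpha = alpha  # weight given to the newest sample
        self.endpoints: Dict[str, Endpoint] = {n: Endpoint(n) for n in names}

    def record(self, name: str, latency_ms: float) -> None:
        """Fold a new latency sample into the endpoint's moving average."""
        ep = self.endpoints[name]
        if ep.ewma_ms is None:
            ep.ewma_ms = latency_ms
        else:
            ep.ewma_ms = self.alpha * latency_ms + (1 - self.alpha) * ep.ewma_ms

    def pick(self) -> str:
        """Return the healthy endpoint with the lowest average latency."""
        candidates = [e for e in self.endpoints.values() if e.healthy]
        if not candidates:
            raise RuntimeError("no healthy endpoint")
        # Unmeasured endpoints sort first so each gets probed at least once.
        best = min(candidates, key=lambda e: -1.0 if e.ewma_ms is None else e.ewma_ms)
        return best.name


router = LatencyRouter(["A", "B", "C"])
for name, ms in [("A", 300.0), ("B", 450.0), ("C", 500.0)]:
    router.record(name, ms)
turn1 = router.pick()  # A is fastest so far

router.record("A", 900.0)  # A slows down: average becomes 600 ms
router.record("C", 200.0)  # C speeds up: average becomes 350 ms
turn2 = router.pick()      # the choice moves to C

router.endpoints["C"].healthy = False
turn3 = router.pick()      # unhealthy endpoints are skipped, so B wins
print(turn1, turn2, turn3)  # A C B
```

A higher `alpha` reacts faster to a deployment slowing down but is noisier; a lower one is steadier but slower to move traffic.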

latency · per-turn routing

Turn-by-turn latency

Turn 1: endpoint A · fastest
Turn 2: endpoint C · fastest
Gateway p50: ~95 ms
LLM inference: dominant factor
Slow-tail alert: on p99 threshold

live signal · per-turn · p50 / p99 tracked

The code

Stream the reasoning turn

A voice turn is a streaming chat request: transcript in, tokens out to your speech layer. The snippet below is the same SDK example the playground uses. As written it waits for the full reply, so enable streaming to receive tokens as they are generated.

Install: pip install openai

```python
# Cache: enabled (org default). Pass nemo_cache: false to skip.
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["NEMOROUTER_API_KEY"],
    base_url="https://api.nemorouter.ai/v1",
)

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    temperature=1,
    max_tokens=1024,
    top_p=1,
    messages=[
        {"role": "user", "content": "Hello! What models do you support?"},
    ],
    extra_body={
        # "nemo_cache": False,  # Uncomment to skip cache
    },
)

print(response.choices[0].message.content)
```

Set stream: true in the request body (stream=True in the Python SDK). Nemo Router proxies the token stream with no hot-path buffering.
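
A streaming version of the turn might look like the sketch below. It uses the OpenAI Python SDK's `stream=True` option and reads `chunk.choices[0].delta.content` from each chunk. `stream_turn` and `main` are hypothetical helper names, the prompt is made up, and the base URL, environment variable, and model are carried over from the snippet above.

```python
import os
from typing import Iterator


def stream_turn(client, transcript: str, model: str = "gemini-2.5-flash") -> Iterator[str]:
    """Yield text deltas for one turn as they arrive from the gateway."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": transcript}],
        stream=True,
    )
    for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip those.
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content


def main() -> None:
    """Run one streamed turn against the gateway (needs NEMOROUTER_API_KEY)."""
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["NEMOROUTER_API_KEY"],
        base_url="https://api.nemorouter.ai/v1",
    )
    for piece in stream_turn(client, "What are your opening hours?"):
        # In a voice app, hand each piece to your text-to-speech layer here.
        print(piece, end="", flush=True)
```

Call `main()` to try it. Because `stream_turn` is a generator, the speech layer can start synthesizing on the first delta instead of waiting for the whole reply.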

FAQ

Common voice & realtime questions

Fast turns, no dead air

Build a voice assistant that never leaves the caller waiting

Latency-based routing, transparent failover, and turn-level observability — all unlocked on every plan.

Nemo Router is the LLM hop — pair it with any speech-to-text and text-to-speech stack.