$5 free credits when you sign up
Status

All systems operational — and we publish the proof.

NemoRouter runs on managed Cloud Run with multi-region failover, synthetic probes, and revision-scoped post-deploy log scans. Targeted uptime is 99.9% across every plan tier — the same SLA on Tier 1 as on Enterprise.

Status board

Every component, monitored independently

Failures rarely take down everything at once. Each surface is reported separately so partial degradations show up honestly — never papered over by a single green light.

Some systems are experiencing issues

As of May 19, 2026

Gateway

  • API proxy — US-Central (primary)

    OpenAI-compatible /v1 endpoint, in-process router

    Operational
  • Multi-region failover

    Automatic failover when the primary degrades

    Operational
  • Provider routing & fallback

    Routing strategies, fallback chains, retry policies

    Operational

Identity & Auth

  • Authentication

    Login, signup, SSO/SAML, session management

    Operational
  • RLS-scoped Supabase

    Tenant-isolated Postgres with Row-Level Security

    Operational
  • Virtual key validation

    Per-key auth, RPM/TPM enforcement, budget caps

    Operational

Billing & Credits

  • Credit ledger

    Reserve + settle, atomic balance mutations

    Operational
  • Stripe billing & webhooks

    Checkout, top-ups, subscription, webhook ingest

    Operational

Dashboard & Observability

  • Management dashboard

    Keys, teams, analytics, guardrails, settings

    Operational
  • Observability & logging

    Request logs, callbacks, alerts, latency metrics

    Operational

Provider routing

  • Google Vertex AI

    Gemini, Imagen, Veo, embeddings — live today

    Operational
  • Anthropic · OpenAI · AWS Bedrock

    Shipping next — not yet routing production traffic

    Under maintenance

Anthropic, OpenAI, and AWS Bedrock are marked “under maintenance” because they are shipping next — they do not yet route production traffic. See the model catalog for what is live today.

Uptime SLA
99.9%

Committed on every plan tier

API proxy availability
99.99%

Trailing 30 days, multi-region

Avg latency overhead
~8 ms

p95 cross-region, in-process router

Incidents this quarter
0

Customer-impacting; cosmetic excluded

Incident history

No open or recent customer-impacting incidents

Major incidents (P0 / P1) get a published postmortem within 7 days. When this section has entries, each will carry a timeline, root cause, and remediation.

0 incidents this quarter

As of May 19, 2026. Postmortems are published to the changelog.

How we monitor

Probes, gates, and alerts on every surface

Uptime is an output. The inputs are synthetic probes, deploy-time gates, and customer-configurable alerts that fire before a ticket gets filed.

Synthetic probes

Synthetic checks run against /v1/chat/completions and /health/readiness from every active region every 60 seconds. A miss alerts on-call before any customer ticket.

  • Hits the real proxy path, not a mock endpoint
  • Validates streaming + non-streaming separately
  • Probes a virtual key — same path as real customer traffic

Readiness gating

Every Cloud Run revision must pass a revision-scoped log scan before it serves traffic. A 200 on /health is necessary, not sufficient.

  • Boot logs scanned for DataError + 5xx for 60s
  • Prisma drift detection blocks promotion
  • Auto-rollback if the SLO window degrades

Customer alerts

Configure per-org alerts for budget burn, error spikes, latency regressions, or provider outages. Webhook, email, Slack, and Teams channels are all built in.

  • 8 alert types out of the box
  • Per-key + per-org thresholds
  • Slack / Teams / webhook delivery
Incident pipeline

From probe to rollback in minutes

When something breaks, the path from detection to mitigation runs through this pipeline — every step automated, every step audited.

Detection → response

  1. Synthetic probe

    every 60s

    Cross-region health checks

  2. Cloud Run revision

    :8090

    Readiness + revision-scoped log scan

  3. Alerting

    webhook · slack · email

    On-call paged on degradation

  4. Auto-rollback

    SLO breach

    Previous revision restored

Common questions

What this page covers — and what it does not

Is this a live-polled status feed?

Not yet — and we will not pretend otherwise. This board reflects the operational posture as of the snapshot date and changes with the deploy that resolves any incident. A live, server-backed feed is on the roadmap. We chose not to embed a third-party vendor widget (Statuspage / Better Stack) because it requires a separate account and a public page that drifts from our deploys.

How do you measure uptime?

Synthetic probes against the live API proxy from every active region every 60 seconds. Customer-impacting incidents — the kind that surface in real production traffic — count against the trailing 30-day window. Cosmetic and self-healing transient errors do not.

Where is the incident history?

Major incidents (P0 / P1) get a postmortem in the changelog within 7 days, with timeline, root cause, and remediation. As of the snapshot date there are no open or recent customer-impacting incidents.

Notice an issue?

Tell us before our probes do

If you are seeing routing errors or latency spikes that are not reflected here, our engineers want to know. Reports are triaged within an hour during business windows.

Security-sensitive reports — please email security@nemorouter.ai instead of filing a public issue.