You shouldn't need to read a provider's release notes to know which model is fastest today. v1.7 ships the model catalog — every model we route to, on one searchable, sortable page.

What's on the page

/{org}/models lists every model registered in the router right now. For each:

Provider (Google Vertex AI, OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, …)
Mode (chat / embedding / image / video / audio / rerank / OCR)
Context window + max output tokens
Price per 1M input / output tokens (cache-discounted variants shown separately)
Live p50 / p95 latency measured from real traffic — not synthetic benchmarks
Quality rank for chat models, scored against a fixed eval set

Sort by any column, filter by capability (vision, code, long-context, structured-output), search by name. Click any model to see the JSON it returns from /v1/models so your SDK lookup matches the dashboard.

ROUTER_STATS as source of truth

The catalog reads from a generated ROUTER_STATS constant — same constant the landing page hero uses to render "18 models live". When we add a model, we update one place and every surface reflects it automatically. No hardcoded "18" anywhere. (We've audited this — see feedback_litellm_live_default.md.)

Switching models

You don't need new code to try a new model. Your SDK call uses the model field:

# Was:
response = client.chat.completions.create(model="gemini-2.5-flash", ...)

# Now try:
response = client.chat.completions.create(model="gemini-2.5-pro", ...)

Same key, same endpoint, same billing flow. The catalog tells you what's available; your code chooses.

Public listing

A public, unauthenticated copy lives at /models — useful for procurement reviews, RFPs, and the "do you support X model" question that lands in every sales call.