A single API giving developers access to 500+ AI models from 60+ providers. One key, one endpoint. Gives you access to everything but guides you on nothing. That’s what this project is about.

Product Design · Concept · OpenRouter

OpenRouter has 500 models.
I still couldn't pick one.

So I designed a wizard that does the picking. Four questions, one recommendation, copy-paste code to ship.

Design exploration based on observed patterns, not validated user research. I'm being upfront about that because it matters.

Try the prototype ↓Read the case study

The challenge: Take a developer from “I need AI” to a working API call without requiring them to understand 503 model IDs, pricing tables, or fallback infrastructure.

Live prototype

RoleProduct Designer

CompanyOpenRouter (concept)

Timeline8 days

ToolsFigma · Claude · HTML/CSS/JS

Highlights

503 model IDs, zero guidance on where to start
Pricing opaque until you get the bill
Production breaks when model IDs change

Approach

4-question deterministic recommendation engine
3 ranked options with plain-language reasoning
Pre-configured code output with fallback chain

Result

503 → 1 models to a recommendation
Predicted cost shown before you commit
Copy-paste code on your first visit

01 · The Story

Meet Casey

👩‍💻

Casey

27 years old

Full-Stack Engineer, Series A startup

Austin, TX

Building a side project or MVPShipping AI features at a startup

The situation

PM dropped a Jira ticket: “Add AI chat to the support flow. Sprint ends Friday.”
Heard about OpenRouter: one API key, every model, automatic failover.
Landed on openrouter.ai/models. 503 models stared back.
Picked GPT-4o based on a Reddit thread. Cost $47 in week one.

Frustrated

😤

503 model IDs with no guidance. Pricing that's opaque until the bill arrives. Model IDs that change silently and break production at 2am.

Touchpoints

🔍

Reddit threads 8 months old. Hacker News comments. Trustpilot reviews warning about surprise costs. GitHub issues about deprecated model IDs.

Goals

🎯

Ship the AI feature before Friday. Pick the right model without becoming an LLM expert. Not get surprised by the bill next month.

Motivation

⚡

Wants to build fast and build right. Trusts tools that explain their reasoning. Will pay for quality if the cost is predictable upfront.

Casey is a full-stack developer at a 40-person startup. Last Tuesday her PM dropped a Jira ticket: “Add AI chat to the support flow. Sprint ends Friday.” Not ML, not infra. Just Casey, a deadline, and a blank file.

She'd heard about OpenRouter. One key, every model, automatic failover. She went to openrouter.ai/models.

503 models.

She didn't know if she needed Claude or GPT-4o or Llama. She didn't know what a context window meant for a support bot running roughly 200 conversations a day. She didn't know if $3.50 per million tokens was cheap or ruinous at that volume.

She opened Reddit. Found a thread from 8 months ago. Someone said “just use GPT-4o.” She used GPT-4o. Cost her $47 in the first week. Her PM was not pleased.

Three weeks later, the model ID she'd hardcoded changed without warning. Feature broke in production. She found out at 2am from an error alert, not from OpenRouter.

OpenRouter had the right answer the whole time. It just wasn't findable.

“The bottleneck isn't the model. It's knowing which model.”

Hacker NewsChoice paralysis

“Just grab the top ~30 models on OpenRouter and test them all.”

news.ycombinator.com ↗

TrustpilotUnexpected cost

“Costs 100 times higher than expected. Nearly $50 for less than 100 lines of output.”

trustpilot.com ↗

TrustpilotIDs change silently

“OpenRouter keeps changing the model ID names, causing errors.”

trustpilot.com ↗

Real quotes from public sources. 39 OpenRouter models were silently deprecated in a single LiteLLM update ↗, breaking live integrations with no warning.

02 · The Problem

A catalog isn't a recommendation

OpenRouter's own State of AI report says programming is 40-60% of all paid-model traffic on the platform. The typical OpenRouter user is a developer building something with AI. And most of them are not ML engineers. They're Casey.

OpenRouter's models page is technically complete. Every model is there. The pricing is accurate. The filters exist. But it's designed for someone who already knows what they want.

Casey doesn't know what she wants. She knows she needs “AI for a customer support bot.” The gap between “I need AI” and “I need anthropic/claude-haiku-4-5-20251001 with temperature 0.3 and these fallbacks configured” is enormous. Nothing bridges it.

What about Auto Router?

OpenRouter already has an Auto Router. It picks the cheapest provider that meets a quality threshold at runtime. That's an infrastructure decision. It happens after you've already chosen a model category. It doesn't tell Casey what to build with, or why, or what her bill will look like next month. Model Match is for that earlier moment: before there's any code at all.

openrouter.ai/models

Problem 1503 model IDs. No way to know where to start.

Models503 models

⌕Search models by name, provider, capability…

Problem 2Filters don't map to user intent. “What should I use for a support bot?” has no answer here.

AllFreeTextVisionJSONFunction calling128K+OpenAIAnthropicGoogleMeta

🤖google/gemini-2.5-pro1M ctx$3.50/M inPaidNew

🧠anthropic/claude-sonnet-4-6200K ctx$3.00/M inPaid

⚡meta-llama/llama-3.3-70b-instruct128K ctxFree tierFree

🔮mistralai/mistral-large-2407128K ctx$2.00/M inPaid

🌐openai/gpt-4o-mini128K ctx$0.15/M inPaid

Problem 3Model IDs change without warning. No cost prediction before you commit.

Where Model Match fits

Casey lands on OpenRouter

→

Sees 503 models

→

GAP: no guidance← Model Match goes here

→

Narrows to 3 options

→

Compares

→

Picks one

→

Code output

→

First API call

03 · Design Decisions

Four choices that shaped everything

What I ruled out first

Filter presets

One click for “indie dev prototyping,” one for “production API.” Rejected because it still requires the user to self-identify correctly, which has the same intent-mapping problem as the current filter UI.

AI-powered recommendations

Let a model pick a model. Rejected because it's a black box. You can't explain the reasoning, you can't audit it. Deterministic is worse on edge cases but far better at building trust.

Side-by-side comparison table

Show the top 10 models with specs. Rejected because it reproduces the original problem in a smaller box. If Casey could evaluate a comparison table, he wouldn't need this tool.

Four questions, not two or eight

What I did

I landed on four questions: use case, quality vs. speed, monthly volume, prompt length. Not three. Not six. Four.

Why

Two questions don't give enough signal. Six starts feeling like a form. Four covers the variables that actually change which model I'd recommend.

What I'd do differently

I'd want to validate these four against real user decision patterns. Maybe volume matters less than I think. Maybe there's a fifth variable: whether they need structured JSON output.

A recommendation, not a comparison

What I did

Model Match doesn't show you a side-by-side table. It picks one and tells you why.

Why

Casey doesn't know enough yet to evaluate a comparison. The recommendation has to be opinionated. "Use this one" is more useful than "here are your options."

What I'd do differently

The current "Compare" button on each card is a bit of a cop-out. If someone clicks Compare, that means my recommendation didn't land. I'd want to understand why.

Code output as part of the design

What I did

The last step isn't "here's your model." It's a pre-configured code block in Node.js, Python, or curl. Ready to copy.

Why

The gap between "I picked a model" and "I'm making API calls" is where most people drop off. Removing that friction is a design decision. Design owns the full journey, not just the screens.

What I'd do differently

Right now the code is static. In a real product, the temperature, max_tokens, and system prompt would all be tuned to your use case.

Making the fallback chain visible

What I did

Under the recommendations, there's a fallback chain: Primary → Fallback 1 → Fallback 2.

Why

Failover is OpenRouter's killer feature. But almost nobody knows it exists because it's invisible infrastructure. Making it visible builds trust and explains why OpenRouter is more than a proxy.

What I'd do differently

The current fallback chain is static. In a real product, I'd show real-time provider uptime next to each fallback option.

04 · The Prototype

Try it yourself

This is what I actually built. Click through it. The recommendations are deterministic (not AI), but the model IDs, pricing, and code are real.

model-match.vercel.app

Find your model in 4 questions

Answer what you know. We'll handle the 503 IDs you don't.

1Use Case

2Quality

3Volume

4Prompt Length

Answer all 4 questions to continue

Your top 3 models

Ranked for your answers. Each one comes with a reason, not just a spec sheet.

Best match

Claude Haiku 4.5

anthropic/claude-haiku-4-5-20251001

Cost$0.80/M

Latency0.8s

Context200K

Est. monthly~$4

Why this model

Haiku 4.5 is fast enough for live chat and handles multi-turn conversations well. At 10K requests a month you're looking at under $5 total.

Fastest

Llama 3.3 70B

meta-llama/llama-3.3-70b-instruct

CostFree tier

Latency0.4s

Context128K

Est. monthly$0

Why this model

Free tier covers your volume: about 333 requests a day. Best choice if you're still validating whether you need AI at all.

Budget pick

GPT-4o Mini

openai/gpt-4o-mini

Cost$0.15/M

Latency0.6s

Context128K

Est. monthly~$0.80

Why this model

Cheapest option if you scale past free tier. Strong on structured JSON output, good if your support bot needs to classify tickets or fill forms.

Your fallback chain ⓘ

Primaryclaude-haiku-4-5

→

Fallback 1gpt-4o-mini

→

Fallback 2llama-3.3-70b

If Claude Haiku is down, OpenRouter automatically falls back to GPT-4o Mini, then Llama. Most developers don't know this exists until their primary provider goes down at 2am.

Ready to copy: Claude Haiku 4.5

Pre-configured for a support bot. Change the model ID line to swap models. Nothing else changes.

Cost projection

Monthly requests10,000

Avg tokens per request~450 tokens

Input cost ($0.80/M)$2.40

Output cost ($4.00/M)$1.60

Total monthly estimate~$4.00/month

GPT-4o (not matched)~~~$25/month~~

How OpenRouter actually works

One API call. Automatic failover. Cost optimization. Most people use it for months without realizing all of this is happening.

Your App

sends a request

→

OpenRouter

routes + optimizes

→

Anthropic

Claude Haiku 4.5

→

Your User

gets a response

When Anthropic goes down

Anthropic

Down

↓ auto-fallback

OpenAI GPT-4o Mini

Active

→

Your users see no error

request completes normally

This is what happened to Casey at 2am. Except he wasn't using OpenRouter yet.

05 · Reflection

What I'd do with 6 more weeks

This is a concept. I built it in 8 days. Here's what I know is rough, and what I'd need to actually ship it.

The recommendation engine is a decision tree, not AI. I made that choice deliberately. Deterministic systems are easier to audit. But it means the recommendations are only as good as my assumptions about how use cases map to models.

The uncomfortable question I haven't answered

Is a wizard even the right form? Wizards work when the problem space is stable. LLM capability and pricing change every few weeks. A static decision tree becomes stale fast. It could be actively misleading within a month if a better cheap model ships or a recommended one gets deprecated. I don't have a good answer to that yet. It might mean the right product is a recommendation layer that pulls live data from the OpenRouter API, not a hardcoded picker. Or it might mean the whole premise is wrong and the real fix is better documentation. I don't know. That's worth saying out loud.

Primary metric

What I'd measure: Time from landing on OpenRouter to first successful API call. I'd guess it goes from 2–3 hours to under 15 minutes. But that's a guess.

Usability test

What I'd do: Sit with 5 developers who've never used OpenRouter. Watch them use the picker. The four questions feel right to me, but I designed them. That's not the same as them being right.

A/B test

What I'd test: Wizard flow vs. current models page on new user cohorts. Measure time-to-first-API-call and 7-day retention.

OpenRouter has 500 models.I still couldn't pick one.

Meet Casey

A catalog isn't a recommendation

Four choices that shaped everything

Four questions, not two or eight

A recommendation, not a comparison

Code output as part of the design

Making the fallback chain visible

Try it yourself

Find your model in 4 questions

Your top 3 models

Ready to copy: Claude Haiku 4.5

Cost projection

How OpenRouter actually works

What I'd do with 6 more weeks

OpenRouter has 500 models.
I still couldn't pick one.