Sanjana Gangishetty
WHAT IS OPENROUTER?
A single API giving developers access to 500+ AI models from 60+ providers. One key, one endpoint. Gives you access to everything but guides you on nothing. That’s what this project is about.
← Back to explorations

Product Design · Concept · OpenRouter

OpenRouter has 500 models.
I still couldn't pick one.

So I designed a wizard that does the picking. Four questions, one recommendation, copy-paste code to ship.

Design exploration based on observed patterns, not validated user research. I'm being upfront about that because it matters.

The challenge: Take a developer from “I need AI” to a working API call without requiring them to understand 503 model IDs, pricing tables, or fallback infrastructure.

Live prototype
RoleProduct Designer
CompanyOpenRouter (concept)
Timeline8 days
ToolsFigma · Claude · HTML/CSS/JS
Highlights
  • 503 model IDs, zero guidance on where to start
  • Pricing opaque until you get the bill
  • Production breaks when model IDs change
Approach
  • 4-question deterministic recommendation engine
  • 3 ranked options with plain-language reasoning
  • Pre-configured code output with fallback chain
Result
  • 503 → 1 models to a recommendation
  • Predicted cost shown before you commit
  • Copy-paste code on your first visit

01 · The Story

Meet Casey

👩‍💻
Casey
27 years old
Full-Stack Engineer, Series A startup
Austin, TX
Building a side project or MVPShipping AI features at a startup
The situation
  • PM dropped a Jira ticket: “Add AI chat to the support flow. Sprint ends Friday.”
  • Heard about OpenRouter: one API key, every model, automatic failover.
  • Landed on openrouter.ai/models. 503 models stared back.
  • Picked GPT-4o based on a Reddit thread. Cost $47 in week one.
Frustrated
😤
503 model IDs with no guidance. Pricing that's opaque until the bill arrives. Model IDs that change silently and break production at 2am.
Touchpoints
🔍
Reddit threads 8 months old. Hacker News comments. Trustpilot reviews warning about surprise costs. GitHub issues about deprecated model IDs.
Goals
🎯
Ship the AI feature before Friday. Pick the right model without becoming an LLM expert. Not get surprised by the bill next month.
Motivation
Wants to build fast and build right. Trusts tools that explain their reasoning. Will pay for quality if the cost is predictable upfront.

Casey is a full-stack developer at a 40-person startup. Last Tuesday her PM dropped a Jira ticket: “Add AI chat to the support flow. Sprint ends Friday.” Not ML, not infra. Just Casey, a deadline, and a blank file.

She'd heard about OpenRouter. One key, every model, automatic failover. She went to openrouter.ai/models.

503 models.

She didn't know if she needed Claude or GPT-4o or Llama. She didn't know what a context window meant for a support bot running roughly 200 conversations a day. She didn't know if $3.50 per million tokens was cheap or ruinous at that volume.

She opened Reddit. Found a thread from 8 months ago. Someone said “just use GPT-4o.” She used GPT-4o. Cost her $47 in the first week. Her PM was not pleased.

Three weeks later, the model ID she'd hardcoded changed without warning. Feature broke in production. She found out at 2am from an error alert, not from OpenRouter.

OpenRouter had the right answer the whole time. It just wasn't findable.

“The bottleneck isn't the model. It's knowing which model.”
Hacker NewsChoice paralysis

“Just grab the top ~30 models on OpenRouter and test them all.”

news.ycombinator.com ↗
TrustpilotUnexpected cost

“Costs 100 times higher than expected. Nearly $50 for less than 100 lines of output.”

trustpilot.com ↗
TrustpilotIDs change silently

“OpenRouter keeps changing the model ID names, causing errors.”

trustpilot.com ↗

Real quotes from public sources. 39 OpenRouter models were silently deprecated in a single LiteLLM update ↗, breaking live integrations with no warning.

02 · The Problem

A catalog isn't a recommendation

OpenRouter's own State of AI report says programming is 40-60% of all paid-model traffic on the platform. The typical OpenRouter user is a developer building something with AI. And most of them are not ML engineers. They're Casey.

OpenRouter's models page is technically complete. Every model is there. The pricing is accurate. The filters exist. But it's designed for someone who already knows what they want.

Casey doesn't know what she wants. She knows she needs “AI for a customer support bot.” The gap between “I need AI” and “I need anthropic/claude-haiku-4-5-20251001 with temperature 0.3 and these fallbacks configured” is enormous. Nothing bridges it.

What about Auto Router?

OpenRouter already has an Auto Router. It picks the cheapest provider that meets a quality threshold at runtime. That's an infrastructure decision. It happens after you've already chosen a model category. It doesn't tell Casey what to build with, or why, or what her bill will look like next month. Model Match is for that earlier moment: before there's any code at all.

openrouter.ai/models
Problem 1503 model IDs. No way to know where to start.
Models503 models
Problem 2Filters don't map to user intent. “What should I use for a support bot?” has no answer here.
AllFreeTextVisionJSONFunction calling128K+OpenAIAnthropicGoogleMeta
🤖google/gemini-2.5-pro1M ctx$3.50/M inPaidNew
🧠anthropic/claude-sonnet-4-6200K ctx$3.00/M inPaid
meta-llama/llama-3.3-70b-instruct128K ctxFree tierFree
🔮mistralai/mistral-large-2407128K ctx$2.00/M inPaid
🌐openai/gpt-4o-mini128K ctx$0.15/M inPaid
Problem 3Model IDs change without warning. No cost prediction before you commit.
Where Model Match fits
Casey lands on OpenRouter
Sees 503 models
GAP: no guidance← Model Match goes here
Narrows to 3 options
Compares
Picks one
Code output
First API call

03 · Design Decisions

Four choices that shaped everything

What I ruled out first

Filter presets

One click for “indie dev prototyping,” one for “production API.” Rejected because it still requires the user to self-identify correctly, which has the same intent-mapping problem as the current filter UI.

AI-powered recommendations

Let a model pick a model. Rejected because it's a black box. You can't explain the reasoning, you can't audit it. Deterministic is worse on edge cases but far better at building trust.

Side-by-side comparison table

Show the top 10 models with specs. Rejected because it reproduces the original problem in a smaller box. If Casey could evaluate a comparison table, he wouldn't need this tool.

01

Four questions, not two or eight

What I did

I landed on four questions: use case, quality vs. speed, monthly volume, prompt length. Not three. Not six. Four.

Why

Two questions don't give enough signal. Six starts feeling like a form. Four covers the variables that actually change which model I'd recommend.

What I'd do differently

I'd want to validate these four against real user decision patterns. Maybe volume matters less than I think. Maybe there's a fifth variable: whether they need structured JSON output.

02

A recommendation, not a comparison

What I did

Model Match doesn't show you a side-by-side table. It picks one and tells you why.

Why

Casey doesn't know enough yet to evaluate a comparison. The recommendation has to be opinionated. "Use this one" is more useful than "here are your options."

What I'd do differently

The current "Compare" button on each card is a bit of a cop-out. If someone clicks Compare, that means my recommendation didn't land. I'd want to understand why.

03

Code output as part of the design

What I did

The last step isn't "here's your model." It's a pre-configured code block in Node.js, Python, or curl. Ready to copy.

Why

The gap between "I picked a model" and "I'm making API calls" is where most people drop off. Removing that friction is a design decision. Design owns the full journey, not just the screens.

What I'd do differently

Right now the code is static. In a real product, the temperature, max_tokens, and system prompt would all be tuned to your use case.

04

Making the fallback chain visible

What I did

Under the recommendations, there's a fallback chain: Primary → Fallback 1 → Fallback 2.

Why

Failover is OpenRouter's killer feature. But almost nobody knows it exists because it's invisible infrastructure. Making it visible builds trust and explains why OpenRouter is more than a proxy.

What I'd do differently

The current fallback chain is static. In a real product, I'd show real-time provider uptime next to each fallback option.

04 · The Prototype

Try it yourself

This is what I actually built. Click through it. The recommendations are deterministic (not AI), but the model IDs, pricing, and code are real.

model-match.vercel.app

Find your model in 4 questions

Answer what you know. We'll handle the 503 IDs you don't.

1Use Case
2Quality
3Volume
4Prompt Length
Answer all 4 questions to continue

Your top 3 models

Ranked for your answers. Each one comes with a reason, not just a spec sheet.

Best match
Claude Haiku 4.5
anthropic/claude-haiku-4-5-20251001
Cost$0.80/M
Latency0.8s
Context200K
Est. monthly~$4
Why this model

Haiku 4.5 is fast enough for live chat and handles multi-turn conversations well. At 10K requests a month you're looking at under $5 total.

Fastest
Llama 3.3 70B
meta-llama/llama-3.3-70b-instruct
CostFree tier
Latency0.4s
Context128K
Est. monthly$0
Why this model

Free tier covers your volume: about 333 requests a day. Best choice if you're still validating whether you need AI at all.

Budget pick
GPT-4o Mini
openai/gpt-4o-mini
Cost$0.15/M
Latency0.6s
Context128K
Est. monthly~$0.80
Why this model

Cheapest option if you scale past free tier. Strong on structured JSON output, good if your support bot needs to classify tickets or fill forms.

Your fallback chain
Primaryclaude-haiku-4-5
Fallback 1gpt-4o-mini
Fallback 2llama-3.3-70b

If Claude Haiku is down, OpenRouter automatically falls back to GPT-4o Mini, then Llama. Most developers don't know this exists until their primary provider goes down at 2am.

Ready to copy: Claude Haiku 4.5

Pre-configured for a support bot. Change the model ID line to swap models. Nothing else changes.

Cost projection

Monthly requests10,000
Avg tokens per request~450 tokens
Input cost ($0.80/M)$2.40
Output cost ($4.00/M)$1.60
Total monthly estimate~$4.00/month
GPT-4o (not matched)~$25/month

How OpenRouter actually works

One API call. Automatic failover. Cost optimization. Most people use it for months without realizing all of this is happening.

Your App
sends a request
OpenRouter
routes + optimizes
Anthropic
Claude Haiku 4.5
Your User
gets a response
When Anthropic goes down
Anthropic
Down
↓ auto-fallback
OpenAI GPT-4o Mini
Active
Your users see no error
request completes normally

This is what happened to Casey at 2am. Except he wasn't using OpenRouter yet.

05 · Reflection

What I'd do with 6 more weeks

This is a concept. I built it in 8 days. Here's what I know is rough, and what I'd need to actually ship it.

The recommendation engine is a decision tree, not AI. I made that choice deliberately. Deterministic systems are easier to audit. But it means the recommendations are only as good as my assumptions about how use cases map to models.

The uncomfortable question I haven't answered

Is a wizard even the right form? Wizards work when the problem space is stable. LLM capability and pricing change every few weeks. A static decision tree becomes stale fast. It could be actively misleading within a month if a better cheap model ships or a recommended one gets deprecated. I don't have a good answer to that yet. It might mean the right product is a recommendation layer that pulls live data from the OpenRouter API, not a hardcoded picker. Or it might mean the whole premise is wrong and the real fix is better documentation. I don't know. That's worth saying out loud.

01
Primary metric

What I'd measure: Time from landing on OpenRouter to first successful API call. I'd guess it goes from 2–3 hours to under 15 minutes. But that's a guess.

02
Usability test

What I'd do: Sit with 5 developers who've never used OpenRouter. Watch them use the picker. The four questions feel right to me, but I designed them. That's not the same as them being right.

03
A/B test

What I'd test: Wizard flow vs. current models page on new user cohorts. Measure time-to-first-API-call and 7-day retention.