LLM Model Selection Calculator
Choose the right class of language model for your workload, based on architecture and deployment constraints rather than fragile model rankings that change every week.
Workload Requirements
- Use case: the task the model will perform (chat, RAG, classification, code generation)
- Latency target: the hard response-time budget (real-time means ≤500 ms)
- Accuracy need: how costly mistakes are for this workload
- Privacy requirement: whether data may leave your infrastructure
- Deployment preference: managed API, self-hosted, or edge
- Context size needed: maximum tokens per request
- Budget sensitivity: tolerance for per-token and per-request cost
- Additional requirements: any other constraints specific to the workload
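The requirement dimensions above can be captured as a simple data structure. This is a hypothetical sketch, not the calculator's actual schema; the field names and value conventions are assumptions.

```python
# Hypothetical sketch of the calculator's inputs; field names are illustrative.
from dataclasses import dataclass

@dataclass
class WorkloadRequirements:
    use_case: str             # e.g. "rag", "chat", "classification"
    latency_target_ms: int    # hard constraint; real-time is <= 500 ms
    accuracy_need: str        # "best-effort" | "production" | "critical"
    privacy: str              # "none" | "no-third-party" | "on-prem-only"
    deploy: str               # "api" | "self-hosted" | "edge"
    context_tokens: int       # maximum tokens per request
    budget_sensitivity: str   # "low" | "medium" | "high"

# Example: a cost-sensitive RAG workload served over an API.
req = WorkloadRequirements(
    use_case="rag",
    latency_target_ms=800,
    accuracy_need="production",
    privacy="none",
    deploy="api",
    context_tokens=16_000,
    budget_sensitivity="high",
)
```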
Model Class Reference
Frontier / Flagship Models
Maximum capability, highest cost
Latency: Medium · Cost: High
Examples: GPT-4o, Claude Opus 4
Mid-Tier Capable Models
Strong performance at lower cost
Latency: Low · Cost: Medium
Examples: GPT-4o mini, Claude Sonnet 4
Small / Fast Models
Edge inference, real-time, high-volume
Latency: Very Low · Cost: Very Low
Examples: Llama 3.2 3B, Phi-4 mini
Reasoning / Chain-of-Thought Models
Deep thinking for hard problems
Latency: High · Cost: High
Examples: o3, o4-mini
Embedding + Reranker Models
Retrieval backbone for RAG
Latency: Very Low · Cost: Very Low
Examples: text-embedding-3-large, Cohere Embed v3
Fine-Tuned Specialist Models
Narrow-domain quality at lower cost
Latency: Low · Cost: Low
Examples: OpenAI fine-tuning (GPT-4o mini), LoRA-tuned Mistral 7B
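The reference cards above can be encoded as a small lookup table. The model names are the illustrative examples from the cards, not an exhaustive or current list:

```python
# Model class reference as a lookup table; entries mirror the cards above.
MODEL_CLASSES = {
    "frontier":   {"latency": "medium",   "cost": "high",
                   "examples": ["GPT-4o", "Claude Opus 4"]},
    "mid_tier":   {"latency": "low",      "cost": "medium",
                   "examples": ["GPT-4o mini", "Claude Sonnet 4"]},
    "small":      {"latency": "very low", "cost": "very low",
                   "examples": ["Llama 3.2 3B", "Phi-4 mini"]},
    "reasoning":  {"latency": "high",     "cost": "high",
                   "examples": ["o3", "o4-mini"]},
    "embedding":  {"latency": "very low", "cost": "very low",
                   "examples": ["text-embedding-3-large", "Cohere Embed v3"]},
    "fine_tuned": {"latency": "low",      "cost": "low",
                   "examples": ["GPT-4o mini (fine-tuned)",
                                "LoRA-tuned Mistral 7B"]},
}

# Example query: which classes have very low latency?
fast_classes = [name for name, info in MODEL_CLASSES.items()
                if info["latency"] == "very low"]
```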
Model Selection Principles
- Don't pick a specific model — pick a class. Specific model rankings change every few months. Choosing the right class (frontier, mid-tier, small, reasoning) is a decision that stays valid for 12–18 months.
- Latency is a hard constraint, not a preference. Real-time apps (≤500ms) rule out reasoning models and most frontier APIs. Design for the constraint first.
- Add a reranker before adding a bigger model. A reranker costs 10–50× less than upgrading from GPT-4o mini to GPT-4o but often gives a bigger accuracy boost on RAG tasks.
- Fine-tuning beats few-shot on narrow, high-volume tasks. If you have ≥1K labeled examples and the task is stable, fine-tuning a small model usually outperforms prompting a large one at 1/10th the cost.
- Privacy and cost constraints point to open-weight models. Llama 3, Mistral, Phi-4, and Gemma are serious alternatives to closed APIs for most tasks when running on private infrastructure.
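A minimal sketch of the decision logic these principles imply, assuming simplified boolean inputs. The thresholds and class names come from the principles above; the function itself and its parameter names are hypothetical, and a real selector would weigh more dimensions:

```python
def select_model_class(latency_ms: int,
                       privacy_on_prem: bool,
                       needs_deep_reasoning: bool,
                       narrow_task_with_labels: bool,
                       budget_sensitive: bool) -> str:
    """Hypothetical heuristic mirroring the selection principles above."""
    # Privacy and cost constraints point to open-weight models (principle 5).
    if privacy_on_prem:
        return "open-weight (Llama 3 / Mistral / Phi-4 / Gemma), self-hosted"
    # Latency is a hard constraint: real-time rules out reasoning models
    # and most frontier APIs (principle 2).
    if latency_ms <= 500:
        return "small / fast model class"
    # Fine-tuning beats few-shot on narrow, stable, high-volume tasks
    # with >= 1K labeled examples (principle 4).
    if narrow_task_with_labels:
        return "fine-tuned specialist"
    if needs_deep_reasoning:
        return "reasoning / chain-of-thought"
    return "mid-tier" if budget_sensitive else "frontier"
```

Picking a class this way stays valid for 12–18 months; only the example models inside each class need refreshing.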