LLM Model Selection Calculator
Choose the right class of language model for your workload, based on architecture and deployment constraints rather than fragile model rankings that change every week.
Workload Requirements
- Use case: the task the model will perform (chat, RAG, classification, code generation)
- Latency target: the hard response-time budget (real-time means ≤500 ms)
- Accuracy need: how costly mistakes are for this workload
- Privacy requirement: whether data may leave your infrastructure
- Deployment preference: managed API, self-hosted, or edge
- Context size needed: maximum tokens per request
- Budget sensitivity: tolerance for per-token and per-request cost
- Additional requirements: any other constraints specific to the workload
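The requirement dimensions above can be captured as a simple data structure. This is a hypothetical sketch, not the calculator's actual schema; the field names and value conventions are assumptions.

```python
# Hypothetical sketch of the calculator's inputs; field names are illustrative.
from dataclasses import dataclass

@dataclass
class WorkloadRequirements:
    use_case: str             # e.g. "rag", "chat", "classification"
    latency_target_ms: int    # hard constraint; real-time is <= 500 ms
    accuracy_need: str        # "best-effort" | "production" | "critical"
    privacy: str              # "none" | "no-third-party" | "on-prem-only"
    deploy: str               # "api" | "self-hosted" | "edge"
    context_tokens: int       # maximum tokens per request
    budget_sensitivity: str   # "low" | "medium" | "high"

# Example: a cost-sensitive RAG workload served over an API.
req = WorkloadRequirements(
    use_case="rag",
    latency_target_ms=800,
    accuracy_need="production",
    privacy="none",
    deploy="api",
    context_tokens=16_000,
    budget_sensitivity="high",
)
```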
Model Class Reference
Frontier / Flagship Models
Maximum capability, highest cost
Latency: Medium · Cost: High
Examples: GPT-4o, Claude Opus 4
Mid-Tier Capable Models
Strong performance at lower cost
Latency: Low · Cost: Medium
Examples: GPT-4o mini, Claude Sonnet 4
Small / Fast Models
Edge inference, real-time, high-volume
Latency: Very Low · Cost: Very Low
Examples: Llama 3.2 3B, Phi-4 mini
Reasoning / Chain-of-Thought Models
Deep thinking for hard problems
Latency: High · Cost: High
Examples: o3, o4-mini
Embedding + Reranker Models
Retrieval backbone for RAG
Latency: Very Low · Cost: Very Low
Examples: text-embedding-3-large, Cohere Embed v3
Fine-Tuned Specialist Models
Narrow-domain quality at lower cost
Latency: Low · Cost: Low
Examples: OpenAI fine-tuning (GPT-4o mini), LoRA-tuned Mistral 7B
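The reference cards above can be encoded as a small lookup table. The model names are the illustrative examples from the cards, not an exhaustive or current list:

```python
# Model class reference as a lookup table; entries mirror the cards above.
MODEL_CLASSES = {
    "frontier":   {"latency": "medium",   "cost": "high",
                   "examples": ["GPT-4o", "Claude Opus 4"]},
    "mid_tier":   {"latency": "low",      "cost": "medium",
                   "examples": ["GPT-4o mini", "Claude Sonnet 4"]},
    "small":      {"latency": "very low", "cost": "very low",
                   "examples": ["Llama 3.2 3B", "Phi-4 mini"]},
    "reasoning":  {"latency": "high",     "cost": "high",
                   "examples": ["o3", "o4-mini"]},
    "embedding":  {"latency": "very low", "cost": "very low",
                   "examples": ["text-embedding-3-large", "Cohere Embed v3"]},
    "fine_tuned": {"latency": "low",      "cost": "low",
                   "examples": ["GPT-4o mini (fine-tuned)",
                                "LoRA-tuned Mistral 7B"]},
}

# Example query: which classes have very low latency?
fast_classes = [name for name, info in MODEL_CLASSES.items()
                if info["latency"] == "very low"]
```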
Model Selection Principles
- Don't pick a specific model — pick a class. Specific model rankings change every few months. Choosing the right class (frontier, mid-tier, small, reasoning) is a decision that stays valid for 12–18 months.
- Latency is a hard constraint, not a preference. Real-time apps (≤500ms) rule out reasoning models and most frontier APIs. Design for the constraint first.
- Add a reranker before adding a bigger model. A reranker costs 10–50× less than upgrading from GPT-4o mini to GPT-4o but often gives a bigger accuracy boost on RAG tasks.
- Fine-tuning beats few-shot on narrow, high-volume tasks. If you have ≥1K labeled examples and the task is stable, fine-tuning a small model usually outperforms prompting a large one at 1/10th the cost.
- Privacy and cost constraints point to open-weight models. Llama 3, Mistral, Phi-4, and Gemma are serious alternatives to closed APIs for most tasks when running on private infrastructure.
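A minimal sketch of the decision logic these principles imply, assuming simplified boolean inputs. The thresholds and class names come from the principles above; the function itself and its parameter names are hypothetical, and a real selector would weigh more dimensions:

```python
def select_model_class(latency_ms: int,
                       privacy_on_prem: bool,
                       needs_deep_reasoning: bool,
                       narrow_task_with_labels: bool,
                       budget_sensitive: bool) -> str:
    """Hypothetical heuristic mirroring the selection principles above."""
    # Privacy and cost constraints point to open-weight models (principle 5).
    if privacy_on_prem:
        return "open-weight (Llama 3 / Mistral / Phi-4 / Gemma), self-hosted"
    # Latency is a hard constraint: real-time rules out reasoning models
    # and most frontier APIs (principle 2).
    if latency_ms <= 500:
        return "small / fast model class"
    # Fine-tuning beats few-shot on narrow, stable, high-volume tasks
    # with >= 1K labeled examples (principle 4).
    if narrow_task_with_labels:
        return "fine-tuned specialist"
    if needs_deep_reasoning:
        return "reasoning / chain-of-thought"
    return "mid-tier" if budget_sensitive else "frontier"
```

Picking a class this way stays valid for 12–18 months; only the example models inside each class need refreshing.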