LLM Inference Cost Calculator
Estimate the daily and monthly cost of running an LLM workload in production — across providers, with prompt caching and multi-model comparison.
Inputs
- Model pricing: Input $5.00/1M tokens · Output $20.00/1M tokens · Cached input $1.25/1M tokens
- Input tokens per request: average tokens in each prompt/context sent to the model
- Output tokens per request: average tokens in each model response
- Requests per day: total API calls your application makes daily
- Daily active users: used to compute cost per user
- Cache hit rate (0%–90% slider): percentage of input tokens served from the prompt cache (reduces input cost)
Configure your workload and click Calculate to estimate production costs.
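The arithmetic behind the Calculate button is simple. The sketch below shows it in TypeScript; the `Pricing` and `Workload` shapes and the `estimateCost` function are hypothetical names for illustration, and a 30-day month is assumed.

```typescript
// Per-million-token pricing (USD), matching the defaults above.
interface Pricing {
  inputPerM: number;   // $/1M uncached input tokens
  outputPerM: number;  // $/1M output tokens
  cachedPerM: number;  // $/1M cached input tokens
}

interface Workload {
  inputTokens: number;    // avg tokens per prompt/context
  outputTokens: number;   // avg tokens per response
  requestsPerDay: number; // total daily API calls
  dailyUsers: number;     // for cost per user
  cacheHitRate: number;   // fraction of input tokens served from cache, 0..0.9
}

function estimateCost(p: Pricing, w: Workload) {
  // Split input tokens into cached and uncached portions.
  const cachedTokens = w.inputTokens * w.cacheHitRate;
  const freshTokens = w.inputTokens - cachedTokens;

  // Cost of a single request, in dollars.
  const perRequest =
    (freshTokens * p.inputPerM +
      cachedTokens * p.cachedPerM +
      w.outputTokens * p.outputPerM) / 1_000_000;

  const daily = perRequest * w.requestsPerDay;
  return {
    perRequest,
    daily,
    monthly: daily * 30, // assumes a 30-day month
    perUser: daily / w.dailyUsers,
  };
}

// Example: 2,000-token prompts, 500-token responses, 100k requests/day,
// 10k users, 45% cache hit rate, at the default prices above.
const cost = estimateCost(
  { inputPerM: 5.0, outputPerM: 20.0, cachedPerM: 1.25 },
  { inputTokens: 2000, outputTokens: 500, requestsPerDay: 100_000, dailyUsers: 10_000, cacheHitRate: 0.45 },
);
console.log(cost); // perRequest ≈ $0.0166, daily ≈ $1,662.50, monthly ≈ $49,875, perUser ≈ $0.17
```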
Cost Reduction Tips
- Prompt caching is the single biggest lever: at these prices, even a 30% cache hit rate cuts input costs by roughly 22% (see the comparison sketch after this list).
- Use a smaller model for classification, routing, and simple generation steps.
- Trim system prompts aggressively: every 100 tokens saved × daily requests × 30 days adds up fast (at $5.00/1M input, 100 tokens × 100,000 requests/day ≈ $1,500/month).
- Streaming does not reduce cost, but it improves perceived latency for end users.
- For agent workflows, multiply this estimate by your average number of LLM calls per task.
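To make the first two tips concrete, the snippet below reuses the hypothetical `estimateCost` sketch above to compare the same workload across cache hit rates and against a smaller model. The small-model prices here are illustrative, not any provider's quote.

```typescript
// Compare daily cost for one workload at several cache hit rates,
// for a large model and a (hypothetically 10x cheaper) small model.
const workload = { inputTokens: 2000, outputTokens: 500, requestsPerDay: 100_000, dailyUsers: 10_000 };
const large = { inputPerM: 5.0, outputPerM: 20.0, cachedPerM: 1.25 };
const small = { inputPerM: 0.5, outputPerM: 2.0, cachedPerM: 0.125 };

for (const cacheHitRate of [0, 0.3, 0.9]) {
  const a = estimateCost(large, { ...workload, cacheHitRate });
  const b = estimateCost(small, { ...workload, cacheHitRate });
  console.log(`hit=${cacheHitRate}: large $${a.daily.toFixed(0)}/day, small $${b.daily.toFixed(0)}/day`);
}
// With cached tokens at a quarter of the fresh price, a 30% hit rate
// trims the input portion of the bill by 0.3 × 0.75 = 22.5%; routing
// simple steps to the small model saves ~10x on those calls.
```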