Calculating and Predicting API Costs
Write a Python helper that estimates cost before sending a request by counting tokens and applying per-model pricing, so you never get an unexpected bill.
Why Cost Prediction Matters
API costs for LLM applications can be surprisingly large at scale. A single query that seems cheap at $0.002 becomes $200 when run 100,000 times. Without cost prediction and monitoring, AI features can generate unexpected cloud bills that dwarf your entire infrastructure spend.
The good news is that LLM costs are largely predictable before you send a request: you know the model, you can count the input tokens with tiktoken, and you can either cap output tokens with your max_tokens setting (a worst-case bound) or estimate them from historical averages. Building cost prediction into your application from day one prevents billing surprises.
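The two token counts described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: `count_tokens` and `estimate_output_tokens` are hypothetical helper names, the `o200k_base` encoding is assumed because it is the one tiktoken uses for the gpt-4o family, and the four-characters-per-token fallback is only a rough rule of thumb for English text when tiktoken is not available.

```python
from typing import Optional


def count_tokens(text: str, encoding_name: str = "o200k_base") -> int:
    """Count tokens with tiktoken, falling back to a rough estimate."""
    try:
        import tiktoken

        encoding = tiktoken.get_encoding(encoding_name)
        return len(encoding.encode(text))
    except Exception:
        # Assumption: roughly 4 characters per token for English text.
        # Used only when tiktoken (or its encoding data) cannot be loaded.
        return (len(text) + 3) // 4


def estimate_output_tokens(max_tokens: int,
                           historical_avg: Optional[float] = None) -> int:
    """Estimate output tokens for a request.

    Without history, assume the worst case: the model uses all of max_tokens.
    With a historical average, use it, but never exceed the max_tokens cap.
    """
    if historical_avg is None:
        return max_tokens
    return min(max_tokens, round(historical_avg))


prompt = "Summarize the following support ticket in two sentences."
input_tokens = count_tokens(prompt)
worst_case_output = estimate_output_tokens(max_tokens=500)
typical_output = estimate_output_tokens(max_tokens=500, historical_avg=120.4)
```

Chat requests also add a few tokens of formatting overhead per message, which this sketch ignores, so treat the input count as a close estimate rather than the billed figure.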
OpenAI Pricing Structure
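OpenAI charges separately for input tokens and output tokens, with output typically costing several times more (4x for the chat models listed below). Embedding models are billed on input tokens only. Prices vary by model. As a guide for 2025 (always check the current pricing page, as prices change):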
- gpt-4o-mini: ~$0.15/million input, ~$0.60/million output
- gpt-4o: ~$2.50/million input, ~$10.00/million output
- text-embedding-3-small: ~$0.02/million tokens
The cost difference between models is enormous: at these prices, gpt-4o is approximately 17x more expensive than gpt-4o-mini per token (2.50 / 0.15 ≈ 16.7 for input, and the same ratio for output). Model selection is your biggest lever for cost control, so always start with the cheapest model that meets your quality requirements.
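To make the arithmetic concrete, here is a minimal sketch of a per-model cost estimate. The prices are the approximate 2025 figures from the list above and should be replaced with current values from the pricing page; `PRICING_PER_MILLION` and `estimate_cost` are hypothetical names chosen for this example.

```python
# Approximate USD prices per one million tokens, copied from the list above.
# Assumption: these drift over time, so load current values in real code.
PRICING_PER_MILLION = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-4o": {"input": 2.50, "output": 10.00},
    # Embedding models produce no output tokens.
    "text-embedding-3-small": {"input": 0.02, "output": 0.0},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int = 0) -> float:
    """Return the estimated cost in USD for a single request."""
    if model not in PRICING_PER_MILLION:
        raise ValueError(f"No pricing configured for model: {model}")
    prices = PRICING_PER_MILLION[model]
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000


# Same request (1,000 input tokens, 500 output tokens) on two models.
mini_cost = estimate_cost("gpt-4o-mini", 1_000, 500)
full_cost = estimate_cost("gpt-4o", 1_000, 500)

print(f"gpt-4o-mini: ${mini_cost:.5f} per request")
print(f"gpt-4o:      ${full_cost:.5f} per request")
print(f"At 100,000 requests: ${mini_cost * 100_000:.2f} vs ${full_cost * 100_000:.2f}")
```

Keeping prices in a single table means a pricing change is a one-line update, and an unknown model fails loudly instead of silently costing zero.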