AI Model Comparison
Compare AI models side by side — pricing, context windows, speed, and capabilities for GPT-4, Claude, Gemini, Llama, and more. Find the best LLM for your use case.
| Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Max Output | Best for |
| --- | --- | --- | --- | --- |
| $0.80 | $4 | 200K tokens | 8.2K tokens | Fast responses, classification, extraction |
| $15 | $75 | 200K tokens | 32K tokens | Complex analysis, long documents, coding |
| $3 | $15 | 200K tokens | 16K tokens | Balanced performance, coding, writing |
| $0.27 | $1.10 | 131.1K tokens | 8.2K tokens | Coding, math, cost-effective reasoning |
| $0.10 | $0.40 | 1.0M tokens | 8.2K tokens | Ultra-fast, cost-effective, large context |
| $1.25 | $10 | 2.1M tokens | 8.2K tokens | Complex tasks, massive context windows |
| $75 | $150 | 128K tokens | 16.4K tokens | Research, complex reasoning |
| $2.50 | $10 | 128K tokens | 16.4K tokens | General purpose, coding, analysis |
| $0.15 | $0.60 | 128K tokens | 16.4K tokens | Cost-effective tasks, high volume |
| $0.60 | $0.60 | 131.1K tokens | 8.2K tokens | Self-hosted, privacy-sensitive, coding |
| $2 | $6 | 128K tokens | 8.2K tokens | European compliance, multilingual |
Frequently Asked Questions
How do I choose the right AI model for my project?
Consider your priorities: if cost is key, look at models like GPT-4o mini, Gemini 2.0 Flash, or DeepSeek V3. For maximum quality, Claude Opus 4.6 or GPT-4.5 Preview are top choices. If you need large context windows for long documents, Gemini 2.0 Pro supports up to 2M tokens. For self-hosted or privacy-sensitive workloads, consider open-source models like Llama 3.3 70B or DeepSeek V3.
What does 'context window' mean for AI models?
The context window is the maximum amount of text (measured in tokens) that a model can process in a single request, including both your input and the model's output. A larger context window lets you send longer documents, more conversation history, or bigger codebases. For example, Gemini 2.0 Pro supports 2M tokens — roughly 1.5 million words of English, or more than a dozen full-length novels.
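The constraint described above — input plus output must fit inside the window together — can be sketched as a simple check. The token counts and the 200K window below are illustrative numbers, not tied to any specific model:

```python
# The context window must cover BOTH the input you send and the output
# the model generates; neither budget is separate from the other.
def fits_in_context(input_tokens: int, max_output_tokens: int,
                    context_window: int) -> bool:
    """True if input plus requested output fits inside the window."""
    return input_tokens + max_output_tokens <= context_window

print(fits_in_context(180_000, 8_200, 200_000))  # → True  (room to spare)
print(fits_in_context(198_000, 8_200, 200_000))  # → False (input crowds out the answer)
```

Note the failure mode in the second call: the input alone fits, but it leaves too little of the window for the requested output.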
Why are AI model prices listed per million tokens?
Tokens are the basic units that language models use to process text; one token is roughly 3/4 of a word in English. Pricing per million tokens (1M) is the industry standard because it makes costs easy to compare across providers. For example, $2.50 per 1M input tokens means processing about 750,000 words of input costs $2.50.
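The worked example above can be reproduced with the ~3/4-words-per-token rule of thumb. The $2.50 rate is the figure from the answer; the helper function is ours, for illustration only:

```python
# Rough cost estimate from a word count, using the ~0.75 words-per-token
# rule of thumb for English text. Real token counts vary by tokenizer.
WORDS_PER_TOKEN = 0.75

def estimate_input_cost(words: int, usd_per_million_tokens: float) -> float:
    """Approximate USD cost to process `words` of English as input."""
    tokens = words / WORDS_PER_TOKEN
    return tokens / 1_000_000 * usd_per_million_tokens

# 750,000 words ≈ 1M tokens, so at $2.50 per 1M input tokens:
print(f"${estimate_input_cost(750_000, 2.50):.2f}")  # → $2.50
```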
What is the difference between input and output pricing?
Input pricing is what you pay for the text you send to the model (your prompt, instructions, context). Output pricing is what you pay for the text the model generates in response. Output tokens are typically more expensive because they require more computation. For cost optimization, keep your prompts concise and set reasonable max output lengths.
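The split described above means a request's cost is input tokens times the input rate plus output tokens times the output rate. A minimal sketch, using the $2.50/$10 figures that match one row of the comparison table as sample rates (not a quote for any particular model):

```python
# Cost of one request: input and output tokens are billed at different
# per-million-token rates. Output is typically the pricier side.
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """USD cost for one request; rates are USD per 1M tokens."""
    return (input_tokens / 1_000_000 * input_rate
            + output_tokens / 1_000_000 * output_rate)

# A concise 2,000-token prompt with a 500-token capped response:
print(f"${request_cost(2_000, 500, 2.50, 10.00):.4f}")  # → $0.0100
```

Even though the response here is a quarter the length of the prompt, it accounts for half the cost — which is why capping output length pays off.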
Should I use an open-source or proprietary AI model?
Open-source models like Llama 3.3 70B and DeepSeek V3 offer full control over your data, no per-token API costs when self-hosted, and the ability to fine-tune. However, they require infrastructure to run and may not match the quality of the top proprietary models. Proprietary models like GPT-4o and Claude Opus 4.6 offer the highest quality with zero infrastructure overhead, but you pay per token and your data leaves your systems.