RAM = Model Size + Context + Overhead
Model size (Q4_K_M):
1B → ~0.8 GB
3B → ~2.0 GB
7B → ~3.8 GB
9B → ~5.5 GB
14B → ~8.5 GB
24B → ~14 GB
70B → ~40 GB
Context window overhead (KV cache — grows with model size; figures below are for a ~7B-class model):
4K context: ~0.5 GB
8K context: ~1.0 GB
32K context: ~3.0 GB
128K context: ~10 GB
System overhead: ~0.5-1.0 GB
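The formula and tables above can be sketched as a small estimator. This is a rough sketch, not an exact calculator — the figures are the approximate Q4_K_M numbers from the tables, and actual usage varies by runtime and model architecture:

```python
# Approximate per-model and per-context RAM figures (GB), from the tables above.
MODEL_GB = {"1B": 0.8, "3B": 2.0, "7B": 3.8, "9B": 5.5,
            "14B": 8.5, "24B": 14.0, "70B": 40.0}
CONTEXT_GB = {4_096: 0.5, 8_192: 1.0, 32_768: 3.0, 131_072: 10.0}
SYSTEM_GB = 1.0  # upper end of the ~0.5-1.0 GB system overhead

def estimate_ram_gb(model: str, context: int) -> float:
    """RAM = model size + context overhead + system overhead."""
    return MODEL_GB[model] + CONTEXT_GB[context] + SYSTEM_GB

print(round(estimate_ram_gb("7B", 32_768), 1))  # 7.8
print(round(estimate_ram_gb("3B", 8_192), 1))   # 4.0
```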
What Fits Where
8 GB RAM (MacBook Air M2)
✓ 3B Q4 + 8K context
✓ 7B Q4 + 4K context (tight)
✗ 9B anything
16 GB RAM (MacBook Pro M2/M3)
✓ 7B Q5 + 8K context
✓ 9B Q4 + 8K context
✓ 14B Q4 + 4K context (tight)
24 GB VRAM (RTX 4090)
✓ 14B Q5 + 8K context
✓ 24B Q4 + 4K context
32 GB RAM (Mac Studio M2 Max)
✓ 24B Q5 + 8K context
✓ 14B Q8 + 32K context
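The fit checks above can be reproduced with a short helper. A sketch under assumptions of mine: the figures come from the tables above, and `os_reserve_gb` (a name I've introduced) models the RAM the OS keeps for itself on shared-memory machines, which is why 9B won't fit on an 8 GB Mac even though the raw numbers suggest it might:

```python
# Approximate Q4_K_M figures (GB) from the tables above.
MODEL_GB = {"3B": 2.0, "7B": 3.8, "9B": 5.5, "14B": 8.5, "24B": 14.0}
CONTEXT_GB = {4096: 0.5, 8192: 1.0, 32768: 3.0}
SYSTEM_GB = 1.0  # inference-runtime overhead

def fits(model: str, context: int, total_ram_gb: float,
         os_reserve_gb: float = 2.0) -> bool:
    """True if model + context + overhead fits in the usable RAM budget."""
    need = MODEL_GB[model] + CONTEXT_GB[context] + SYSTEM_GB
    return need <= total_ram_gb - os_reserve_gb

print(fits("3B", 8192, 8))   # True
print(fits("9B", 4096, 8))   # False
```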
Key insight: Context window size is a hidden RAM cost that catches people off guard. A 7B Q4 model is 3.8 GB on its own, but with a 32K context it needs roughly 7 GB total. If you're doing RAG with long documents, budget for context window RAM; for short tasks (classification, extraction), use a small context to save memory.
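The arithmetic behind that example, using the table's figures (approximate values, low end of system overhead):

```python
model_gb = 3.8    # 7B Q4_K_M weights
context_gb = 3.0  # 32K context overhead
system_gb = 0.5   # low end of the ~0.5-1.0 GB range

total = model_gb + context_gb + system_gb
print(f"{total:.1f} GB")  # 7.3 GB — nearly double the weights alone
```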