Consumer GPUs for Fine-Tuning
NVIDIA RTX 4090 (24 GB): Best consumer GPU for fine-tuning. QLoRA on 7B-13B models comfortably; roughly 33B is the practical fine-tuning ceiling. 70B is inference-only on this card, and only with aggressive sub-4-bit quantization. ~$1,600.
NVIDIA RTX 3090 (24 GB): Previous generation but still capable. QLoRA on 7B-13B models. As an Ampere card it supports bf16, so mixed-precision recipes carry over from the 4090. ~$800 used.
NVIDIA RTX 4080 (16 GB): QLoRA on 7B models (tight). Not recommended for larger models.
Apple M-series (unified memory): The MLX framework supports LoRA fine-tuning. An M2 Ultra (192 GB) can fine-tune 70B models. Slower than NVIDIA, but unified memory removes the separate VRAM ceiling. Good for experimentation.
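A quick way to sanity-check the "what fits where" claims above is a back-of-the-envelope VRAM estimate. The constants here are rule-of-thumb assumptions, not measured values: 4-bit base weights at ~0.5 bytes per parameter, plus a flat ~2 GB allowance for LoRA adapters, activations, and framework overhead.

```python
def qlora_vram_gb(params_b: float, overhead_gb: float = 2.0) -> float:
    """Rough QLoRA memory estimate: 4-bit quantized base weights
    (~0.5 bytes/param) plus a flat allowance for LoRA adapters,
    activations, and CUDA/framework overhead. Rule of thumb only."""
    weights_gb = params_b * 0.5  # params in billions -> GB at 4 bits
    return weights_gb + overhead_gb

# Check common model sizes against a 24 GB card (4090/3090)
for size in (7, 13, 33, 70):
    need = qlora_vram_gb(size)
    verdict = "fits" if need <= 24 else "does not fit"
    print(f"{size}B QLoRA: ~{need:.1f} GB -> {verdict} in 24 GB")
```

This matches the list above: 7B-13B are comfortable, ~33B is tight but possible, and a 70B base model alone exceeds 24 GB even at 4 bits.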
Free Cloud Options
Google Colab (free): T4 GPU (16 GB). QLoRA on 7B models. Limited to ~12 hours per session. Good for learning.
Google Colab Pro ($10/mo): Compute units that can be spent on faster GPUs, including the A100 (40 GB) when available. Much better for fine-tuning, with longer sessions.
Kaggle Notebooks (free): 2x T4 GPUs (16 GB each). 30 hours/week. Good for competitions and experimentation.
Lightning AI Studios (free tier): Free GPU access with persistent storage. Good for development.
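Whichever notebook service you use, it is worth confirming which GPU the session actually assigned before launching a run, since allocations vary by tier and availability. A minimal check via `nvidia-smi` (the query flags are standard; the function falls back gracefully on machines without NVIDIA tooling):

```python
import shutil
import subprocess

def detect_gpu() -> str:
    """Report GPU name and total memory via nvidia-smi, or a
    fallback message when no NVIDIA driver/tool is present."""
    if shutil.which("nvidia-smi") is None:
        return "no NVIDIA GPU detected"
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    return out.stdout.strip() or "no NVIDIA GPU detected"

print(detect_gpu())
```

On a free Colab session this typically prints something like a T4 with ~15-16 GB; on Kaggle you should see two T4 entries.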
The QLoRA revolution: Before QLoRA (2023), full fine-tuning a 7B model required 80+ GB of GPU memory, because weights, gradients, and optimizer states all had to live on the GPU. QLoRA freezes the base model in 4-bit precision and trains only small low-rank adapters, so the same job fits on a 16 GB consumer GPU. This democratized fine-tuning: you can prototype on Colab, iterate on a 4090, and scale to cloud A100s for production training.
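The arithmetic behind that 80+ GB figure: mixed-precision full fine-tuning with Adam costs roughly 16 bytes per parameter. The per-component breakdown below is the usual rule of thumb, not a measurement:

```python
def full_ft_gb(params_b: float) -> float:
    """Mixed-precision full fine-tuning with Adam, per parameter:
      2 B bf16 weights + 2 B gradients + 4 B fp32 master weights
      + 8 B Adam first/second moments = 16 B/param.
    Rule-of-thumb estimate; real runs add activation memory on top."""
    bytes_per_param = 2 + 2 + 4 + 8
    return params_b * bytes_per_param  # params in billions -> GB

print(f"7B full fine-tune: ~{full_ft_gb(7):.0f} GB")  # ~112 GB
```

At ~112 GB for a 7B model, full fine-tuning was out of reach for any single consumer card, which is exactly the gap QLoRA closed.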