Commands
ollama pull <model>                  Download a model
ollama run <model>                   Chat interactively
ollama list                          List downloaded models
ollama ps                            List running models
ollama rm <model>                    Delete a model
ollama show <model>                  Show model details
ollama cp <src> <dst>                Copy a model
ollama create <name> -f Modelfile    Create a model from a Modelfile
ollama serve                         Start the server
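The `ollama create` command builds a custom model from a Modelfile. A minimal sketch (the base model, parameter value, and system prompt here are example choices, not prescribed by this chapter):

```
# Modelfile — derive a customized assistant from a pulled base model
FROM llama3.2
PARAMETER temperature 0.3
SYSTEM "You are a concise technical assistant."
```

Build and run it with `ollama create my-assistant -f Modelfile` followed by `ollama run my-assistant` (the name `my-assistant` is arbitrary).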
API Endpoints
POST   /api/generate     Completion (single prompt)
POST   /api/chat         Chat (multi-turn)
POST   /api/embeddings   Generate embeddings
GET    /api/tags         List downloaded models
POST   /api/show         Model info
POST   /api/pull         Pull a model
DELETE /api/delete       Delete a model
GET    /api/ps           List running models
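The endpoints accept JSON bodies. A minimal chat call might look like the sketch below; it assumes the server is running on the default port 11434 with llama3.2 already pulled, and prints a hint instead of failing if the server is down:

```shell
# Build the request body for POST /api/chat (multi-turn message format).
# "stream": false returns one JSON object instead of a token stream.
PAYLOAD='{"model": "llama3.2", "messages": [{"role": "user", "content": "Hello"}], "stream": false}'

# Call the endpoint; fall back to a hint if no server is listening.
curl -s --max-time 10 http://localhost:11434/api/chat -d "$PAYLOAD" \
  || echo 'server not running -- start it with: ollama serve'
```

Swap `/api/chat` for `/api/generate` (with a `"prompt"` field instead of `"messages"`) to get a one-shot completion.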
Recommended First Models
8GB RAM:
ollama run llama3.2 # 3B, 2GB
ollama run gemma2:2b # 2B, 1.6GB
16GB RAM:
ollama run qwen2.5:7b # 7B, 4.4GB
ollama run phi4-mini # 3.8B, 2.5GB
24GB+ RAM:
ollama run mistral-small # 24B, 14GB
ollama run qwen2.5:14b # 14B, 9GB
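To pick a tier, check your total RAM first. A Linux-only sketch (macOS users can check with `sysctl -n hw.memsize` instead; the suggested models simply mirror the tiers above):

```shell
# Read total memory from /proc/meminfo (Linux) and convert KB -> GB.
total_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
total_gb=$((total_kb / 1024 / 1024))

# Suggest a starting model for the matching RAM tier.
if   [ "$total_gb" -ge 24 ]; then echo "try: ollama run qwen2.5:14b"
elif [ "$total_gb" -ge 16 ]; then echo "try: ollama run qwen2.5:7b"
else                              echo "try: ollama run llama3.2"
fi
```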
Key insight: Ollama is your daily driver for local AI. Install it, pull a model, and you have a private, free, fast AI assistant running on your machine. For most users, this is all you need. Chapter 6 covers llama.cpp for when you need more control — custom quantization, server tuning, or building from source.