Ch 10 — Production Deployment & Serving — Under the Hood

mergekit configs, GPTQ/AWQ/GGUF quantization, vLLM setup, Ollama Modelfiles, LoRA multi-tenant, and deployment scripts
A. Model Merging with mergekit: YAML configs for SLERP, TIES, and DARE

1. SLERP Merge: two-model blend
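Step 1 maps to a mergekit YAML config like the following sketch; the model names and the per-layer `t` schedule are placeholders:

```yaml
# SLERP blend of two 32-layer models (names are placeholders)
slices:
  - sources:
      - model: org/model-a
        layer_range: [0, 32]
      - model: org/model-b
        layer_range: [0, 32]
merge_method: slerp
base_model: org/model-a
parameters:
  t:
    - filter: self_attn   # attention layers lean toward model-b mid-network
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5          # everything else: even blend
dtype: bfloat16
```

Run it with `mergekit-yaml slerp.yaml ./merged --cuda` to write the merged checkpoint to `./merged`.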
2. TIES / DARE: multi-model merge
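For step 2, a TIES merge of two fine-tunes onto a shared base might look like this sketch; the `density` and `weight` values are illustrative, not tuned:

```yaml
models:
  - model: org/base-model        # base contributes no task vector
  - model: org/finetune-math
    parameters:
      density: 0.5               # keep the top 50% of each task vector
      weight: 0.4
  - model: org/finetune-code
    parameters:
      density: 0.5
      weight: 0.4
merge_method: ties               # swap to dare_ties for DARE's random drop-and-rescale
base_model: org/base-model
parameters:
  normalize: true
dtype: bfloat16
```

The same structure serves both methods: changing `merge_method` is the only edit needed to move from TIES to DARE.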
compressQuantize: GPTQ (GPU), AWQ (accuracy), GGUF (cross-platform)
BQuantization for DeploymentGPTQ, AWQ, and GGUF conversion scripts
3
compress
GPTQ / AWQ
GPU quantization
or
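As a reference point for step 3, these are the commonly used 4-bit settings. This is a sketch, not a full conversion script: the dicts mirror the parameters accepted by AutoGPTQ's `BaseQuantizeConfig` and AutoAWQ's `quantize()`, and both tools additionally need a calibration pass over sample text.

```python
# Common 4-bit recipes (illustrative defaults, not tuned for any specific model).

# GPTQ: pass these fields to auto_gptq's BaseQuantizeConfig; fastest GPU kernels.
GPTQ_CONFIG = {
    "bits": 4,          # 4-bit weights
    "group_size": 128,  # quantize in groups of 128 columns
    "desc_act": True,   # activation-order quantization; slower, more accurate
}

# AWQ: pass as quant_config to AutoAWQForCausalLM.quantize();
# activation-aware scaling tends to preserve accuracy better at 4-bit.
AWQ_CONFIG = {
    "w_bit": 4,
    "q_group_size": 128,
    "zero_point": True,
    "version": "GEMM",  # GEMM kernels suit batched GPU serving
}
```

Group size 128 is the usual starting point for both; smaller groups trade file size for accuracy.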
4. GGUF Export: llama.cpp format
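Step 4 as shell commands, assuming a built llama.cpp checkout; the paths and the Q4_K_M choice are placeholders:

```bash
# Convert an HF checkpoint directory to an f16 GGUF file
python llama.cpp/convert_hf_to_gguf.py ./merged-model \
  --outfile merged-f16.gguf --outtype f16

# Quantize to Q4_K_M, a common size/quality trade-off for local serving
./llama.cpp/llama-quantize merged-f16.gguf merged-Q4_K_M.gguf Q4_K_M
```

The intermediate f16 file is only a staging artifact; it can be deleted once the quantized GGUF is verified.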
C. vLLM Production Serving: server setup, Docker, and multi-LoRA

5. vLLM Server: launch + config
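Step 5 can be as simple as the sketch below; the model path is a placeholder, and the flags shown are the ones most worth tuning first:

```bash
pip install vllm

# OpenAI-compatible server on :8000
vllm serve ./merged-model-awq \
  --quantization awq \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.90 \
  --port 8000

# Smoke test against the OpenAI-style endpoint
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "./merged-model-awq", "prompt": "Hello", "max_tokens": 16}'
```

`--gpu-memory-utilization` caps how much VRAM vLLM claims for weights plus KV cache; lowering it leaves headroom for other processes on the same GPU.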
6. Multi-LoRA: hot-swap adapters
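For step 6, vLLM can serve one base model with several LoRA adapters registered at startup, and each request selects its adapter by name. Adapter names and paths below are placeholders:

```bash
vllm serve org/base-model \
  --enable-lora \
  --max-lora-rank 64 \
  --lora-modules tenant-a=./loras/tenant-a tenant-b=./loras/tenant-b

# The "model" field routes the request to a specific adapter
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "tenant-a", "prompt": "Hello", "max_tokens": 16}'
```

This is the multi-tenant pattern: one copy of the base weights in VRAM, with per-tenant adapters applied per request.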
D. Local Serving & Containerization: Ollama, Docker, and Kubernetes deployment

Ollama: import the GGUF, create a Modelfile, serve locally.

7. Ollama Setup: Modelfile + serve
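Step 7 wraps the GGUF export in a Modelfile. A minimal sketch, where the file path, parameter values, and system prompt are placeholders:

```
# Modelfile
FROM ./merged-Q4_K_M.gguf

PARAMETER temperature 0.7
PARAMETER num_ctx 4096

SYSTEM "You are a concise assistant."
```

Then `ollama create merged-model -f Modelfile` registers it locally, and `ollama run merged-model` serves an interactive session.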
8. Docker Deploy: container + K8s
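For step 8, one approach is to build on vLLM's official image. This is a sketch: the image tag and paths are placeholders, and the container assumes a GPU is passed through at runtime (e.g. `docker run --gpus all`):

```dockerfile
FROM vllm/vllm-openai:latest

# Bake the quantized model into the image (or mount it as a volume instead)
COPY ./merged-model-awq /models/merged-model-awq

ENTRYPOINT ["vllm", "serve", "/models/merged-model-awq", \
            "--quantization", "awq", "--port", "8000"]
```

On Kubernetes, request `nvidia.com/gpu: 1` in the pod's resource limits and point a readiness probe at the server's `/health` endpoint so traffic only arrives after the weights are loaded.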
E. Monitoring & Full Pipeline: health checks, metrics, and end-to-end deployment

9. Monitoring: metrics + alerts
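For step 9, vLLM exposes a liveness endpoint at `/health` and Prometheus-format metrics at `/metrics`. Below is a minimal sketch of scrape-side alert logic; the parser, the metric names in the sample, and the thresholds are illustrative:

```python
def parse_prometheus(text: str) -> dict[str, float]:
    """Parse simple 'name value' lines from Prometheus text exposition."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, value = line.rpartition(" ")
        try:
            metrics[name] = float(value)
        except ValueError:
            pass  # ignore lines that are not name/number pairs
    return metrics

def should_alert(metrics: dict[str, float], key: str, threshold: float) -> bool:
    """Fire when a gauge exceeds its threshold (missing metrics stay quiet)."""
    return metrics.get(key, 0.0) > threshold

sample = """# HELP vllm:num_requests_waiting Number of waiting requests.
vllm:num_requests_waiting 12
vllm:gpu_cache_usage_perc 0.93
"""
m = parse_prometheus(sample)
```

In production this feeds a real alerting stack (Prometheus + Alertmanager); queue depth and KV-cache usage are the two gauges that most directly signal an overloaded server.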
10. Full Pipeline: train to deploy
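Step 10, end to end, might be a single script that gates each stage on the previous one. A sketch; every path, config name, and the smoke-test prompt are placeholders:

```bash
#!/usr/bin/env bash
set -euo pipefail  # abort the pipeline on the first failed stage

# 1) Merge fine-tuned checkpoints with mergekit
mergekit-yaml configs/merge.yaml ./merged --cuda

# 2) Convert and quantize for local serving
python llama.cpp/convert_hf_to_gguf.py ./merged --outfile merged-f16.gguf --outtype f16
./llama.cpp/llama-quantize merged-f16.gguf merged-Q4_K_M.gguf Q4_K_M

# 3) Register with Ollama and smoke test before exposing traffic
ollama create merged-model -f Modelfile
ollama run merged-model "Reply with the single word OK" | grep -qi ok
```

Because of `set -e`, a failed merge or a quantized model that cannot answer the smoke test stops the script before anything reaches users.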