Ch 4 — LoRA & PEFT — Under the Hood

Low-rank decomposition, PEFT library internals, QLoRA NF4, DoRA, adapter merging code
A. LoRA Forward Pass & Implementation: how the adapter modifies the frozen weight matrix

Step 1. Input x (the hidden state) passes through the frozen base weight, producing the base output W₀x. W₀ stays locked during training.
Step 2. Add the adapter contribution to get the combined output: W₀x + BAx.

Adapter path: x → A (r×d) → B (d×r) → scale by α/r → add to W₀x
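The adapter path above can be sketched as a minimal PyTorch module. Dimensions and the init scale are illustrative; real implementations such as PEFT's `LoraLayer` also handle dropout and dtype casting:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank adapter."""
    def __init__(self, d_in, d_out, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)              # freeze W0
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # down-projection (r x d)
        self.B = nn.Parameter(torch.zeros(d_out, r))        # up-projection (d x r), zero init
        self.scale = alpha / r

    def forward(self, x):
        base_out = self.base(x)                 # W0 x, frozen path
        lora_out = (x @ self.A.T) @ self.B.T    # B(Ax), low-rank path
        return base_out + self.scale * lora_out

layer = LoRALinear(64, 64, r=8, alpha=16)
x = torch.randn(2, 64)
y = layer(x)
# B is zero-initialized, so before any training the adapter is a no-op
assert torch.allclose(y, layer.base(x))
```

Because B starts at zero, the model's behavior at step 0 is exactly the pretrained model; only A and B receive gradients.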
B. HuggingFace PEFT Library: LoraConfig, get_peft_model, and trainable parameter inspection

Step 3. LoraConfig: set r, alpha, and the target modules, then apply it with get_peft_model to inject adapters into the frozen base model.
Step 4. Param count: inspect trainable vs. total parameters.
Step 5. Rank ablation: how quality changes with r = 4, 8, 16, 32, 64, 128.
C. QLoRA & NF4 Quantization: 4-bit base model loading with bitsandbytes

Step 6. NF4 quantization: store the frozen base model in 4-bit NormalFloat, then attach LoRA adapters trained in bf16.
Step 7. Compute: the 4-bit weights are dequantized on the fly for each forward pass.
D. DoRA & Advanced Techniques: weight decomposition, adapter stacking, and multi-adapter serving

Step 8. DoRA: decompose each weight into a magnitude component and a direction component.
Step 9. Multi-adapter: stack several adapters or switch between them at serving time.
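A toy sketch of the DoRA decomposition on a single weight matrix (shapes are illustrative; in recent PEFT versions you would instead pass `use_dora=True` to `LoraConfig`):

```python
import torch

torch.manual_seed(0)
d, r = 6, 2
W0 = torch.randn(d, d)                 # frozen pretrained weight
A = torch.randn(r, d)                  # LoRA down-projection
B = torch.zeros(d, r)                  # LoRA up-projection, zero init
m = W0.norm(dim=0, keepdim=True)       # learned magnitude, init to column norms of W0

# DoRA: direction is the column-normalized (W0 + BA);
# magnitude is the separately learned per-column scale m.
V = W0 + B @ A
W_dora = m * V / V.norm(dim=0, keepdim=True)

# With B = 0 the decomposition reproduces W0 exactly
assert torch.allclose(W_dora, W0, atol=1e-6)
```

Training updates m, A, and B while W₀ stays frozen, letting magnitude and direction change independently, which is the distinction step 8 draws.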
E. Merging, Saving & Deployment: from adapter to production model

Step 10. Merge the adapter into the base weights, save the result as safetensors or GGUF, and serve it with vLLM or Ollama.
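The merge in step 10 is plain arithmetic: fold the scaled low-rank product into the frozen weight so inference needs a single matmul and no adapter code. A sketch with illustrative shapes (in PEFT this is what `merge_and_unload()` does per layer):

```python
import torch

torch.manual_seed(0)
d, r, alpha = 8, 4, 16
W0 = torch.randn(d, d)          # frozen base weight
A = torch.randn(r, d) * 0.01    # trained LoRA down-projection
B = torch.randn(d, r) * 0.01    # trained LoRA up-projection
x = torch.randn(3, d)

# Adapter-time forward: base path plus scaled low-rank path
y_adapter = x @ W0.T + (alpha / r) * (x @ A.T @ B.T)

# Merge: fold (alpha/r) * BA into W0 once, then serve a plain linear layer
W_merged = W0 + (alpha / r) * (B @ A)
y_merged = x @ W_merged.T

assert torch.allclose(y_adapter, y_merged, atol=1e-5)
```

The merged tensor is what gets exported to safetensors or converted to GGUF; the serving stack (vLLM, Ollama) then sees an ordinary dense model.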