Ch 4 — LoRA & PEFT — Under the Hood

Low-rank decomposition, PEFT library internals, QLoRA NF4, DoRA, adapter merging code
A. LoRA Forward Pass & Implementation: how the adapter modifies the frozen weight matrix

Step 1. Input x (the hidden state) passes through the frozen base weight, producing the base output W₀x. W₀ stays locked during training.
Step 2. Add the adapter contribution to get the combined output: W₀x + BAx.

Adapter path: x → A (r×d) → B (d×r) → scale by α/r → add to W₀x
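The adapter path above can be sketched as a minimal PyTorch module. Dimensions and the init scale are illustrative; real implementations such as PEFT's `LoraLayer` also handle dropout and dtype casting:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank adapter."""
    def __init__(self, d_in, d_out, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)              # freeze W0
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # down-projection (r x d)
        self.B = nn.Parameter(torch.zeros(d_out, r))        # up-projection (d x r), zero init
        self.scale = alpha / r

    def forward(self, x):
        base_out = self.base(x)                 # W0 x, frozen path
        lora_out = (x @ self.A.T) @ self.B.T    # B(Ax), low-rank path
        return base_out + self.scale * lora_out

layer = LoRALinear(64, 64, r=8, alpha=16)
x = torch.randn(2, 64)
y = layer(x)
# B is zero-initialized, so before any training the adapter is a no-op
assert torch.allclose(y, layer.base(x))
```

Because B starts at zero, the model's behavior at step 0 is exactly the pretrained model; only A and B receive gradients.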
B. HuggingFace PEFT Library: LoraConfig, get_peft_model, and trainable parameter inspection

Step 3. LoraConfig: set r, alpha, and the target modules, then apply it with get_peft_model to inject adapters into the frozen base model.
Step 4. Param count: inspect trainable vs. total parameters.
Step 5. Rank ablation: how quality changes with r = 4, 8, 16, 32, 64, 128.
C. QLoRA & NF4 Quantization: 4-bit base model loading with bitsandbytes

Step 6. NF4 quantization: store the frozen base model in 4-bit NormalFloat, then attach LoRA adapters trained in bf16.
Step 7. Compute: the 4-bit weights are dequantized on the fly for each forward pass.
D. DoRA & Advanced Techniques: weight decomposition, adapter stacking, and multi-adapter serving

Step 8. DoRA: decompose each weight into a magnitude component and a direction component.
Step 9. Multi-adapter: stack several adapters or switch between them at serving time.
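A toy sketch of the DoRA decomposition on a single weight matrix (shapes are illustrative; in recent PEFT versions you would instead pass `use_dora=True` to `LoraConfig`):

```python
import torch

torch.manual_seed(0)
d, r = 6, 2
W0 = torch.randn(d, d)                 # frozen pretrained weight
A = torch.randn(r, d)                  # LoRA down-projection
B = torch.zeros(d, r)                  # LoRA up-projection, zero init
m = W0.norm(dim=0, keepdim=True)       # learned magnitude, init to column norms of W0

# DoRA: direction is the column-normalized (W0 + BA);
# magnitude is the separately learned per-column scale m.
V = W0 + B @ A
W_dora = m * V / V.norm(dim=0, keepdim=True)

# With B = 0 the decomposition reproduces W0 exactly
assert torch.allclose(W_dora, W0, atol=1e-6)
```

Training updates m, A, and B while W₀ stays frozen, letting magnitude and direction change independently, which is the distinction step 8 draws.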
E. Merging, Saving & Deployment: from adapter to production model

Step 10. Merge the adapter into the base weights, save the result as safetensors or GGUF, and serve it with vLLM or Ollama.
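The merge in step 10 is plain arithmetic: fold the scaled low-rank product into the frozen weight so inference needs a single matmul and no adapter code. A sketch with illustrative shapes (in PEFT this is what `merge_and_unload()` does per layer):

```python
import torch

torch.manual_seed(0)
d, r, alpha = 8, 4, 16
W0 = torch.randn(d, d)          # frozen base weight
A = torch.randn(r, d) * 0.01    # trained LoRA down-projection
B = torch.randn(d, r) * 0.01    # trained LoRA up-projection
x = torch.randn(3, d)

# Adapter-time forward: base path plus scaled low-rank path
y_adapter = x @ W0.T + (alpha / r) * (x @ A.T @ B.T)

# Merge: fold (alpha/r) * BA into W0 once, then serve a plain linear layer
W_merged = W0 + (alpha / r) * (B @ A)
y_merged = x @ W_merged.T

assert torch.allclose(y_adapter, y_merged, atol=1e-5)
```

The merged tensor is what gets exported to safetensors or converted to GGUF; the serving stack (vLLM, Ollama) then sees an ordinary dense model.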