The Analogy
Imagine a master chef (pre-trained model) who knows 10,000 recipes. To specialize in Italian food, you don’t retrain from scratch. You give them a thin notebook of Italian adjustments: “add more basil here, less cream there.” LoRA does exactly this: it freezes the original weights and adds a tiny low-rank “adjustment notebook” (matrices A and B where rank r ≪ d).
Key insight: LoRA decomposes the weight update ΔW into two thin matrices: ΔW = B × A, where B is (d×r) and A is (r×d). With d = 4096 and r = 16, you train 2dr = 131,072 parameters instead of d² = 16,777,216, a 128× reduction.
Worked Example
import torch
# Original weight matrix (frozen; randn tensors have requires_grad=False by default)
d = 4096
W = torch.randn(d, d)         # 16,777,216 params
# LoRA: low-rank update ΔW = B @ A
r = 16                        # rank (tiny!)
A = torch.randn(r, d) * 0.01  # 65,536 params; Gaussian init, per the paper
B = torch.zeros(d, r)         # 65,536 params; zero init, so ΔW starts at 0
# Total trainable: 131,072 (0.78% of W)
# Forward pass with LoRA
x = torch.randn(d)
y = W @ x + B @ (A @ x)       # original + adjustment, without materializing the d×d ΔW
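During training, only A and B receive gradients; the frozen W never does. Here is a minimal sketch of that training loop, with an illustrative toy objective (the MSE target, learning rate, and step count are made up for demonstration, not taken from the paper):

```python
import torch

d, r = 4096, 16
W = torch.randn(d, d)                             # frozen: requires_grad=False
A = torch.nn.Parameter(torch.randn(r, d) * 0.01)  # trainable, Gaussian init
B = torch.nn.Parameter(torch.zeros(d, r))         # trainable, zero init

# The optimizer only ever sees the 131,072 LoRA params
opt = torch.optim.AdamW([A, B], lr=1e-4)

x = torch.randn(d)
target = torch.randn(d)  # toy regression target
for _ in range(3):
    y = W @ x + B @ (A @ x)
    loss = torch.nn.functional.mse_loss(y, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After even one step, B is no longer zero (its gradient flows through A @ x), while W.grad stays None because it was never registered with the optimizer or marked trainable.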
Source: Hu et al. (2021), “LoRA: Low-Rank Adaptation of Large Language Models,” showed that the weight updates from fine-tuning have low intrinsic rank, enabling efficient adaptation with r as small as 4–16.
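One practical consequence highlighted in the paper: after training, the low-rank update can be merged into W once (W' = W + BA), so deployment incurs no extra inference latency. A small sketch checking that the merged and unmerged forward passes agree (B is random here, standing in for a trained adapter):

```python
import torch

d, r = 4096, 16
W = torch.randn(d, d)
A = torch.randn(r, d) * 0.01
B = torch.randn(d, r)            # pretend this was trained

# Deployment: fold the low-rank update into W once, then serve as usual
W_merged = W + B @ A

x = torch.randn(d)
y_lora   = W @ x + B @ (A @ x)   # adapter kept separate
y_merged = W_merged @ x          # adapter merged in
assert torch.allclose(y_lora, y_merged, rtol=1e-3, atol=1e-3)
```

The two paths differ only by float32 rounding; the loose tolerance absorbs that. Keeping adapters unmerged is still useful when swapping many task-specific adapters over one shared base model.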