Target Modules
Which layers get LoRA adapters? You choose which weight matrices to target. More targets mean more capacity, but also more trainable parameters and memory.
Minimal (attention only):
["q_proj", "v_proj"]
The original LoRA paper (Hu et al., 2021) found adapting Q and V most effective. ~6.8M trainable params at r=16 on an 8B model.
Standard (all attention):
["q_proj", "k_proj", "v_proj", "o_proj"]
The most common configuration. ~13.6M trainable params at r=16 on an 8B model.
Full (attention + FFN):
["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
Maximum capacity. ~40M trainable params at r=16 on an 8B model. Best quality, but the most memory.
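The per-tier parameter counts above can be reproduced with a short calculation. Each adapted weight matrix of shape out x in gains an A (r x in) and a B (out x r) matrix, so it adds r * (in + out) params per layer. The sketch below assumes Llama-3-8B-style shapes (32 layers, hidden size 4096, GQA K/V dim 1024, FFN intermediate 14336); the exact totals depend on your model's shapes.

```python
# Sketch: estimate LoRA trainable params per target set.
# Assumed shapes: Llama-3-8B-style (32 layers, hidden 4096,
# GQA K/V dim 1024, FFN intermediate 14336). Adjust for your model.
LAYERS, R = 32, 16

# (in_features, out_features) per projection
SHAPES = {
    "q_proj":    (4096, 4096),
    "k_proj":    (4096, 1024),
    "v_proj":    (4096, 1024),
    "o_proj":    (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj":   (4096, 14336),
    "down_proj": (14336, 4096),
}

def lora_params(targets, r=R, layers=LAYERS):
    # Each adapted matrix adds A (r x in) + B (out x r) = r * (in + out).
    return layers * sum(r * (i + o) for name in targets
                        for i, o in [SHAPES[name]])

print(lora_params(["q_proj", "v_proj"]))                        # minimal
print(lora_params(["q_proj", "k_proj", "v_proj", "o_proj"]))    # standard
print(lora_params(list(SHAPES)))                                # full
```

With these assumed shapes the three tiers come out to roughly 6.8M, 13.6M, and 42M, matching the approximate figures above.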
from peft import LoraConfig

config = LoraConfig(
    r=16,                    # rank
    lora_alpha=32,           # scaling (2x rank)
    target_modules=[
        "q_proj", "k_proj",
        "v_proj", "o_proj",
    ],
    lora_dropout=0.05,       # regularization
    bias="none",             # don't train biases
    task_type="CAUSAL_LM",   # for decoder models
)
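To see what this config does at forward time: the adapter output is scaled by lora_alpha / r (here 32/16 = 2.0), and B is initialized to zero, so training starts from the unmodified base model. A minimal numpy sketch (toy weights, not real model weights; PEFT handles all of this internally):

```python
import numpy as np

# Sketch of the LoRA forward pass implied by the config above.
# Toy weights for illustration only.
d, r, alpha = 4096, 16, 32
scale = alpha / r                       # effective scaling: 32/16 = 2.0

rng = np.random.default_rng(0)
W = rng.normal(size=(d, d)) / d**0.5    # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01      # A: small random init
B = np.zeros((d, r))                    # B: zero init (standard LoRA)
x = rng.normal(size=d)

# LoRA-adapted projection: base path + scaled low-rank update.
y = W @ x + scale * (B @ (A @ x))
```

Because B starts at zero, y equals the base output W @ x at step 0; the adapter only departs from the base model as B is trained.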
Dropout
lora_dropout: dropout applied to the adapter input during training; it regularizes the adapter to prevent overfitting. Common values range from 0.0 (no dropout) to 0.1. Use 0.05 as a default, and increase to 0.1 if you see overfitting on small datasets.
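A key detail is that the dropout is applied only on the adapter path; the frozen base projection always sees the untouched input, and dropout is disabled at inference. A toy numpy sketch (illustrative shapes and weights, not a real implementation):

```python
import numpy as np

# Sketch: lora_dropout zeroes a fraction of the *adapter* input only;
# the frozen base path uses the untouched input. Toy weights.
rng = np.random.default_rng(0)
d, r, p = 8, 2, 0.5

W = rng.normal(size=(d, d))             # frozen base weight
A = rng.normal(size=(r, d)) * 0.01      # LoRA A
B = rng.normal(size=(d, r))             # LoRA B (nonzero for illustration)
x = rng.normal(size=d)

def forward(x, training):
    x_adapter = x
    if training:
        mask = rng.random(d) >= p            # drop ~p of adapter inputs
        x_adapter = x * mask / (1.0 - p)     # inverted-dropout scaling
    return W @ x + B @ (A @ x_adapter)       # base path sees raw x

y_eval = forward(x, training=False)          # dropout off at inference
```

At eval time the adapter path reduces to the plain low-rank update, so inference is deterministic regardless of the dropout setting.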
Recommended starting config: r=16, alpha=32, target all attention modules (Q, K, V, O), dropout=0.05. This gives ~13.6M trainable params (~0.17% of an 8B model) and fits on a single 24 GB GPU. Increase the rank to 32 or 64 if quality needs improvement.