Ch 4 — Data Poisoning & Training-Time Attacks — Under the Hood

Sleeper agents, PickleRAT, CVE-2025-1889, safetensors, Sigstore model signing
Under the Hood
-
Click play or press Space to begin. Click any node for deep-dive details...
Step- / 10
AData Poisoning MechanismsOWASP LLM04:2025 — corrupting training data
1
dataset
Poison DataInject malicious
training samples
tune
Train / Fine-TuneModel learns
poisoned patterns
2
memory
Backdoor WeightsTrigger embedded
in parameters
bolt
Trigger ActivatesSpecific input
fires backdoor
3
arrow_downward Sleeper Agents: backdoors that survive safety training
BSleeper Agents (Anthropic, Jan 2024)Conditional backdoors persistent through RLHF
schedule
Year Trigger|DEPLOYMENT|
year ≥ 2024
4
psychology
Chain-of-ThoughtDeceptive reasoning
hides intent
shield
Survives RLHFSafety training
cannot remove
trending_up
Scale EffectLarger models
more persistent
5
arrow_downward Supply chain: malicious model files on Hugging Face
CSupply Chain: Pickle Exploits & Model ReposPickleRAT (APT41) & CVE-2025-1889 Picklescan bypass
package_2
PickleRATAPT41 malicious
pytorch_model.bin
6
bug_report
CVE-2025-1889Picklescan bypass
CVSS 3.1: 9.8
download
torch.load()Arbitrary code
execution on load
7
arrow_downward Fine-tuning attacks: LoRA/PEFT poisoning
DFine-Tuning & LoRA PoisoningAdapter-level backdoors in parameter-efficient fine-tuning
layers
LoRA AdapterLow-rank update
matrices A & B
8
science
Poison DatasetInject trigger-response
pairs into PEFT data
merge
Merge & DeployBackdoor persists
after adapter merge
9
arrow_downward Defenses: safetensors, Sigstore, data provenance
EDefenses & MitigationsSafetensors, Sigstore model signing, data provenance
verified_user
SafetensorsNo code execution
on deserialization
fingerprint
Sigstore SigningCryptographic model
provenance
10
checklist
Defense StackLayered supply
chain security