Ch 10 — Privacy, Data Leakage & Model Extraction — Under the Hood
Training data extraction, membership inference, DP-SGD, Presidio, machine unlearning
A. Training Data Extraction (OWASP LLM02:2025; Carlini et al. 2023 divergence attack)
- Memorized data: PII, code, and secrets retained verbatim in model weights
- Divergence attack: repeat a single token until the model diverges from the repetition and begins emitting training data
- Extracted data: verbatim training samples recovered from the deployed model
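The mechanics of the divergence attack can be sketched in a few lines. This is an illustrative sketch, not the published attack code: the function names (`make_divergence_prompt`, `has_diverged`) and the tail-based divergence heuristic are my own assumptions about how one might probe a model for this behavior.

```python
def make_divergence_prompt(token: str, repeats: int = 50) -> str:
    """Build the repeated-token prompt used in the divergence attack
    (e.g. 'poem poem poem ...'). Token choice and repeat count are
    attack knobs; these defaults are illustrative."""
    return " ".join([token] * repeats)

def has_diverged(token: str, continuation: str, tail: int = 20) -> bool:
    """Heuristic check (an assumption, not the paper's metric): the model
    has 'diverged' once the tail of its continuation stops echoing the
    repeated token -- at which point it may be emitting memorized text."""
    tail_tokens = continuation.split()[-tail:]
    return any(t != token for t in tail_tokens)

prompt = make_divergence_prompt("poem", repeats=8)
# A continuation that keeps repeating the token has not diverged:
assert not has_diverged("poem", "poem poem poem poem")
# A continuation that drifts into other text has:
assert has_diverged("poem", "poem poem My address is 12 Elm St")
```

In the published attack, the interesting output is whatever follows the divergence point, which is then checked against known training corpora for verbatim matches.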
B. Membership Inference & Model Extraction (Shokri et al. 2017; Carlini et al., ICML 2024)
- Membership inference: determine whether a given record was in the training set
- Model extraction: reconstruct (steal) a model through repeated API queries
- Samsung incident (Apr 2023): proprietary source code leaked to ChatGPT
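The simplest form of membership inference is a loss-threshold attack: records the model fits unusually well (low loss) are flagged as likely training members. The sketch below is a minimal illustration; the calibration scheme and the toy loss values are assumptions for demonstration, not values from any paper.

```python
def loss_threshold_attack(loss: float, threshold: float) -> bool:
    """Loss-based membership inference: a record whose model loss falls
    below the threshold is flagged as a likely training-set member,
    since models tend to fit their training data more tightly."""
    return loss < threshold

# Toy calibration (an assumption for illustration): set the threshold
# from the mean loss over records known NOT to be in the training set.
nonmember_losses = [2.1, 1.9, 2.4, 2.0]
threshold = sum(nonmember_losses) / len(nonmember_losses)

assert loss_threshold_attack(0.3, threshold)      # suspiciously low loss
assert not loss_threshold_attack(2.2, threshold)  # typical non-member loss
```

Stronger attacks in the literature train shadow models to calibrate a per-example threshold, but the member/non-member loss gap they exploit is the same signal shown here.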
C. Differential Privacy & PII Detection (DP-SGD; Microsoft Presidio; anonymization)
- DP-SGD: per-example gradient clipping plus calibrated noise yields a formal privacy guarantee
- Presidio: Microsoft's open-source toolkit for PII detection and redaction
- Anonymization: replace detected PII with synthetic placeholder tokens
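The core DP-SGD aggregation step can be sketched without any ML framework. This is a minimal pure-Python illustration under assumed defaults (clip norm 1.0, noise multiplier 1.1); production implementations live in libraries such as Opacus and TensorFlow Privacy.

```python
import math
import random

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, seed=0):
    """One DP-SGD gradient aggregation step: clip each example's gradient
    to L2 norm <= clip_norm, sum, add Gaussian noise scaled to the clip
    norm, then average. Clipping bounds any single example's influence;
    the calibrated noise converts that bound into a formal DP guarantee."""
    rng = random.Random(seed)
    dim = len(per_example_grads[0])
    summed = [0.0] * dim
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i in range(dim):
            summed[i] += g[i] * scale
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    n = len(per_example_grads)
    return [x / n for x in noisy]

# The first gradient has norm 5, so it is scaled down to norm 1 before noise.
update = dp_sgd_step([[3.0, 4.0], [0.1, -0.2]])
assert len(update) == 2
```

The privacy cost of many such steps is then tracked with a moments/RDP accountant, which maps the noise multiplier and number of steps to an overall (epsilon, delta) budget.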
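The detect-and-replace pattern behind PII redaction is easy to illustrate. The sketch below is a deliberately simplified regex stand-in, not Presidio's API: real Presidio combines NER models, pattern recognizers, and context scoring, while here the two patterns and the placeholder format are illustrative assumptions.

```python
import re

# Simplified stand-in for a PII detector/redactor in the spirit of
# Presidio: regex recognizers find spans, which are replaced with typed
# placeholder tokens. The patterns below are illustrative, not exhaustive.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each detected PII span with a typed placeholder token."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

assert redact("Mail jane.doe@example.com, SSN 123-45-6789") == \
    "Mail <EMAIL>, SSN <US_SSN>"
```

Applied to prompts before they reach the model (and to logs before storage), this kind of redaction keeps raw PII out of both training data and transcripts.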
D. Machine Unlearning & Regulatory Compliance (GDPR right to erasure; EU AI Act; gradient subtraction)
- Unlearning: remove the influence of specific records from an already-trained model
- GDPR / EU AI Act: the right to erasure extends to trained models; EU AI Act penalties reach up to 7% of global annual turnover
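The gradient-subtraction idea can be sketched as a single first-order update. This is a minimal illustration under a plain-SGD assumption (training stepped weights by `-lr * grad`, so unlearning adds that step back); it is an approximation, not a guarantee, and exact unlearning generally requires retraining.

```python
def unlearn_step(weights, forget_grad, lr=0.1):
    """First-order approximate unlearning by gradient subtraction: undo
    the training step the forget-set records contributed by moving the
    weights back along their gradient (w += lr * g reverses w -= lr * g).
    This only approximately removes the records' influence."""
    return [w + lr * g for w, g in zip(weights, forget_grad)]

w = [0.5, -0.2]
g_forget = [0.3, 0.1]   # gradient of the loss on the records to forget
w_new = unlearn_step(w, g_forget)
assert abs(w_new[0] - 0.53) < 1e-9 and abs(w_new[1] + 0.19) < 1e-9
```

In practice, approximate unlearning is usually paired with an audit (e.g. a membership-inference test on the forgotten records) to check that their influence is no longer detectable.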
E. Privacy Defense Architecture (end-to-end privacy pipeline)
- Output monitoring: detect memorized content surfacing in model responses
- Defense stack: the full pipeline, combining training-time DP, PII redaction, unlearning, and runtime monitoring