Ch 5 — Adversarial Machine Learning — Under the Hood
FGSM, PGD, C&W, GCG — gradient math, transferability, physical-world attacks, defenses
A. Gradient-Based Attacks: FGSM & PGD (Goodfellow et al. 2014; Madry et al. 2017)
- Clean input: the original image x ∈ [0,1].
- FGSM: single-step attack — perturb by ε · sign(∇_x L), i.e. x' = x + ε · sign(∇_x L(x, y)).
- PGD: multi-step iterative attack — take a step of size α per iteration, then project back into the ε-ball around x.
- Adversarial x': the resulting perturbation is imperceptible to a human but changes the model's prediction.
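A minimal NumPy sketch of both attacks against a toy linear classifier with logistic loss. The weights, bias, and example input below are invented for illustration; real attacks backpropagate through a neural network to get ∇_x L.

```python
import numpy as np

# Toy white-box setting: fixed linear classifier f(x) = w.x + b with
# logistic loss. w, b, and the example input are illustrative only.
w = np.array([1.0, -2.0, 0.5])
b = 0.1

def loss_grad(x, y):
    """Gradient of the logistic loss -log sigmoid(y * f(x)) w.r.t. x."""
    margin = y * (w @ x + b)
    return -y * w / (1.0 + np.exp(margin))   # = -y * sigmoid(-margin) * w

def fgsm(x, y, eps):
    """Single-step FGSM: move eps along the sign of the input gradient."""
    x_adv = x + eps * np.sign(loss_grad(x, y))
    return np.clip(x_adv, 0.0, 1.0)          # keep x' in the valid range

def pgd(x, y, eps, alpha, steps):
    """Multi-step PGD: gradient-sign steps of size alpha, projected back
    into the L-infinity eps-ball around the clean input each iteration."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(loss_grad(x_adv, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv

x, y = np.array([0.6, 0.4, 0.7]), 1          # clean input in [0,1], label +1
x_fgsm = fgsm(x, y, eps=0.1)
x_pgd = pgd(x, y, eps=0.1, alpha=0.03, steps=10)
print(np.max(np.abs(x_fgsm - x)), np.max(np.abs(x_pgd - x)))
```

Both perturbations stay within the ε = 0.1 ball while lowering the classifier's margin; on a linear model the two attacks coincide, which is why PGD's iterations matter only for non-linear networks.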
B. Optimization-Based Attacks: Carlini & Wagner (C&W) and GCG for LLMs — L2 minimization and token-space gradients
- C&W attack: casts the attack as an optimization problem, minimizing the L2 distance between x and x' subject to misclassification; reported a 100% success rate against the models it evaluated.
- GCG (Zou et al. 2023): greedy coordinate gradient — uses gradients in token-embedding space to optimize adversarial suffixes that jailbreak LLMs.
- AutoDAN: a genetic-algorithm search that yields readable adversarial suffixes instead of gibberish.
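A minimal sketch of the C&W idea against a toy two-class linear model: gradient descent on an L2 penalty plus a hinge-style logit objective. The weights, constant c, and confidence margin κ are invented; the real attack also uses a change of variables and a binary search over c.

```python
import numpy as np

# Toy C&W-style L2 attack on a two-class linear model with logits Z = W @ x + b.
# W, b, c, kappa, and the learning rate are illustrative choices only.
W = np.array([[1.0, -1.0, 0.5],
              [-0.5, 1.0, 0.2]])
b = np.array([0.1, -0.1])
c, kappa, lr, steps = 1.0, 0.2, 0.05, 200

def cw_attack(x, target):
    other = 1 - target
    delta = np.zeros_like(x)
    for _ in range(steps):
        z = W @ (x + delta) + b
        # Hinge objective g = max(z_other - z_target, -kappa): push the
        # target logit above the other logit by at least kappa.
        if z[other] - z[target] > -kappa:
            g_grad = W[other] - W[target]
        else:
            g_grad = np.zeros_like(x)
        # Gradient step on ||delta||^2 + c * g(x + delta).
        delta -= lr * (2.0 * delta + c * g_grad)
    return x + delta

x = np.array([0.6, 0.2, 0.5])
x_adv = cw_attack(x, target=1)
print(np.argmax(W @ x + b), np.argmax(W @ x_adv + b), np.linalg.norm(x_adv - x))
```

The L2 term keeps the perturbation small while the hinge term is active only until the target class wins by κ, so the optimizer settles near the smallest δ that flips the prediction.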
C. Transferability & Physical-World Attacks: white-box craft → black-box exploit, adversarial patches, stop-sign attacks
- Transferability: adversarial inputs crafted against one model often work against another — e.g., suffixes optimized on Vicuna transfer to GPT-4.
- Physical attacks: perturbations applied to real objects, such as stop-sign patches, achieve real-world evasion of detectors.
- Adversarial patch: a printable sticker that fools classifiers when placed in the scene.
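The transfer protocol can be sketched with two toy linear models: craft FGSM white-box against a surrogate A (gradients available), then test the same x' black-box against an unseen victim B (predictions only). Both models and the input are made up; transfer works here because their decision boundaries are similar, which is the empirical finding the attack relies on.

```python
import numpy as np

# Surrogate A (white-box) and victim B (black-box), both linear and invented.
wA, bA = np.array([2.0, -1.0]), 0.0
wB, bB = np.array([1.5, -1.5]), 0.1

def predict(w, b, x):
    return 1 if w @ x + b > 0 else -1

def fgsm_on_A(x, y, eps):
    # For logistic loss on a linear model, sign(grad_x L) = -y * sign(wA),
    # so FGSM moves against the surrogate's weight signs.
    return np.clip(x - eps * y * np.sign(wA), 0.0, 1.0)

x, y = np.array([0.55, 0.5]), 1
x_adv = fgsm_on_A(x, y, eps=0.4)
print("A:", predict(wA, bA, x), "->", predict(wA, bA, x_adv))
print("B:", predict(wB, bB, x), "->", predict(wB, bB, x_adv))  # transfers
```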
D. Benchmarks & Evaluation: AttackBench, RobustBench, AutoAttack
- AttackBench: standardized, like-for-like comparison of attack implementations.
- RobustBench: a leaderboard of defenses, each evaluated with AutoAttack.
- AutoAttack: a parameter-free ensemble of four complementary attacks, used as a standard robustness evaluation.
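The key evaluation idea can be sketched in a few lines: robust accuracy counts an example as robust only if every attack in the ensemble fails on it (a per-example worst case). The linear model and the two toy attacks below are illustrative stand-ins; AutoAttack's real ensemble is APGD-CE, targeted APGD-DLR, FAB, and Square.

```python
import numpy as np

# Toy model and attacks for the worst-case-ensemble evaluation protocol.
w, b = np.array([1.0, -1.0]), 0.0

def predict(x):
    return 1 if w @ x + b > 0 else -1

def fgsm(x, y, eps):
    return np.clip(x - eps * y * np.sign(w), 0.0, 1.0)

attacks = [lambda x, y: fgsm(x, y, 0.05),   # weaker attack
           lambda x, y: fgsm(x, y, 0.25)]   # stronger attack

def robust_accuracy(X, Y):
    robust = 0
    for x, y in zip(X, Y):
        # An example is robust only if NO attack in the ensemble flips it.
        if all(predict(a(x, y)) == y for a in attacks):
            robust += 1
    return robust / len(X)

X = [np.array([0.9, 0.1]), np.array([0.6, 0.4]), np.array([0.55, 0.5])]
Y = [1, 1, 1]
print(robust_accuracy(X, Y))
```

Taking the worst case per example, rather than averaging per attack, is what makes ensemble evaluations hard to game with gradient masking.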
E. Defenses & Robustness: adversarial training, randomized smoothing, defense stacking
- Adversarial training: PGD-AT (Madry et al.) — generate PGD adversarial examples during training and train on them, approximately solving a min-max problem.
- Certified robustness: randomized smoothing — classify under Gaussian noise and take a majority vote, yielding provable robustness guarantees within an L2 radius.
- Defense stack: layer multiple AML mitigations (training-time, input-time, detection) rather than relying on any single defense.
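The randomized-smoothing prediction step can be sketched as below: the smoothed classifier g(x) returns the majority vote of a base classifier under Gaussian noise. The linear base model, σ, and sample count are illustrative; the certified L2 radius derived from the vote probabilities (Cohen et al.) is omitted.

```python
import numpy as np

# Smoothed prediction by majority vote under Gaussian noise.
rng = np.random.default_rng(0)
w, b = np.array([1.0, -1.0]), 0.0   # illustrative base classifier

def base_predict(x):
    return 1 if w @ x + b > 0 else -1

def smoothed_predict(x, sigma=0.25, n=1000):
    """Majority vote of base_predict over n Gaussian-noised copies of x."""
    votes = sum(1 for _ in range(n)
                if base_predict(x + rng.normal(0.0, sigma, size=x.shape)) == 1)
    return 1 if votes > n // 2 else -1

x = np.array([0.8, 0.2])   # comfortable margin: noise rarely flips the vote
print(smoothed_predict(x))
```

Averaging over noise trades a little clean accuracy for a classifier whose prediction provably cannot change within a radius determined by σ and the vote margin.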