Ch 5 — Adversarial Machine Learning — Under the Hood
FGSM, PGD, C&W, GCG — gradient math, transferability, physical-world attacks, defenses
A. Gradient-Based Attacks: FGSM & PGD (Goodfellow et al. 2014; Madry et al. 2017)
- Clean input: the original image x ∈ [0,1].
- FGSM: single-step attack — perturb by ε · sign(∇_x L), i.e. x' = x + ε · sign(∇_x L(x, y)).
- PGD: multi-step iterative attack — take a step of size α per iteration, then project back into the ε-ball around x.
- Adversarial x': the resulting perturbation is imperceptible to a human but changes the model's prediction.
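A minimal NumPy sketch of both attacks against a toy linear classifier with logistic loss. The weights, bias, and example input below are invented for illustration; real attacks backpropagate through a neural network to get ∇_x L.

```python
import numpy as np

# Toy white-box setting: fixed linear classifier f(x) = w.x + b with
# logistic loss. w, b, and the example input are illustrative only.
w = np.array([1.0, -2.0, 0.5])
b = 0.1

def loss_grad(x, y):
    """Gradient of the logistic loss -log sigmoid(y * f(x)) w.r.t. x."""
    margin = y * (w @ x + b)
    return -y * w / (1.0 + np.exp(margin))   # = -y * sigmoid(-margin) * w

def fgsm(x, y, eps):
    """Single-step FGSM: move eps along the sign of the input gradient."""
    x_adv = x + eps * np.sign(loss_grad(x, y))
    return np.clip(x_adv, 0.0, 1.0)          # keep x' in the valid range

def pgd(x, y, eps, alpha, steps):
    """Multi-step PGD: gradient-sign steps of size alpha, projected back
    into the L-infinity eps-ball around the clean input each iteration."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(loss_grad(x_adv, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv

x, y = np.array([0.6, 0.4, 0.7]), 1          # clean input in [0,1], label +1
x_fgsm = fgsm(x, y, eps=0.1)
x_pgd = pgd(x, y, eps=0.1, alpha=0.03, steps=10)
print(np.max(np.abs(x_fgsm - x)), np.max(np.abs(x_pgd - x)))
```

Both perturbations stay within the ε = 0.1 ball while lowering the classifier's margin; on a linear model the two attacks coincide, which is why PGD's iterations matter only for non-linear networks.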
B. Optimization-Based Attacks: Carlini & Wagner (C&W) and GCG for LLMs — L2 minimization and token-space gradients
- C&W attack: casts the attack as an optimization problem, minimizing the L2 distance between x and x' subject to misclassification; reported a 100% success rate against the models it evaluated.
- GCG (Zou et al. 2023): greedy coordinate gradient — uses gradients in token-embedding space to optimize adversarial suffixes that jailbreak LLMs.
- AutoDAN: a genetic-algorithm search that yields readable adversarial suffixes instead of gibberish.
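A minimal sketch of the C&W idea against a toy two-class linear model: gradient descent on an L2 penalty plus a hinge-style logit objective. The weights, constant c, and confidence margin κ are invented; the real attack also uses a change of variables and a binary search over c.

```python
import numpy as np

# Toy C&W-style L2 attack on a two-class linear model with logits Z = W @ x + b.
# W, b, c, kappa, and the learning rate are illustrative choices only.
W = np.array([[1.0, -1.0, 0.5],
              [-0.5, 1.0, 0.2]])
b = np.array([0.1, -0.1])
c, kappa, lr, steps = 1.0, 0.2, 0.05, 200

def cw_attack(x, target):
    other = 1 - target
    delta = np.zeros_like(x)
    for _ in range(steps):
        z = W @ (x + delta) + b
        # Hinge objective g = max(z_other - z_target, -kappa): push the
        # target logit above the other logit by at least kappa.
        if z[other] - z[target] > -kappa:
            g_grad = W[other] - W[target]
        else:
            g_grad = np.zeros_like(x)
        # Gradient step on ||delta||^2 + c * g(x + delta).
        delta -= lr * (2.0 * delta + c * g_grad)
    return x + delta

x = np.array([0.6, 0.2, 0.5])
x_adv = cw_attack(x, target=1)
print(np.argmax(W @ x + b), np.argmax(W @ x_adv + b), np.linalg.norm(x_adv - x))
```

The L2 term keeps the perturbation small while the hinge term is active only until the target class wins by κ, so the optimizer settles near the smallest δ that flips the prediction.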
C. Transferability & Physical-World Attacks: white-box craft → black-box exploit, adversarial patches, stop-sign attacks
- Transferability: adversarial inputs crafted against one model often work against another — e.g., suffixes optimized on Vicuna transfer to GPT-4.
- Physical attacks: perturbations applied to real objects, such as stop-sign patches, achieve real-world evasion of detectors.
- Adversarial patch: a printable sticker that fools classifiers when placed in the scene.
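The transfer protocol can be sketched with two toy linear models: craft FGSM white-box against a surrogate A (gradients available), then test the same x' black-box against an unseen victim B (predictions only). Both models and the input are made up; transfer works here because their decision boundaries are similar, which is the empirical finding the attack relies on.

```python
import numpy as np

# Surrogate A (white-box) and victim B (black-box), both linear and invented.
wA, bA = np.array([2.0, -1.0]), 0.0
wB, bB = np.array([1.5, -1.5]), 0.1

def predict(w, b, x):
    return 1 if w @ x + b > 0 else -1

def fgsm_on_A(x, y, eps):
    # For logistic loss on a linear model, sign(grad_x L) = -y * sign(wA),
    # so FGSM moves against the surrogate's weight signs.
    return np.clip(x - eps * y * np.sign(wA), 0.0, 1.0)

x, y = np.array([0.55, 0.5]), 1
x_adv = fgsm_on_A(x, y, eps=0.4)
print("A:", predict(wA, bA, x), "->", predict(wA, bA, x_adv))
print("B:", predict(wB, bB, x), "->", predict(wB, bB, x_adv))  # transfers
```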
D. Benchmarks & Evaluation: AttackBench, RobustBench, AutoAttack
- AttackBench: standardized, like-for-like comparison of attack implementations.
- RobustBench: a leaderboard of defenses, each evaluated with AutoAttack.
- AutoAttack: a parameter-free ensemble of four complementary attacks, used as a standard robustness evaluation.
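The key evaluation idea can be sketched in a few lines: robust accuracy counts an example as robust only if every attack in the ensemble fails on it (a per-example worst case). The linear model and the two toy attacks below are illustrative stand-ins; AutoAttack's real ensemble is APGD-CE, targeted APGD-DLR, FAB, and Square.

```python
import numpy as np

# Toy model and attacks for the worst-case-ensemble evaluation protocol.
w, b = np.array([1.0, -1.0]), 0.0

def predict(x):
    return 1 if w @ x + b > 0 else -1

def fgsm(x, y, eps):
    return np.clip(x - eps * y * np.sign(w), 0.0, 1.0)

attacks = [lambda x, y: fgsm(x, y, 0.05),   # weaker attack
           lambda x, y: fgsm(x, y, 0.25)]   # stronger attack

def robust_accuracy(X, Y):
    robust = 0
    for x, y in zip(X, Y):
        # An example is robust only if NO attack in the ensemble flips it.
        if all(predict(a(x, y)) == y for a in attacks):
            robust += 1
    return robust / len(X)

X = [np.array([0.9, 0.1]), np.array([0.6, 0.4]), np.array([0.55, 0.5])]
Y = [1, 1, 1]
print(robust_accuracy(X, Y))
```

Taking the worst case per example, rather than averaging per attack, is what makes ensemble evaluations hard to game with gradient masking.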
E. Defenses & Robustness: adversarial training, randomized smoothing, defense stacking
- Adversarial training: PGD-AT (Madry et al.) — generate PGD adversarial examples during training and train on them, approximately solving a min-max problem.
- Certified robustness: randomized smoothing — classify under Gaussian noise and take a majority vote, yielding provable robustness guarantees within an L2 radius.
- Defense stack: layer multiple AML mitigations (training-time, input-time, detection) rather than relying on any single defense.
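The randomized-smoothing prediction step can be sketched as below: the smoothed classifier g(x) returns the majority vote of a base classifier under Gaussian noise. The linear base model, σ, and sample count are illustrative; the certified L2 radius derived from the vote probabilities (Cohen et al.) is omitted.

```python
import numpy as np

# Smoothed prediction by majority vote under Gaussian noise.
rng = np.random.default_rng(0)
w, b = np.array([1.0, -1.0]), 0.0   # illustrative base classifier

def base_predict(x):
    return 1 if w @ x + b > 0 else -1

def smoothed_predict(x, sigma=0.25, n=1000):
    """Majority vote of base_predict over n Gaussian-noised copies of x."""
    votes = sum(1 for _ in range(n)
                if base_predict(x + rng.normal(0.0, sigma, size=x.shape)) == 1)
    return 1 if votes > n // 2 else -1

x = np.array([0.8, 0.2])   # comfortable margin: noise rarely flips the vote
print(smoothed_predict(x))
```

Averaging over noise trades a little clean accuracy for a classifier whose prediction provably cannot change within a radius determined by σ and the vote margin.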