Why Augmentation Works
Data augmentation applies random transformations to training images (flips, rotations, crops, color jitter) to artificially increase dataset size and diversity. A dataset of 10K images, augmented, can supply diversity closer to that of a dataset many times larger, because the model rarely sees the exact same input twice. This is one of the most effective regularization techniques: it directly attacks a root cause of overfitting, insufficient data diversity. Modern augmentation strategies such as RandAugment (Cubuk et al., 2020) and Mixup (Zhang et al., 2018) have become standard.
Common Augmentations
# PyTorch augmentation pipeline for ImageNet-style training
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),       # random crop + resize to 224x224
    transforms.RandomHorizontalFlip(),       # 50% left-right flip
    transforms.ColorJitter(
        brightness=0.4, contrast=0.4,
        saturation=0.4, hue=0.1,
    ),
    transforms.RandAugment(),                # operates on PIL images, so before ToTensor
    transforms.ToTensor(),
    transforms.Normalize(                    # ImageNet channel statistics
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])
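Mixup, mentioned above, works differently from the per-image transforms in this pipeline: it blends pairs of examples and their labels at the batch level. A minimal sketch, written with NumPy arrays for brevity (in practice you would apply the same idea to PyTorch tensors); the function name and `alpha` default are illustrative:

```python
import numpy as np

def mixup(x, y, alpha=0.2, rng=None):
    """Return convex combinations of a batch (x, one-hot y) with a shuffled copy of itself."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)          # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))        # random pairing within the batch
    mixed_x = lam * x + (1 - lam) * x[perm]
    mixed_y = lam * y + (1 - lam) * y[perm]
    return mixed_x, mixed_y
```

Because labels are mixed along with inputs, the loss must accept soft targets (e.g. cross-entropy against the blended one-hot vectors).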
Rule of thumb: Always use augmentation for image tasks. For NLP, augmentation is harder, but common options include back-translation, synonym replacement, and random deletion. For LLM pretraining, the sheer scale of the corpus makes traditional augmentation largely unnecessary.
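Of the NLP techniques listed, random deletion is the simplest to sketch. A hypothetical helper, assuming whitespace-tokenized input; the function name and default drop probability are illustrative:

```python
import random

def random_deletion(words, p=0.1, rng=None):
    """Drop each token independently with probability p, keeping at least one token."""
    rng = rng or random.Random()
    kept = [w for w in words if rng.random() > p]
    # If everything was dropped, return one surviving token so the example stays non-empty.
    return kept if kept else [rng.choice(words)]
```

Unlike image flips, text augmentations can change meaning (deleting "not" inverts a sentence), so they are typically applied with low probability and validated on a held-out set.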