What Changed
Neural networks existed for decades but remained a niche technique until three things converged around 2012:
1. Data — The internet produced massive labeled datasets (ImageNet: 14 million labeled images).
2. Compute — GPUs, originally designed for video games, turned out to be perfectly suited for the parallel math neural networks require.
3. Algorithmic improvements — Better activation functions (ReLU), regularization techniques (dropout), and initialization methods solved problems that had made deep networks untrainable.
The AlexNet Moment
In 2012, AlexNet — a deep neural network with 8 layers and 60 million parameters — won the ImageNet competition by a massive margin, posting a top-5 error rate of 15.3% against 26.2% for the runner-up's traditional methods. This was the "Sputnik moment" for deep learning. Within two years, every major tech company had pivoted to deep learning for image recognition, speech recognition, and natural language processing.
Automatic Feature Extraction
The most transformative capability of deep learning is automatic feature extraction. Traditional ML (Chapter 5) requires humans to manually engineer features — deciding which variables matter and how to represent them. Deep learning learns the features directly from raw data. Feed it raw pixels and it learns to detect edges, textures, shapes, and objects on its own. Feed it raw text and it learns grammar, semantics, and context.
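The contrast can be made concrete with a toy sketch (illustrative only, not from the original text): in traditional ML, a human writes the edge-detection feature by hand (here, a Sobel kernel); in deep learning, a randomly initialized filter recovers the same feature purely from raw pixels and a training signal, via gradient descent. The image size, learning rate, and step count below are arbitrary choices for the demo.

```python
import numpy as np

# Traditional ML: a human hand-engineers the feature.
# Here, a Sobel kernel that responds to vertical edges.
sobel = np.array([[-1.0, 0.0, 1.0],
                  [-2.0, 0.0, 2.0],
                  [-1.0, 0.0, 1.0]])

def conv2d(img, k):
    """Valid 2-D cross-correlation of a single-channel image with a 3x3 kernel."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(img[i:i+3, j:j+3] * k)
    return out

# Deep learning: start from random weights and let gradient descent
# discover the feature from (raw input, target) pairs alone.
rng = np.random.default_rng(0)
learned = rng.normal(scale=0.1, size=(3, 3))
lr = 0.01

for step in range(500):
    img = rng.normal(size=(8, 8))      # raw "pixels"
    target = conv2d(img, sobel)        # supervision signal
    err = conv2d(img, learned) - target
    # Gradient of mean squared error with respect to each kernel weight
    grad = np.zeros_like(learned)
    for i in range(err.shape[0]):
        for j in range(err.shape[1]):
            grad += 2 * err[i, j] * img[i:i+3, j:j+3]
    learned -= lr * grad / err.size

print(np.round(learned, 1))  # approaches the hand-written Sobel kernel
```

No one told the second filter what an edge is; the weights converge toward the Sobel pattern because that pattern explains the data. Real networks do this at scale, stacking thousands of such learned filters into layers of edges, textures, shapes, and objects.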
Why it matters: Automatic feature extraction is why deep learning dominates unstructured data — images, audio, text, video. For structured/tabular data (spreadsheets, databases), traditional ML (XGBoost, Random Forest) still often wins because the features are already well-defined. Knowing which tool fits which data type is a critical strategic decision.