Key Milestones
1943: McCulloch-Pitts neuron model.
1958: Rosenblatt’s perceptron.
1969: Minsky & Papert’s XOR critique triggers the AI Winter.
1986: Rumelhart, Hinton & Williams popularize backpropagation.
1989: Cybenko proves the universal approximation theorem.
1998: LeCun’s LeNet for digit recognition.
2010: Nair & Hinton introduce the ReLU activation.
2012: AlexNet wins ImageNet, launching the deep learning era.
Everything since builds on these foundations.
The connection: Every concept in this chapter — weighted sums, non-linear activations, stacked layers, learning from data — is present in today’s largest models. GPT-4 and Gemini are descendants of the perceptron, scaled up by many orders of magnitude in parameter count.
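The three structural ingredients named above can be sketched in a few lines. This is a minimal illustration, not anything from a real model: the weights below are made up, not learned from data, and the layer sizes are arbitrary.

```python
import math

def relu(z):
    # Non-linear activation: without it, stacked layers collapse
    # into a single linear map
    return max(0.0, z)

def layer(inputs, weights, biases, act):
    # Each unit computes a weighted sum of its inputs plus a bias,
    # then applies the activation
    return [act(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# Illustrative (untrained) weights for a 2-input, 2-hidden, 1-output network
W1, b1 = [[0.5, -0.3], [0.8, 0.2]], [0.1, -0.1]
W2, b2 = [[1.0, -1.0]], [0.0]

def forward(x):
    # Stacked layers: the hidden layer's outputs feed the output layer
    h = layer(x, W1, b1, relu)
    return layer(h, W2, b2, lambda z: 1 / (1 + math.exp(-z)))[0]
```

Training, covered in the next chapter, is the process of replacing these hand-written weight lists with values learned from data.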
What’s Next
We now know what neural networks are and why they work (universal approximation). The next chapter tackles how they learn: loss functions, backpropagation, and the computational graph that makes gradient-based training possible. This is the engine that turns a randomly initialized network into a useful model.
1969 — AI Winter
Single-layer perceptrons can’t learn XOR. No known way to train multi-layer networks. Funding collapses.
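The sting of the critique was that XOR is easy to *represent* with one hidden layer; what was missing in 1969 was a way to *learn* the weights. A sketch with hand-picked threshold units (the weights here are chosen by hand, which is exactly what nobody then knew how to automate):

```python
def step(z):
    # Perceptron-style threshold unit: fires iff the weighted sum is positive
    return 1 if z > 0 else 0

def xor(x1, x2):
    # Hidden layer: an OR unit and a NAND unit
    h_or = step(x1 + x2 - 0.5)
    h_nand = step(-x1 - x2 + 1.5)
    # Output layer: AND of the two hidden units yields XOR
    return step(h_or + h_nand - 1.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))  # prints the XOR truth table
```

No single threshold unit can draw one line separating (0,1) and (1,0) from (0,0) and (1,1); the hidden layer sidesteps this by letting the output unit combine two separating lines.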
2012 — Deep Learning Era
Backpropagation + ReLU + GPUs + big data = AlexNet. Deep networks dominate vision, language, and beyond.