Ch 2 — History of AI

From Turing’s 1950 paper through AI winters, expert systems, and the deep learning revolution
High Level
1950s → 1960s → Winters → 1980s → 2012 → LLM Era
The Foundations: 1943–1956
From mathematical neurons to the birth of a field
Key Milestones
1943 — McCulloch & Pitts publish “A Logical Calculus of the Ideas Immanent in Nervous Activity,” the first mathematical model of an artificial neuron. They prove networks of simple binary units can compute any logical function (a toy version of such a unit is sketched after these milestones).

1950 — Alan Turing publishes “Computing Machinery and Intelligence,” proposing the Imitation Game (Turing Test) as a practical measure of machine intelligence.

1956 — Dartmouth Workshop formally founds AI as a field. McCarthy coins the term “Artificial Intelligence.”
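The McCulloch-Pitts unit is simple enough to state in a few lines: it fires exactly when a weighted sum of binary inputs reaches a threshold. A minimal Python sketch, with hand-picked weights and thresholds chosen purely for illustration:

# A McCulloch-Pitts unit: binary inputs, fixed weights, a threshold.
# Illustrative sketch; parameters are chosen by hand, not learned.

def mp_neuron(inputs, weights, threshold):
    """Fire (1) iff the weighted sum of binary inputs meets the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

# Hand-picked parameters realize basic logic gates:
AND = lambda a, b: mp_neuron([a, b], [1, 1], 2)
OR  = lambda a, b: mp_neuron([a, b], [1, 1], 1)
NOT = lambda a:    mp_neuron([a],    [-1],   0)

assert [AND(a, b) for a, b in [(0,0),(0,1),(1,0),(1,1)]] == [0, 0, 0, 1]
assert [OR(a, b)  for a, b in [(0,0),(0,1),(1,0),(1,1)]] == [0, 1, 1, 1]

Composing such gates gives any logical function, which is the paper’s central claim.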
// Timeline: The founding era
1943  McCulloch-Pitts neuron model
1950  Turing’s “Computing Machinery” paper
1955  Dartmouth proposal submitted
1956  Dartmouth Workshop (summer)
1957  Newell & Simon: General Problem Solver
1958  Rosenblatt: the Perceptron
1958  McCarthy: Lisp programming language
1959  Samuel coins “machine learning”
The optimism: Early researchers believed human-level AI was 10–20 years away. Herbert Simon predicted in 1957 that within ten years a computer would be chess champion and prove important mathematical theorems. The reality would take much longer.
Early Optimism: 1957–1969
The golden age of symbolic AI and early neural networks
Breakthroughs
General Problem Solver (1957): Newell and Simon created a program that could solve logic puzzles using means-ends analysis — the first program to separate problem-solving strategy from domain knowledge.

ELIZA (1966): Joseph Weizenbaum’s chatbot at MIT simulated a Rogerian psychotherapist using simple pattern matching. Some users became emotionally attached, foreshadowing today’s debates about AI relationships.

Perceptron (1958): Rosenblatt’s learning machine could classify patterns by adjusting weights from examples — one of the first algorithms to learn from data.
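The perceptron’s learning rule fits in a few lines. Below is a minimal Python sketch of the idea (Rosenblatt’s Mark I was analog hardware, not code; the learning rate and epoch count here are arbitrary illustrative choices). On a linearly separable function such as OR, the weights converge to a separating line:

# Minimal perceptron: learn weights and bias from labeled examples.

def train_perceptron(examples, lr=0.1, epochs=20):
    """examples: list of (inputs, target) pairs with binary targets 0/1."""
    n = len(examples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, target in examples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = target - pred                  # -1, 0, or +1
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err                        # nudge the boundary toward the target
    return w, b

# OR is linearly separable, so the rule converges:
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
print(train_perceptron(data))  # weights and bias defining a separating line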
The Mood
Government funding flowed freely. DARPA (then ARPA) invested heavily in AI research at MIT, Stanford, and Carnegie Mellon. Researchers made bold predictions about imminent breakthroughs in language understanding, vision, and general reasoning.
// ELIZA pattern matching (1966)
// Weizenbaum, MIT
User:  “I am unhappy”
ELIZA: “Do you think coming here will help you not to be unhappy?”
// Rule:  match “I am {X}”
// Reply: “Do you think coming here
//         will help you not to be {X}?”
// No understanding — pure pattern matching
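The whole trick can be reconstructed in a few lines of Python with regular expressions. This is a toy reconstruction, not Weizenbaum’s original (which was written in MAD-SLIP), and the rules beyond the “I am {X}” one above are invented for illustration:

# Toy ELIZA: reflect the user's words back via regex rules.
import re

RULES = [
    (r"I am (.*)", "Do you think coming here will help you not to be {0}?"),
    (r"I feel (.*)", "Why do you feel {0}?"),
    (r".*mother.*", "Tell me more about your family."),
]

def eliza(utterance):
    for pattern, template in RULES:
        m = re.match(pattern, utterance, re.IGNORECASE)
        if m:
            return template.format(*m.groups())
    return "Please go on."  # default when no rule matches

print(eliza("I am unhappy"))
# -> Do you think coming here will help you not to be unhappy?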
ELIZA effect: People attributed understanding to ELIZA despite its simplicity. Weizenbaum was disturbed that his secretary asked him to leave the room so she could talk to ELIZA privately. This phenomenon persists with modern chatbots.
The First AI Winter: 1969–1980
Broken promises, funding cuts, and the perceptron crisis
What Went Wrong
1969 — Minsky & Papert publish Perceptrons, mathematically proving that single-layer perceptrons cannot solve XOR or any non-linearly-separable problem. This devastated neural network research for over a decade.

1973 — Lighthill Report (UK): Sir James Lighthill concluded that AI had failed to achieve its “grandiose objectives” and recommended drastic funding cuts. The British government largely withdrew AI funding.

1974 — DARPA cuts: The U.S. Congress pressured DARPA to fund only “mission-oriented” research, ending broad AI funding.
The Pattern
AI winters follow a predictable cycle: hype → overpromise → underdelivery → disillusionment → funding collapse. The gap between what researchers promised and what they delivered eroded trust. This pattern would repeat in the late 1980s and remains a cautionary tale today.
The Promise (1960s)
“Machines will be capable, within twenty years, of doing any work a man can do.” — Herbert Simon, 1965
The Reality
Programs could solve toy problems but failed on real-world complexity. Translation, vision, and reasoning remained unsolved for decades.
Expert Systems Boom: 1980–1987
Rule-based AI becomes a billion-dollar industry
The Revival
AI rebounded through expert systems — programs that encoded human expert knowledge as if/then rules (a toy rule engine in this style is sketched after this list). Companies paid millions for systems that could diagnose diseases, configure computers, or analyze chemical compounds.

R1/XCON (1980): DEC’s computer configuration system saved an estimated $40 million per year. By 1985, companies were spending over $1 billion annually on expert systems.

Japan’s Fifth Generation (1982): A $400 million government initiative to build intelligent computers spurred competitive investment from the US and UK.
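To make the if/then style concrete, here is a toy forward-chaining rule engine in Python. The rules and facts are invented for illustration and do not come from any real system:

# Toy forward-chaining rule engine in the expert-system style.
# Rules and facts below are made up; real systems had hundreds or thousands.

RULES = [
    ({"fever", "stiff_neck"}, "suspect_meningitis"),
    ({"suspect_meningitis"}, "order_lumbar_puncture"),
    ({"fever", "cough"}, "suspect_flu"),
]

def forward_chain(facts):
    """Repeatedly fire any rule whose conditions are all known facts."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(forward_chain({"fever", "stiff_neck"}))
# -> includes 'suspect_meningitis' and 'order_lumbar_puncture'

The knowledge bottleneck is visible even at this scale: every inference path exists only because someone typed it in.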
// Key expert systems
DENDRAL (1965)      Chemical analysis
                    Stanford · first expert system
MYCIN (1976)        Bacterial infection diagnosis
                    Stanford · ~600 rules · certainty factors
R1/XCON (1980)      DEC computer configuration
                    Carnegie Mellon · saved $40M/year
PROSPECTOR (1983)   Mineral exploration
                    SRI · discovered a molybdenum deposit
The hidden breakthrough: While expert systems dominated commercially, backpropagation was being rediscovered. Rumelhart, Hinton, and Williams published their landmark 1986 paper showing how to train multi-layer neural networks — solving the XOR problem that killed neural nets in 1969.
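A minimal sketch of why this mattered: a network with one hidden layer, trained by backpropagation, learns XOR, the very function Minsky and Papert showed single-layer perceptrons cannot represent. Plain NumPy, with the hidden size, learning rate, and step count chosen arbitrarily for illustration:

# Backpropagation on XOR: the problem single-layer perceptrons cannot solve.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)          # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)            # hidden layer (4 units)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)            # output layer
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(20000):
    h = sigmoid(X @ W1 + b1)                             # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)                  # chain rule at the output
    d_h = (d_out @ W2.T) * h * (1 - h)                   # gradient pushed back a layer
    W2 -= 0.5 * (h.T @ d_out); b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * (X.T @ d_h);   b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2).ravel())                              # should approach [0, 1, 1, 0]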
The Second AI Winter: 1987–1993
Expert systems collapse, but seeds of revival are planted
The Collapse
Expert systems proved brittle, expensive to maintain, and unable to learn. Every new scenario required manual rule authoring by scarce domain experts. The specialized Lisp machine market collapsed as cheaper general-purpose workstations caught up.

Japan’s Fifth Generation project ended in 1992 without achieving its goals. DARPA’s Strategic Computing Initiative was cancelled. The AI industry contracted sharply.
Seeds of Revival
Quietly, important work continued:
1989: Yann LeCun demonstrates CNNs for handwriting recognition (LeNet)
1992: Gerald Tesauro’s TD-Gammon masters backgammon via reinforcement learning
1997: IBM’s Deep Blue defeats Garry Kasparov in chess
// Why expert systems failed
Problem 1: Knowledge bottleneck
  Every rule hand-coded by human experts
  Can’t learn from data or adapt
Problem 2: Brittleness
  Works perfectly within its rules
  Fails catastrophically outside them
Problem 3: Maintenance cost
  R1/XCON grew to 17,500 rules
  Became nearly impossible to update
Problem 4: No common sense
  Couldn’t handle situations outside the narrow domain of its rules
Backpropagation survived: Despite the winter, Hinton, LeCun, and Bengio kept neural network research alive. LeCun’s 1989 LeNet could read handwritten zip codes — the first practical CNN. These researchers would later be called the “Godfathers of Deep Learning” and share the 2018 Turing Award.
Deep Learning Revolution: 2006–2015
GPUs, big data, and AlexNet change everything
The Catalysts
Three forces converged:

1. Data: The internet produced massive datasets. ImageNet (launched 2009 by Fei-Fei Li’s group) grew to over 14 million labeled images across more than 20,000 categories — the fuel deep learning needed.

2. Compute: GPUs, originally designed for gaming, turned out to be ideal for the parallel matrix operations neural networks require. Training that took weeks on CPUs took hours on GPUs (a small timing sketch follows this list).

3. Algorithms: Hinton’s 2006 deep belief networks showed that deep networks could be trained effectively. ReLU activation, dropout, and batch normalization followed.
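To make the compute point concrete, here is the same large matrix multiply dispatched to CPU and then GPU. A sketch assuming PyTorch and a CUDA-capable machine; the exact speedup depends entirely on the hardware:

# One operation, two devices: matrix multiply on CPU vs GPU.
import time
import torch

a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

t0 = time.perf_counter()
c = a @ b                                    # runs on CPU cores
print(f"CPU: {time.perf_counter() - t0:.3f}s")

if torch.cuda.is_available():
    a_g, b_g = a.cuda(), b.cuda()            # move data to GPU memory
    torch.cuda.synchronize()                 # GPU ops are async; sync before timing
    t0 = time.perf_counter()
    c_g = a_g @ b_g                          # same math, thousands of parallel cores
    torch.cuda.synchronize()
    print(f"GPU: {time.perf_counter() - t0:.3f}s")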
AlexNet: The Turning Point
In 2012, Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton entered AlexNet in the ImageNet competition (ILSVRC). It achieved a top-5 error rate of 15.3% — crushing the second-place entry at 26.2%. The gap was so large it shocked the computer vision community. Deep learning had arrived.
// AlexNet (2012) — the numbers
Parameters:   60 million
Layers:       8 (5 conv + 3 fully-connected)
Training:     1.2M ImageNet images
Hardware:     2 NVIDIA GTX 580 GPUs
Top-5 error:  15.3% vs 26.2% (2nd place)
Innovation:   ReLU, dropout, GPU training
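For readers who prefer code to tables, the eight-layer stack reads as follows in PyTorch. This is a schematic sketch: the 2012 original split channels across two GPUs and used local response normalization, both omitted here.

# Schematic AlexNet: 5 conv + 3 fully-connected layers (expects 3×227×227 input).
import torch.nn as nn

alexnet = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),    # conv1
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),  # conv2
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(), # conv3
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(), # conv4
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(), # conv5
    nn.MaxPool2d(3, stride=2),
    nn.Flatten(),
    nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(), # fc6
    nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),        # fc7
    nn.Linear(4096, 1000),                                    # fc8: 1000 classes
)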
The Transformer Era: 2017–2022
Attention is all you need — and then ChatGPT changes the world
The Transformer Revolution
2017: Google researchers publish “Attention Is All You Need” (Vaswani et al.), introducing the transformer architecture. By replacing recurrence with self-attention, transformers process entire sequences in parallel — enabling massive scaling (the core operation is sketched after this list).

2018: GPT-1 (117M parameters) and BERT demonstrate that pretraining on large text corpora produces powerful language representations.

2020: GPT-3 (175B parameters) shows that scaling transforms quantity into quality — the model can write code, translate, and reason without task-specific training.
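The operation behind all of this scaling is small. Here is single-head scaled dot-product attention in NumPy, stripped of the learned projections and multi-head machinery of the full transformer:

# Scaled dot-product attention, the transformer's core operation:
# Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V   (Vaswani et al., 2017)
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # every position scores every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over key positions
    return weights @ V                               # weighted mix of value vectors

# 4 tokens, 8-dim embeddings; self-attention feeds the same matrix in three roles:
x = np.random.default_rng(0).normal(size=(4, 8))
print(attention(x, x, x).shape)  # (4, 8), all positions computed in parallel

Because there is no recurrence, every row of the output is computed at once, which is what lets transformers exploit the GPU parallelism described earlier.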
// The scaling timeline
2017  Transformer paper (Google)
2018  GPT-1 117M params (OpenAI)
2018  BERT 340M params (Google)
2019  GPT-2 1.5B params (OpenAI)
2020  GPT-3 175B params (OpenAI)
2022  GPT-3.5 + RLHF alignment
2022  ChatGPT launched (Nov 30)
      1M users in 5 days
      100M users in 2 months
The ChatGPT moment: On November 30, 2022, OpenAI released ChatGPT publicly. It reached 1 million users in 5 days and 100 million in 2 months — the fastest-growing consumer application in history. AI went from a research topic to a mainstream technology overnight.
Lessons & Patterns
What 70+ years of AI history teach us
Recurring Patterns
1. Hype cycles are real: Every era of AI has been marked by inflated expectations followed by disappointment. The current excitement about LLMs is unprecedented, but the pattern warns us to be realistic.

2. Compute + data > clever algorithms: The biggest breakthroughs (AlexNet, GPT-3) came from scaling existing ideas with more data and compute, not from entirely new algorithms.

3. The “bitter lesson”: Rich Sutton’s 2019 essay argues that methods leveraging computation (search, learning) always eventually beat methods leveraging human knowledge (hand-coded rules).
The Full Timeline
1943  McCulloch-Pitts neuron
1950  Turing Test
1956  Dartmouth — AI is born
1958  Perceptron
1969  Perceptrons book → 1st winter
1980  Expert systems boom
1986  Backpropagation revival
1987  Expert systems bust → 2nd winter
1997  Deep Blue beats Kasparov
2012  AlexNet → deep learning era
2017  Transformers
2022  ChatGPT → mainstream AI
Are we in a bubble? AI investment exceeded $100B in 2024. History shows that hype can outpace reality. But unlike previous eras, today’s AI delivers measurable economic value at scale. The question isn’t whether AI works — it’s whether expectations match what it can actually do.