Ch 2 — History of AI

From Turing’s 1950 paper through AI winters, expert systems, and the deep learning revolution
High Level
1950s → 1960s → Winters → 1980s → 2012 → LLM Era
The Foundations: 1943–1956
From mathematical neurons to the birth of a field
Key Milestones
1943 — McCulloch & Pitts publish “A Logical Calculus of the Ideas Immanent in Nervous Activity,” the first mathematical model of an artificial neuron. They prove networks of simple binary units can compute any logical function (a toy version of such a unit is sketched after these milestones).

1950 — Alan Turing publishes “Computing Machinery and Intelligence,” proposing the Imitation Game (Turing Test) as a practical measure of machine intelligence.

1956 — Dartmouth Workshop formally founds AI as a field. McCarthy coins the term “Artificial Intelligence.”
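The McCulloch-Pitts unit is simple enough to state in a few lines: it fires exactly when a weighted sum of binary inputs reaches a threshold. A minimal Python sketch, with hand-picked weights and thresholds chosen purely for illustration:

# A McCulloch-Pitts unit: binary inputs, fixed weights, a threshold.
# Illustrative sketch; parameters are chosen by hand, not learned.

def mp_neuron(inputs, weights, threshold):
    """Fire (1) iff the weighted sum of binary inputs meets the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

# Hand-picked parameters realize basic logic gates:
AND = lambda a, b: mp_neuron([a, b], [1, 1], 2)
OR  = lambda a, b: mp_neuron([a, b], [1, 1], 1)
NOT = lambda a:    mp_neuron([a],    [-1],   0)

assert [AND(a, b) for a, b in [(0,0),(0,1),(1,0),(1,1)]] == [0, 0, 0, 1]
assert [OR(a, b)  for a, b in [(0,0),(0,1),(1,0),(1,1)]] == [0, 1, 1, 1]

Composing such gates gives any logical function, which is the paper’s central claim.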
// Timeline: The founding era
1943  McCulloch-Pitts neuron model
1950  Turing’s “Computing Machinery” paper
1955  Dartmouth proposal submitted
1956  Dartmouth Workshop (summer)
1957  Newell & Simon: General Problem Solver
1958  Rosenblatt: the Perceptron
1958  McCarthy: Lisp programming language
1959  Samuel coins “machine learning”
The optimism: Early researchers believed human-level AI was 10–20 years away. Herbert Simon predicted in 1957 that within ten years a computer would be chess champion and prove important mathematical theorems. The reality would take much longer.
Early Optimism: 1957–1969
The golden age of symbolic AI and early neural networks
Breakthroughs
General Problem Solver (1957): Newell and Simon created a program that could solve logic puzzles using means-ends analysis — the first program to separate problem-solving strategy from domain knowledge.

ELIZA (1966): Joseph Weizenbaum’s chatbot at MIT simulated a Rogerian psychotherapist using simple pattern matching. Some users became emotionally attached, foreshadowing today’s debates about AI relationships.

Perceptron (1958): Rosenblatt’s learning machine could classify patterns by adjusting weights from examples — one of the first algorithms to learn from data.
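The perceptron’s learning rule fits in a few lines. Below is a minimal Python sketch of the idea (Rosenblatt’s Mark I was analog hardware, not code; the learning rate and epoch count here are arbitrary illustrative choices). On a linearly separable function such as OR, the weights converge to a separating line:

# Minimal perceptron: learn weights and bias from labeled examples.

def train_perceptron(examples, lr=0.1, epochs=20):
    """examples: list of (inputs, target) pairs with binary targets 0/1."""
    n = len(examples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, target in examples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = target - pred                  # -1, 0, or +1
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err                        # nudge the boundary toward the target
    return w, b

# OR is linearly separable, so the rule converges:
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
print(train_perceptron(data))  # weights and bias defining a separating line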
The Mood
Government funding flowed freely. DARPA (then ARPA) invested heavily in AI research at MIT, Stanford, and Carnegie Mellon. Researchers made bold predictions about imminent breakthroughs in language understanding, vision, and general reasoning.
// ELIZA pattern matching (1966)
// Weizenbaum, MIT
User:  “I am unhappy”
ELIZA: “Do you think coming here will help you not to be unhappy?”
// Rule:  match “I am {X}”
// Reply: “Do you think coming here
//         will help you not to be {X}?”
// No understanding — pure pattern matching
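The whole trick can be reconstructed in a few lines of Python with regular expressions. This is a toy reconstruction, not Weizenbaum’s original (which was written in MAD-SLIP), and the rules beyond the “I am {X}” one above are invented for illustration:

# Toy ELIZA: reflect the user's words back via regex rules.
import re

RULES = [
    (r"I am (.*)", "Do you think coming here will help you not to be {0}?"),
    (r"I feel (.*)", "Why do you feel {0}?"),
    (r".*mother.*", "Tell me more about your family."),
]

def eliza(utterance):
    for pattern, template in RULES:
        m = re.match(pattern, utterance, re.IGNORECASE)
        if m:
            return template.format(*m.groups())
    return "Please go on."  # default when no rule matches

print(eliza("I am unhappy"))
# -> Do you think coming here will help you not to be unhappy?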
ELIZA effect: People attributed understanding to ELIZA despite its simplicity. Weizenbaum was disturbed that his secretary asked him to leave the room so she could talk to ELIZA privately. This phenomenon persists with modern chatbots.
The First AI Winter: 1969–1980
Broken promises, funding cuts, and the perceptron crisis
What Went Wrong
1969 — Minsky & Papert publish Perceptrons, mathematically proving that single-layer perceptrons cannot solve XOR or any non-linearly-separable problem. This devastated neural network research for over a decade.

1973 — Lighthill Report (UK): Sir James Lighthill concluded that AI had failed to achieve its “grandiose objectives” and recommended drastic funding cuts. The British government largely withdrew AI funding.

1974 — DARPA cuts: The U.S. Congress pressured DARPA to fund only “mission-oriented” research, ending broad AI funding.
The Pattern
AI winters follow a predictable cycle: hype → overpromise → underdelivery → disillusionment → funding collapse. The gap between what researchers promised and what they delivered eroded trust. This pattern would repeat in the late 1980s and remains a cautionary tale today.
The Promise (1960s)
“Machines will be capable, within twenty years, of doing any work a man can do.” — Herbert Simon, 1965
The Reality
Programs could solve toy problems but failed on real-world complexity. Translation, vision, and reasoning remained unsolved for decades.
Expert Systems Boom: 1980–1987
Rule-based AI becomes a billion-dollar industry
The Revival
AI rebounded through expert systems — programs that encoded human expert knowledge as if/then rules (a toy rule engine in this style is sketched after this list). Companies paid millions for systems that could diagnose diseases, configure computers, or analyze chemical compounds.

R1/XCON (1980): DEC’s computer configuration system saved an estimated $40 million per year. By 1985, companies were spending over $1 billion annually on expert systems.

Japan’s Fifth Generation (1982): A $400 million government initiative to build intelligent computers spurred competitive investment from the US and UK.
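To make the if/then style concrete, here is a toy forward-chaining rule engine in Python. The rules and facts are invented for illustration and do not come from any real system:

# Toy forward-chaining rule engine in the expert-system style.
# Rules and facts below are made up; real systems had hundreds or thousands.

RULES = [
    ({"fever", "stiff_neck"}, "suspect_meningitis"),
    ({"suspect_meningitis"}, "order_lumbar_puncture"),
    ({"fever", "cough"}, "suspect_flu"),
]

def forward_chain(facts):
    """Repeatedly fire any rule whose conditions are all known facts."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(forward_chain({"fever", "stiff_neck"}))
# -> includes 'suspect_meningitis' and 'order_lumbar_puncture'

The knowledge bottleneck is visible even at this scale: every inference path exists only because someone typed it in.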
// Key expert systems
DENDRAL (1965)      Chemical analysis
                    Stanford · first expert system
MYCIN (1976)        Bacterial infection diagnosis
                    Stanford · ~600 rules · certainty factors
R1/XCON (1980)      DEC computer configuration
                    Carnegie Mellon · saved $40M/year
PROSPECTOR (1983)   Mineral exploration
                    SRI · discovered a molybdenum deposit
The hidden breakthrough: While expert systems dominated commercially, backpropagation was being rediscovered. Rumelhart, Hinton, and Williams published their landmark 1986 paper showing how to train multi-layer neural networks — solving the XOR problem that killed neural nets in 1969.
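A minimal sketch of why this mattered: a network with one hidden layer, trained by backpropagation, learns XOR, the very function Minsky and Papert showed single-layer perceptrons cannot represent. Plain NumPy, with the hidden size, learning rate, and step count chosen arbitrarily for illustration:

# Backpropagation on XOR: the problem single-layer perceptrons cannot solve.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)          # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)            # hidden layer (4 units)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)            # output layer
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(20000):
    h = sigmoid(X @ W1 + b1)                             # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)                  # chain rule at the output
    d_h = (d_out @ W2.T) * h * (1 - h)                   # gradient pushed back a layer
    W2 -= 0.5 * (h.T @ d_out); b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * (X.T @ d_h);   b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2).ravel())                              # should approach [0, 1, 1, 0]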
The Second AI Winter: 1987–1993
Expert systems collapse, but seeds of revival are planted
The Collapse
Expert systems proved brittle, expensive to maintain, and unable to learn. Every new scenario required manual rule authoring by scarce domain experts. The specialized Lisp machine market collapsed as cheaper general-purpose workstations caught up.

Japan’s Fifth Generation project ended in 1992 without achieving its goals. DARPA’s Strategic Computing Initiative was cancelled. The AI industry contracted sharply.
Seeds of Revival
Quietly, important work continued:
1989: Yann LeCun demonstrates CNNs for handwriting recognition (LeNet)
1992: Gerald Tesauro’s TD-Gammon masters backgammon via reinforcement learning
1997: IBM’s Deep Blue defeats Garry Kasparov in chess
// Why expert systems failed
Problem 1: Knowledge bottleneck
  Every rule hand-coded by human experts
  Can’t learn from data or adapt
Problem 2: Brittleness
  Works perfectly within its rules
  Fails catastrophically outside them
Problem 3: Maintenance cost
  R1/XCON grew to 17,500 rules
  Became nearly impossible to update
Problem 4: No common sense
  Couldn’t handle situations outside the narrow domain of its rules
Backpropagation survived: Despite the winter, Hinton, LeCun, and Bengio kept neural network research alive. LeCun’s 1989 LeNet could read handwritten zip codes — the first practical CNN. These researchers would later be called the “Godfathers of Deep Learning” and share the 2018 Turing Award.
Deep Learning Revolution: 2006–2015
GPUs, big data, and AlexNet change everything
The Catalysts
Three forces converged:

1. Data: The internet produced massive datasets. ImageNet (launched 2009 by Fei-Fei Li’s group) grew to over 14 million labeled images across more than 20,000 categories — the fuel deep learning needed.

2. Compute: GPUs, originally designed for gaming, turned out to be ideal for the parallel matrix operations neural networks require. Training that took weeks on CPUs took hours on GPUs (a small timing sketch follows this list).

3. Algorithms: Hinton’s 2006 deep belief networks showed that deep networks could be trained effectively. ReLU activation, dropout, and batch normalization followed.
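To make the compute point concrete, here is the same large matrix multiply dispatched to CPU and then GPU. A sketch assuming PyTorch and a CUDA-capable machine; the exact speedup depends entirely on the hardware:

# One operation, two devices: matrix multiply on CPU vs GPU.
import time
import torch

a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

t0 = time.perf_counter()
c = a @ b                                    # runs on CPU cores
print(f"CPU: {time.perf_counter() - t0:.3f}s")

if torch.cuda.is_available():
    a_g, b_g = a.cuda(), b.cuda()            # move data to GPU memory
    torch.cuda.synchronize()                 # GPU ops are async; sync before timing
    t0 = time.perf_counter()
    c_g = a_g @ b_g                          # same math, thousands of parallel cores
    torch.cuda.synchronize()
    print(f"GPU: {time.perf_counter() - t0:.3f}s")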
AlexNet: The Turning Point
In 2012, Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton entered AlexNet in the ImageNet competition (ILSVRC). It achieved a top-5 error rate of 15.3% — crushing the second-place entry at 26.2%. The gap was so large it shocked the computer vision community. Deep learning had arrived.
// AlexNet (2012) — the numbers
Parameters:   60 million
Layers:       8 (5 conv + 3 fully-connected)
Training:     1.2M ImageNet images
Hardware:     2 NVIDIA GTX 580 GPUs
Top-5 error:  15.3% vs 26.2% (2nd place)
Innovation:   ReLU, dropout, GPU training
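For readers who prefer code to tables, the eight-layer stack reads as follows in PyTorch. This is a schematic sketch: the 2012 original split channels across two GPUs and used local response normalization, both omitted here.

# Schematic AlexNet: 5 conv + 3 fully-connected layers (expects 3×227×227 input).
import torch.nn as nn

alexnet = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),    # conv1
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),  # conv2
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(), # conv3
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(), # conv4
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(), # conv5
    nn.MaxPool2d(3, stride=2),
    nn.Flatten(),
    nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(), # fc6
    nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),        # fc7
    nn.Linear(4096, 1000),                                    # fc8: 1000 classes
)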
The Transformer Era: 2017–2022
Attention is all you need — and then ChatGPT changes the world
The Transformer Revolution
2017: Google researchers publish “Attention Is All You Need” (Vaswani et al.), introducing the transformer architecture. By replacing recurrence with self-attention, transformers process entire sequences in parallel — enabling massive scaling (the core operation is sketched after this list).

2018: GPT-1 (117M parameters) and BERT demonstrate that pretraining on large text corpora produces powerful language representations.

2020: GPT-3 (175B parameters) shows that scaling transforms quantity into quality — the model can write code, translate, and reason without task-specific training.
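The operation behind all of this scaling is small. Here is single-head scaled dot-product attention in NumPy, stripped of the learned projections and multi-head machinery of the full transformer:

# Scaled dot-product attention, the transformer's core operation:
# Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V   (Vaswani et al., 2017)
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # every position scores every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over key positions
    return weights @ V                               # weighted mix of value vectors

# 4 tokens, 8-dim embeddings; self-attention feeds the same matrix in three roles:
x = np.random.default_rng(0).normal(size=(4, 8))
print(attention(x, x, x).shape)  # (4, 8), all positions computed in parallel

Because there is no recurrence, every row of the output is computed at once, which is what lets transformers exploit the GPU parallelism described earlier.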
// The scaling timeline
2017  Transformer paper (Google)
2018  GPT-1 117M params (OpenAI)
2018  BERT 340M params (Google)
2019  GPT-2 1.5B params (OpenAI)
2020  GPT-3 175B params (OpenAI)
2022  GPT-3.5 + RLHF alignment
2022  ChatGPT launched (Nov 30)
      1M users in 5 days
      100M users in 2 months
The ChatGPT moment: On November 30, 2022, OpenAI released ChatGPT publicly. It reached 1 million users in 5 days and 100 million in 2 months — the fastest-growing consumer application in history. AI went from a research topic to a mainstream technology overnight.
Lessons & Patterns
What 70+ years of AI history teach us
Recurring Patterns
1. Hype cycles are real: Every era of AI has been marked by inflated expectations followed by disappointment. The current excitement about LLMs is unprecedented, but the pattern warns us to be realistic.

2. Compute + data > clever algorithms: The biggest breakthroughs (AlexNet, GPT-3) came from scaling existing ideas with more data and compute, not from entirely new algorithms.

3. The “bitter lesson”: Rich Sutton’s 2019 essay argues that methods leveraging computation (search, learning) always eventually beat methods leveraging human knowledge (hand-coded rules).
The Full Timeline
1943  McCulloch-Pitts neuron
1950  Turing Test
1956  Dartmouth — AI is born
1958  Perceptron
1969  Perceptrons book → 1st winter
1980  Expert systems boom
1986  Backpropagation revival
1987  Expert systems bust → 2nd winter
1997  Deep Blue beats Kasparov
2012  AlexNet → deep learning era
2017  Transformers
2022  ChatGPT → mainstream AI
Are we in a bubble? AI investment exceeded $100B in 2024. History shows that hype can outpace reality. But unlike previous eras, today’s AI delivers measurable economic value at scale. The question isn’t whether AI works — it’s whether expectations match what it can actually do.