Ch 3 — Machine Learning Paradigms

Supervised, unsupervised, and reinforcement learning — the three ways machines learn from data
Three Ways Machines Learn
The fundamental paradigms of machine learning
The Core Question
Machine learning is the science of getting computers to learn from data without being explicitly programmed. But how a system learns depends on what kind of feedback it receives. This gives us three fundamental paradigms, each suited to different problems.
Supervised: “Learn from labeled examples”
Input: data + correct answers
Goal: predict answers for new data

Unsupervised: “Find hidden structure”
Input: data only (no labels)
Goal: discover patterns & groups

Reinforcement: “Learn by trial and reward”
Input: environment + reward signal
Goal: maximize cumulative reward
Analogy
Supervised: A teacher shows you flashcards with questions and answers. You learn the mapping and can answer new questions.

Unsupervised: You’re given a pile of photos with no labels. You naturally group them by similarity — landscapes, portraits, animals.

Reinforcement: You learn to ride a bike. Nobody tells you the “correct” action at each moment — you try things, fall, adjust, and gradually improve through feedback.
Most modern AI uses supervised learning (or its variants). Image classifiers, spam filters, language models, and recommendation systems all learn from labeled or structured data. But the lines are blurring — LLMs use self-supervised pretraining + RLHF.
Supervised Learning: Classification
Predicting which category something belongs to
How It Works
Given a dataset of input features (X) and correct labels (y), the algorithm learns a function f(X) → y that maps inputs to categories. Once trained, it can classify new, unseen inputs. The model learns by minimizing the difference between its predictions and the true labels.
Real-World Examples
Email spam detection: Features = word frequencies, sender, links. Labels = spam/not spam.
Medical diagnosis: Features = symptoms, test results. Labels = disease/healthy.
Image recognition: Features = pixel values. Labels = cat/dog/bird.
Fraud detection: Features = transaction data. Labels = fraudulent/legitimate.
# Classification algorithms
Logistic Regression: linear boundary, probability output. Fast, interpretable, good baseline.
Decision Trees / Random Forests: series of if/then splits. Handles mixed data types well.
Support Vector Machines (SVM): finds the optimal separating hyperplane. Effective in high dimensions.
Neural Networks: learn complex non-linear boundaries. Dominate when data is abundant.
k-Nearest Neighbors (k-NN): classifies by majority vote of neighbors. Simple but slow at scale.
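To make the supervised setup concrete, here is a minimal sketch of k-NN, the simplest algorithm in the list above. The toy "spam" features (link count, exclamation count) and labels are invented purely for illustration:

```python
from collections import Counter
import math

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    dists = sorted(
        (math.dist(x, x_new), label) for x, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy labeled dataset: (link count, exclamation count) per email
X_train = [(0, 1), (1, 0), (1, 1), (8, 9), (9, 8), (9, 9)]
y_train = ["ham", "ham", "ham", "spam", "spam", "spam"]

print(knn_predict(X_train, y_train, (8, 8)))  # spam
print(knn_predict(X_train, y_train, (0, 0)))  # ham
```

Note how the labeled examples (X, y) play exactly the role described above: the "function" f(X) → y here is nothing more than a lookup against the training set, which is why k-NN needs no training phase but gets slow as the dataset grows.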
The bias-variance tradeoff: Simple models (logistic regression) may underfit complex data. Complex models (deep neural nets) may overfit to noise. The art of ML is finding the right balance for your data and problem.
Supervised Learning: Regression
Predicting continuous numerical values
Classification vs Regression
Classification predicts categories (spam/not spam). Regression predicts numbers (house price = $450,000). Same supervised framework — learn from labeled examples — but the output is continuous rather than discrete.
Real-World Examples
House price prediction: Features = sq ft, bedrooms, location. Output = price.
Stock forecasting: Features = historical prices, volume. Output = future price.
Weather prediction: Features = temperature, pressure, humidity. Output = tomorrow’s temperature.
Ad revenue estimation: Features = clicks, impressions. Output = revenue.
# Linear regression — the simplest model
y = w⋅x + b

# Example: predict house price
price = 200 × sqft + 50000 × bedrooms + 30000 × garage - 10000

# The model learns weights (w) and bias (b) by minimizing
# prediction error on training data (labeled examples).

# Loss function: Mean Squared Error
MSE = (1/n) × ∑(y_pred - y_true)²
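For one feature, the least-squares weights above have a closed form, which makes for a compact sketch. The synthetic "house" data is generated from a known line (price = 200·sqft + 50000) so we can check that the fit recovers it; the numbers are illustrative, not real prices:

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w*x + b (closed form for a single feature)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # w = covariance(x, y) / variance(x); b shifts the line through the means
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - w * mean_x
    return w, b

# Synthetic data from a known relationship: price = 200*sqft + 50000
sqft = [1000, 1500, 2000, 2500]
price = [200 * s + 50000 for s in sqft]

w, b = fit_line(sqft, price)
mse = sum((w * x + b - y) ** 2 for x, y in zip(sqft, price)) / len(sqft)
print(w, b, mse)  # recovers w ≈ 200, b ≈ 50000, MSE ≈ 0
```

On noisy real data the MSE would not reach zero; the fitted (w, b) would instead be the line that minimizes it, which is exactly the learning objective stated above.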
Beyond linear: Real relationships are rarely linear. Polynomial regression, decision tree regressors, and neural networks can model complex non-linear relationships. Deep learning excels when the mapping from input to output is highly complex (e.g., image → age estimation).
Unsupervised Learning: Clustering
Discovering natural groups in unlabeled data
No Labels, No Problem
Unsupervised learning works with unlabeled data — no correct answers are provided. The algorithm must discover structure on its own. Clustering groups similar data points together, revealing natural categories the data contains.
Real-World Examples
Customer segmentation: Group customers by purchasing behavior to target marketing.
Document clustering: Organize news articles by topic without predefined categories.
Anomaly detection: Find unusual patterns that don’t fit any cluster (fraud, network intrusion).
Gene expression: Group genes with similar expression patterns to discover biological functions.
# K-Means clustering
1. Choose K (number of clusters)
2. Randomly place K centroids
3. Assign each point to nearest centroid
4. Recalculate centroids as cluster means
5. Repeat 3-4 until convergence

# Other clustering algorithms
Hierarchical: builds a tree of nested clusters
DBSCAN: density-based, finds arbitrary shapes
Gaussian Mixture: probabilistic, soft assignments
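Steps 3-5 above can be sketched directly in Python. This toy version works on 1-D points and takes fixed starting centroids (real implementations initialize them randomly, per step 2); the data is invented so the two clusters are obvious:

```python
def kmeans_1d(points, centroids, iters=10):
    """K-Means on 1-D data: assign each point to its nearest centroid,
    recompute centroids as cluster means, repeat."""
    for _ in range(iters):
        clusters = {c: [] for c in range(len(centroids))}
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        # A centroid with no points keeps its old position
        centroids = [sum(ps) / len(ps) if ps else centroids[c]
                     for c, ps in clusters.items()]
    return centroids

# Unlabeled data with two natural groups, around 1.0 and 8.0
points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
centroids = kmeans_1d(points, [0.0, 10.0])
print(centroids)  # centroids settle near 1.0 and 8.0
```

No labels appear anywhere: the algorithm only measures distances between points, which is the defining feature of the unsupervised setting.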
The K problem: K-Means requires you to specify the number of clusters in advance. Choosing the wrong K gives meaningless results. Techniques like the elbow method and silhouette analysis help, but cluster count is often a judgment call.
Unsupervised: Dimensionality Reduction
Compressing high-dimensional data while preserving structure
The Curse of Dimensionality
Real-world data often has hundreds or thousands of features (dimensions). High-dimensional data is hard to visualize, slow to process, and prone to overfitting. Dimensionality reduction compresses data into fewer dimensions while preserving the most important information.
Key Techniques
PCA (Principal Component Analysis): Finds the directions of maximum variance and projects data onto them. For example, it can reduce 1000 features to 50 while retaining roughly 95% of the variance.

t-SNE / UMAP: Non-linear methods that preserve local structure. Excellent for visualizing high-dimensional data in 2D or 3D. Widely used to visualize embeddings and clusters.
# PCA — the idea
Original data: 1000 features per sample
After PCA: 50 features per sample
Info retained: ~95% of variance

# How it works:
1. Find direction of maximum variance (PC1)
2. Find next orthogonal direction (PC2)
3. Repeat for K components
4. Project data onto top-K components

# Use cases:
Visualization (reduce to 2D/3D)
Preprocessing (speed up downstream ML)
Noise removal (drop low-variance dims)
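Step 1 above (finding PC1) can be sketched without any linear-algebra library by running power iteration on the covariance matrix. This toy version handles 2-D points only, and the data is constructed to lie near the line y = 2x so we know PC1 should point along (1, 2):

```python
def top_component(data, iters=100):
    """Approximate PC1 (direction of maximum variance) for 2-D points
    via power iteration on the 2x2 covariance matrix."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    centered = [(x - mx, y - my) for x, y in data]
    # Covariance matrix entries: [[cxx, cxy], [cxy, cyy]]
    cxx = sum(x * x for x, _ in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    vx, vy = 1.0, 1.0  # arbitrary starting vector
    for _ in range(iters):
        # Multiply by the covariance matrix, then renormalize
        vx, vy = cxx * vx + cxy * vy, cxy * vx + cyy * vy
        norm = (vx * vx + vy * vy) ** 0.5
        vx, vy = vx / norm, vy / norm
    return vx, vy

# Points lying (almost) on y = 2x: PC1 should be ~(1, 2)/sqrt(5)
data = [(1, 2), (2, 4), (3, 6), (4, 8.1)]
pc1 = top_component(data)
print(pc1)
```

Projecting each centered point onto pc1 yields the 1-D compressed representation; because the points are nearly collinear, that single coordinate retains almost all of the variance.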
Autoencoders are a neural network approach to dimensionality reduction. They learn to compress data into a small “bottleneck” layer and reconstruct it. The bottleneck representation captures the most important features. Variational autoencoders (VAEs) extend this for generative modeling (Ch 11).
Reinforcement Learning
Learning by trial, error, and reward
The Agent-Environment Loop
An agent observes the current state of an environment, takes an action, receives a reward (positive or negative), and transitions to a new state. The goal: learn a policy (strategy) that maximizes cumulative reward over time. No labeled examples — the agent discovers what works through exploration.
Real-World Examples
Game playing: AlphaGo, Atari DQN, OpenAI Five (Dota 2)
Robotics: Learning to walk, grasp objects, navigate
Recommendation: Optimizing long-term user engagement
LLM alignment: RLHF trains models to be helpful and safe (Ch 12)
# The RL loop
while not done:
    state = observe(environment)
    action = policy(state)  # choose action
    reward = environment.step(action)
    update policy based on reward

# The exploration-exploitation dilemma
Explore: try new actions (might discover better)
Exploit: use best known action (safe but limited)

# Balance via epsilon-greedy:
# With probability ε, explore randomly
# Otherwise, take the best known action
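The epsilon-greedy balance above can be sketched on a multi-armed bandit, the simplest RL setting (one state, no delayed rewards). The arm values and noise level here are invented for illustration:

```python
import random

def run_bandit(true_rewards, episodes=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a multi-armed bandit: with probability epsilon
    explore a random arm, otherwise exploit the best-known arm."""
    rng = random.Random(seed)
    q = [0.0] * len(true_rewards)      # estimated value of each arm
    counts = [0] * len(true_rewards)   # pulls per arm
    for _ in range(episodes):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_rewards))       # explore
        else:
            arm = max(range(len(q)), key=q.__getitem__)  # exploit
        reward = true_rewards[arm] + rng.gauss(0, 0.1)   # noisy reward
        counts[arm] += 1
        q[arm] += (reward - q[arm]) / counts[arm]        # incremental mean
    return q

q = run_bandit([0.2, 0.5, 0.8])
best = max(range(3), key=q.__getitem__)
print(best, q)  # the agent's estimates converge toward the true arm values
```

With epsilon = 0 the agent can lock onto whichever arm looked good first and never discover the best one; the occasional random pull is what keeps its estimates honest.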
Delayed rewards: In chess, you don’t know if a move was good until many moves later. RL must assign “credit” to past actions that contributed to eventual success or failure. This credit assignment problem is one of the hardest challenges in RL.
Beyond the Big Three
Self-supervised, semi-supervised, and transfer learning
Self-Supervised Learning
The model creates its own labels from the data. GPT-style pretraining hides the next word and predicts it from the words before it. BERT masks random words and fills them in. Contrastive learning (SimCLR, CLIP) learns by comparing augmented views of the same data. This is how modern foundation models are trained — no human labeling required for pretraining.
Semi-Supervised Learning
Uses a small amount of labeled data plus a large amount of unlabeled data. The model learns general patterns from unlabeled data and refines with labels. Practical when labeling is expensive (medical imaging, satellite photos).
Transfer Learning
Pretrain on a large general dataset, then fine-tune on a small task-specific dataset. ImageNet-pretrained CNNs transfer to medical imaging. GPT pretrained on internet text transfers to customer support. This is the dominant paradigm in modern AI — almost no one trains from scratch anymore.
# The modern recipe
1. Self-supervised pretrain
   Train on massive unlabeled data (GPT: predict next token)
2. Supervised fine-tune
   Adapt to specific task with labels (instruction tuning)
3. RLHF alignment
   Reinforce helpful, safe behavior (reward model from human prefs)
This is how ChatGPT was built: Self-supervised pretraining (predict next token on internet text) + supervised fine-tuning (instruction following) + RLHF (align with human preferences). All three paradigms in one system.
Choosing the Right Paradigm
A practical decision framework
Decision Guide
Do you have labeled data?
Yes → Supervised learning
  Predict category? → Classification
  Predict number? → Regression
No → Unsupervised learning
  Find groups? → Clustering
  Reduce features? → Dimensionality reduction
  Find outliers? → Anomaly detection
Is there an environment with rewards?
Yes → Reinforcement learning (sequential decisions with feedback)
Have massive unlabeled data?
Yes → Self-supervised pretraining, then fine-tune with a small labeled set
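One possible encoding of this guide as a rule chain is sketched below; the question order (environment first, then data) is a judgment call, and real projects often mix paradigms rather than picking exactly one:

```python
def choose_paradigm(labeled: bool, env_rewards: bool,
                    massive_unlabeled: bool) -> str:
    """A sketch of the decision guide above, not a rulebook."""
    if env_rewards:
        # Sequential decisions with feedback
        return "reinforcement learning"
    if labeled:
        return "supervised learning"
    if massive_unlabeled:
        return "self-supervised pretraining, then fine-tune"
    return "unsupervised learning"

print(choose_paradigm(labeled=True, env_rewards=False,
                      massive_unlabeled=False))  # supervised learning
```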
Key Takeaways
1. Supervised learning is the workhorse — most production ML is supervised

2. Unsupervised learning discovers hidden structure without labels

3. RL learns from interaction and delayed rewards

4. Self-supervised learning powers modern foundation models

5. Transfer learning means you rarely train from scratch

6. Modern systems combine multiple paradigms (pretrain + fine-tune + RLHF)
Coming up: Ch 4 covers how to prepare data for these paradigms. Ch 5–6 dive into the neural network mechanics that power supervised learning. Ch 12 goes deep on reinforcement learning and RLHF.