Ch 3 — How Machines Learn

The core mental model for training, models, and predictions — no math required
The High-Level Pipeline
Data → Prepare → Train → Evaluate → Deploy → Monitor
The Fundamental Shift
From programming rules to learning from data
Traditional Software
In traditional software, a developer writes explicit rules: “If the transaction amount exceeds $10,000 and the location is different from the last three transactions, flag it.” The programmer anticipates every scenario and codes the logic. The system does exactly what it’s told — nothing more, nothing less.
Machine Learning
In machine learning, you provide data and outcomes, and the system figures out the rules itself. You show it millions of transactions labeled “fraud” or “legitimate,” and it identifies patterns that distinguish the two — patterns that may be too subtle or too complex for a human to articulate. The rules emerge from the data, not from a programmer’s assumptions.
Traditional Approach
Input: Rules + Data
Output: Answers

The programmer defines the logic. The system executes it. Rigid, predictable, limited to what was anticipated.
ML Approach
Input: Data + Answers
Output: Rules (the model)

The system discovers the logic. The programmer defines the goal. Flexible, adaptive, can find patterns humans miss.
Key insight: This inversion — from “human writes rules” to “machine discovers rules” — is the conceptual core of machine learning. Everything else in this chapter builds on this idea.
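The inversion can be made concrete in a few lines of code. Below is a toy sketch, not a real fraud system: the function names, dollar amounts, and labeled examples are all invented for illustration. The "learning" here is the simplest possible kind, searching for the dollar threshold that best separates the labeled examples.

```python
# Traditional approach: the programmer writes the rule explicitly.
def flag_fraud_by_rule(amount, new_location):
    return amount > 10_000 and new_location

# ML approach, in miniature: learn a dollar threshold from labeled data.
# Each (amount, is_fraud) pair stands in for "Data + Answers".
examples = [(120, False), (480, False), (5_200, False),
            (9_900, True), (14_000, True), (22_500, True)]

def learn_threshold(data):
    # Try each observed amount as a candidate threshold and keep the
    # one that misclassifies the fewest examples.
    best_t, best_errors = None, len(data) + 1
    for t, _ in data:
        errors = sum((amount > t) != label for amount, label in data)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

threshold = learn_threshold(examples)  # the "rule" emerges from the data
```

The programmer never wrote "flag amounts above $5,200"; the data produced that rule. A real system learns thousands of such boundaries across many features at once, but the inversion is the same.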
Data: The Raw Material
What goes in determines what comes out
What Counts as Data
ML systems learn from examples. These can be structured (spreadsheets, databases, transaction logs) or unstructured (emails, images, audio, documents). A fraud detection model learns from historical transactions. A language model learns from text scraped from the internet. A medical imaging system learns from thousands of labeled X-rays.
Labels and Features
Features are the characteristics the model uses to make predictions — transaction amount, time of day, merchant category. Labels are the correct answers — “fraud” or “legitimate.” In supervised learning (the most common type), you need both. The quality and relevance of your features often matter more than the sophistication of the algorithm.
The Data Quality Problem
Most enterprise AI projects spend 60–80% of their time on data preparation — cleaning, formatting, deduplicating, and labeling. Raw data is messy: missing values, inconsistent formats, duplicate records, outdated entries. A model trained on poor data will produce poor results, regardless of how advanced the algorithm is.
Why it matters: When an AI project fails, the cause is rarely the algorithm. It’s almost always the data — not enough of it, wrong kind of it, or poor quality. This is the single most important factor executives should scrutinize in any AI initiative. Chapter 4 goes deeper.
Training: How the Machine Learns
The iterative process of getting better at predictions
The Process
Training is an iterative cycle. The model makes a prediction, compares it to the correct answer, measures how far off it was (the error), and adjusts its internal parameters to reduce that error. Then it repeats — millions or billions of times. Each pass through the data nudges the model closer to accurate predictions. This is called optimization.
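The cycle above — predict, measure the error, adjust, repeat — can be shown with a single parameter. This sketch uses invented data points that roughly follow y = 2x; real training does the same thing across millions of parameters and examples.

```python
# A minimal optimization loop: predict, compare, adjust, repeat.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]  # roughly y = 2x

w = 0.0                  # the model's single parameter, starts ignorant
learning_rate = 0.01

for step in range(1000):            # many passes through the data
    for x, y in data:
        prediction = w * x          # 1. make a prediction
        error = prediction - y      # 2. measure how far off it was
        w -= learning_rate * error * x  # 3. adjust to reduce the error

# After many passes, w has settled near 2.0: the pattern in the data.
```

Nothing in the loop says "the answer is 2"; the repeated nudges extract that value from the examples. Scale this up and you have the training run of any modern model.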
What the Model Actually Learns
The model doesn’t memorize the training data. It learns statistical relationships — which features correlate with which outcomes, how strongly, and in what combinations. A fraud model might learn that transactions over $5,000 from a new device in a foreign country at 3 AM have a 94% probability of being fraudulent. That pattern is encoded as numerical weights inside the model.
Training Is Expensive
Training requires significant compute resources. For a simple business model, training might take hours on a single machine. For a frontier language model like GPT-4, training costs are estimated at $50–100 million and require thousands of specialized GPUs running for months. This is a one-time (or periodic) cost — once trained, the model can be used repeatedly.
Key insight: Training is the capital expenditure of AI. It’s the upfront investment that creates the asset. The quality of this investment — the data, the compute, the expertise — determines the value of everything that follows.
The Model: Compressed Experience
What training produces
What a Model Is
A model is a mathematical function that takes inputs and produces outputs. It’s the distilled result of training — all the patterns, relationships, and statistical regularities found in the data, compressed into a set of numerical parameters (called weights). GPT-4 has an estimated 1.8 trillion parameters. A simple business model might have a few thousand.
Models Are Not Databases
A common misconception: the model does not store the training data. It stores patterns extracted from the data. A model trained on 10 million customer records doesn’t contain those records — it contains the statistical relationships between customer attributes and outcomes. This is why models can generalize to new situations they haven’t seen before.
Model Selection
Different problems require different model architectures. Linear models work for simple relationships. Decision trees handle categorical decisions. Neural networks excel at complex, unstructured data like images and text. Choosing the right architecture is a critical technical decision — and using an overly complex model for a simple problem wastes resources and can actually reduce accuracy.
Why it matters: When a vendor says “we have a proprietary AI model,” they’re describing this artifact — a trained set of parameters. The value lies in the quality of training data, the architecture choices, and the optimization process. The model itself is just a file.
Inference: Putting the Model to Work
Where value is created
What Inference Is
Inference is the production use of a trained model. New data comes in, the model applies its learned patterns, and a prediction comes out. When Gmail flags an email as spam, when Netflix recommends a show, when a bank approves a loan in real time — that’s inference. It happens continuously, often in milliseconds, serving millions of users simultaneously.
The Cost Shift
Inference now dominates enterprise AI spending. Menlo Ventures found that in 2025, enterprises spent $18 billion on inference (foundation model APIs) versus $4 billion on training. As of 2026, 44% of organizations allocate 76–100% of their AI budget to inference rather than training. Per-token inference costs have dropped 1,000x since late 2022 — but total spending surged 320% because cheaper inference unlocked vastly more use cases.
Training vs. Inference Economics
Training is a one-time capital investment. You do it once (or periodically to update the model). It’s expensive but bounded.

Inference is an ongoing operational cost. It runs 24/7 in production. It scales with usage. By 2026, inference is projected to account for two-thirds of all AI compute, up from one-third in 2023.
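The capex/opex contrast is easy to put in a spreadsheet. The dollar figures below are invented for illustration, not real vendor pricing; the point is the shape of the curve, not the numbers.

```python
# Back-of-envelope budget sketch: one-time training cost vs. an
# inference cost that scales with usage. All figures are illustrative.
training_cost = 500_000            # one-time, capex-like

cost_per_1k_requests = 0.40        # ongoing, per-use, opex-like
requests_per_month = 50_000_000

monthly_inference = requests_per_month / 1_000 * cost_per_1k_requests
yearly_inference = monthly_inference * 12

# monthly_inference is 20,000; yearly_inference is 240,000. At this
# volume, two years of serving rivals the original training bill, and
# the inference line grows with every new user while training does not.
```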
Key insight: Most executive attention goes to training costs (“how much to build the model?”). But inference costs are what show up on the monthly bill. Understanding this distinction is critical for AI budgeting and vendor negotiations.
Evaluation: How Do You Know It Works?
Measuring model performance before deployment
The Holdout Test
Before deploying a model, you test it on data it has never seen. During training, a portion of the data (typically 20–30%) is set aside as a “test set.” The model is evaluated on this held-out data to measure how well it generalizes. If it performs well on training data but poorly on test data, it has memorized rather than learned — a problem called overfitting.
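Here is the holdout idea in miniature. The "model" below is deliberately the worst case described above: it memorizes its training set and knows nothing else. The toy data and split scheme are invented for illustration.

```python
# Holdout evaluation sketch: hold out 20% of labeled examples and
# compare performance on seen vs. unseen data.
examples = [(i, i % 2) for i in range(100)]   # toy (input, label) pairs
test = examples[::5]                          # every 5th pair held out (20%)
train = [pair for pair in examples if pair not in test]

memorized = dict(train)                       # "training" = pure memorization

def accuracy(dataset):
    correct = sum(memorized.get(x, -1) == y for x, y in dataset)
    return correct / len(dataset)

train_acc = accuracy(train)   # 1.0: flawless on data it has seen
test_acc = accuracy(test)     # 0.0: useless on held-out data
```

The gap between the two numbers is the diagnostic: a large drop from training accuracy to test accuracy is the signature of overfitting.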
Key Metrics
Accuracy — What percentage of predictions are correct?
Precision — Of the items flagged as positive, how many actually are? (Matters when false alarms are costly.)
Recall — Of all actual positives, how many did the model catch? (Matters when missing a case is dangerous.)
F1 Score — The balance between precision and recall.
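All four metrics come from the same four counts: true and false positives and negatives. A quick sketch with invented counts shows how they relate.

```python
# The four metrics above, computed from a small confusion matrix.
# The counts are invented for illustration.
tp, fp, fn, tn = 80, 10, 20, 890   # true/false positives and negatives

accuracy = (tp + tn) / (tp + fp + fn + tn)        # 0.97 overall
precision = tp / (tp + fp)                        # ~0.89 of flags are real
recall = tp / (tp + fn)                           # 0.80 of positives caught
f1 = 2 * precision * recall / (precision + recall)  # ~0.84 balance
```

Note that the same model scores differently on each metric; which number matters depends on whether false alarms or missed cases are the expensive mistake.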
The Business Metric Gap
Technical metrics (accuracy, F1) don’t always translate to business value. A fraud model with 99% accuracy sounds impressive — but if only 0.1% of transactions are fraudulent, a model that simply says “not fraud” every time achieves 99.9% accuracy while catching nothing. The right metric depends on the business context — what’s the cost of a false positive vs. a false negative?
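The do-nothing fraud model from the paragraph above takes three lines to build, which is exactly why the trap is so common:

```python
# The accuracy trap on imbalanced data: 1,000 transactions, one of
# them fraudulent, and a "model" that always answers "not fraud".
labels = [1] + [0] * 999          # 0.1% fraud, as in the text
predictions = [0] * 1000          # predict "legitimate" every time

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
caught = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))

# accuracy is 0.999, yet caught is 0: a 99.9%-accurate model that
# never catches a single fraudulent transaction. Its recall is zero.
```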
Why it matters: Always ask: “What metric are we optimizing for, and does it align with the business outcome we care about?” A model can score perfectly on technical benchmarks and still fail to deliver business value.
Failure Modes: What Goes Wrong
The three ways ML models fail
Overfitting
The model is too complex for the data. It memorizes the training examples, including their noise and quirks, rather than learning the underlying pattern. It performs brilliantly on training data and poorly on anything new. This is the most common failure mode in practice — the model looks great in the lab and fails in production.
Underfitting
The model is too simple for the problem. It can’t capture the real patterns in the data. It performs poorly on both training and test data. This usually means the wrong model architecture was chosen, or the features provided don’t contain enough information to make accurate predictions.
Data Drift
The model was trained on data that no longer reflects reality. Customer behavior changes. Market conditions shift. A fraud model trained on pre-pandemic transaction patterns may fail on post-pandemic data because spending habits changed fundamentally. Models don’t automatically adapt — they need to be retrained on current data.
Critical for leaders: AI models degrade over time. They are not “set and forget” assets. Budget for ongoing monitoring, evaluation, and retraining. The organizations that treat models as living systems — not one-time deployments — are the ones that sustain value from AI.
The Four Types of Machine Learning
A framework for the chapters ahead
Supervised Learning
Learn from labeled examples. You provide inputs paired with correct outputs. The model learns the mapping. Used for classification (spam/not spam) and regression (price prediction). The most widely deployed type in enterprise AI. Covered in Chapter 5.
Unsupervised Learning
Find structure in unlabeled data. No correct answers are provided. The model discovers patterns on its own — grouping similar customers, detecting anomalies, reducing data complexity. Used for segmentation, anomaly detection, and exploratory analysis. Covered in Chapter 6.
Reinforcement Learning
Learn by trial and error. An agent takes actions in an environment and receives rewards or penalties. It learns strategies that maximize long-term reward. This is how AlphaGo learned to play Go and how ChatGPT was refined through RLHF (reinforcement learning from human feedback). Powerful but complex to implement.
Self-Supervised Learning
Learn from the data itself. The model creates its own training signal — for example, predicting the next word in a sentence. This is how large language models like GPT are trained: mask a word, predict it, check, repeat — across trillions of words. It eliminates the need for manual labeling, enabling training at unprecedented scale.
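"The model creates its own training signal" can be shown with the simplest possible next-word predictor, a bigram counter. This is a drastic simplification of how LLMs actually work, and the corpus is invented for illustration, but the key property is real: every word in the text becomes a labeled example, with the following word as its label, and no human annotates anything.

```python
from collections import Counter, defaultdict

# Self-supervised learning in miniature: the text supplies its own
# labels. Each word is a training example whose "correct answer" is
# simply the word that follows it.
corpus = ("the model predicts the next word and the next word "
          "becomes the label").split()

follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1        # no human labeling needed

def predict_next(word):
    # Return the most frequently observed continuation.
    return follows[word].most_common(1)[0][0]
```

Because the labels are free, the only limit on training data is how much text exists, which is what makes the trillion-word scale of LLM training possible.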
Rule of thumb: Supervised learning is the safest bet for most business problems. Unsupervised learning is valuable for exploration. Reinforcement learning is powerful but niche. Self-supervised learning is what powers the generative AI revolution.