Ch 3 — How Machines Learn

The core mental model for training, models, and predictions — no math required
The High-Level Pipeline
Data → Prepare → Train → Evaluate → Deploy → Monitor
The Fundamental Shift
From programming rules to learning from data
Traditional Software
In traditional software, a developer writes explicit rules: “If the transaction amount exceeds $10,000 and the location is different from the last three transactions, flag it.” The programmer anticipates every scenario and codes the logic. The system does exactly what it’s told — nothing more, nothing less.
Machine Learning
In machine learning, you provide data and outcomes, and the system figures out the rules itself. You show it millions of transactions labeled “fraud” or “legitimate,” and it identifies patterns that distinguish the two — patterns that may be too subtle or too complex for a human to articulate. The rules emerge from the data, not from a programmer’s assumptions.
Traditional Approach
Input: Rules + Data
Output: Answers

The programmer defines the logic. The system executes it. Rigid, predictable, limited to what was anticipated.
ML Approach
Input: Data + Answers
Output: Rules (the model)

The system discovers the logic. The programmer defines the goal. Flexible, adaptive, can find patterns humans miss.
Key insight: This inversion — from “human writes rules” to “machine discovers rules” — is the conceptual core of machine learning. Everything else in this chapter builds on this idea.
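The inversion can be made concrete in a few lines of code. Below is a toy sketch, not a real fraud system: the function names, dollar amounts, and labeled examples are all invented for illustration. The "learning" here is the simplest possible kind, searching for the dollar threshold that best separates the labeled examples.

```python
# Traditional approach: the programmer writes the rule explicitly.
def flag_fraud_by_rule(amount, new_location):
    return amount > 10_000 and new_location

# ML approach, in miniature: learn a dollar threshold from labeled data.
# Each (amount, is_fraud) pair stands in for "Data + Answers".
examples = [(120, False), (480, False), (5_200, False),
            (9_900, True), (14_000, True), (22_500, True)]

def learn_threshold(data):
    # Try each observed amount as a candidate threshold and keep the
    # one that misclassifies the fewest examples.
    best_t, best_errors = None, len(data) + 1
    for t, _ in data:
        errors = sum((amount > t) != label for amount, label in data)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

threshold = learn_threshold(examples)  # the "rule" emerges from the data
```

The programmer never wrote "flag amounts above $5,200"; the data produced that rule. A real system learns thousands of such boundaries across many features at once, but the inversion is the same.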
Data: The Raw Material
What goes in determines what comes out
What Counts as Data
ML systems learn from examples. These can be structured (spreadsheets, databases, transaction logs) or unstructured (emails, images, audio, documents). A fraud detection model learns from historical transactions. A language model learns from text scraped from the internet. A medical imaging system learns from thousands of labeled X-rays.
Labels and Features
Features are the characteristics the model uses to make predictions — transaction amount, time of day, merchant category. Labels are the correct answers — “fraud” or “legitimate.” In supervised learning (the most common type), you need both. The quality and relevance of your features often matter more than the sophistication of the algorithm.
The Data Quality Problem
Most enterprise AI projects spend 60–80% of their time on data preparation — cleaning, formatting, deduplicating, and labeling. Raw data is messy: missing values, inconsistent formats, duplicate records, outdated entries. A model trained on poor data will produce poor results, regardless of how advanced the algorithm is.
Why it matters: When an AI project fails, the cause is rarely the algorithm. It’s almost always the data — not enough of it, wrong kind of it, or poor quality. This is the single most important factor executives should scrutinize in any AI initiative. Chapter 4 goes deeper.
Training: How the Machine Learns
The iterative process of getting better at predictions
The Process
Training is an iterative cycle. The model makes a prediction, compares it to the correct answer, measures how far off it was (the error), and adjusts its internal parameters to reduce that error. Then it repeats — millions or billions of times. Each pass through the data nudges the model closer to accurate predictions. This is called optimization.
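The cycle above — predict, measure the error, adjust, repeat — can be shown with a single parameter. This sketch uses invented data points that roughly follow y = 2x; real training does the same thing across millions of parameters and examples.

```python
# A minimal optimization loop: predict, compare, adjust, repeat.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]  # roughly y = 2x

w = 0.0                  # the model's single parameter, starts ignorant
learning_rate = 0.01

for step in range(1000):            # many passes through the data
    for x, y in data:
        prediction = w * x          # 1. make a prediction
        error = prediction - y      # 2. measure how far off it was
        w -= learning_rate * error * x  # 3. adjust to reduce the error

# After many passes, w has settled near 2.0: the pattern in the data.
```

Nothing in the loop says "the answer is 2"; the repeated nudges extract that value from the examples. Scale this up and you have the training run of any modern model.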
What the Model Actually Learns
The model doesn’t memorize the training data. It learns statistical relationships — which features correlate with which outcomes, how strongly, and in what combinations. A fraud model might learn that transactions over $5,000 from a new device in a foreign country at 3 AM have a 94% probability of being fraudulent. That pattern is encoded as numerical weights inside the model.
Training Is Expensive
Training requires significant compute resources. For a simple business model, training might take hours on a single machine. For a frontier language model like GPT-4, training costs are estimated at $50–100 million and require thousands of specialized GPUs running for months. This is a one-time (or periodic) cost — once trained, the model can be used repeatedly.
Key insight: Training is the capital expenditure of AI. It’s the upfront investment that creates the asset. The quality of this investment — the data, the compute, the expertise — determines the value of everything that follows.
The Model: Compressed Experience
What training produces
What a Model Is
A model is a mathematical function that takes inputs and produces outputs. It’s the distilled result of training — all the patterns, relationships, and statistical regularities found in the data, compressed into a set of numerical parameters (called weights). GPT-4 has an estimated 1.8 trillion parameters. A simple business model might have a few thousand.
Models Are Not Databases
A common misconception: the model does not store the training data. It stores patterns extracted from the data. A model trained on 10 million customer records doesn’t contain those records — it contains the statistical relationships between customer attributes and outcomes. This is why models can generalize to new situations they haven’t seen before.
Model Selection
Different problems require different model architectures. Linear models work for simple relationships. Decision trees handle categorical decisions. Neural networks excel at complex, unstructured data like images and text. Choosing the right architecture is a critical technical decision — and using an overly complex model for a simple problem wastes resources and can actually reduce accuracy.
Why it matters: When a vendor says “we have a proprietary AI model,” they’re describing this artifact — a trained set of parameters. The value lies in the quality of training data, the architecture choices, and the optimization process. The model itself is just a file.
Inference: Putting the Model to Work
Where value is created
What Inference Is
Inference is the production use of a trained model. New data comes in, the model applies its learned patterns, and a prediction comes out. When Gmail flags an email as spam, when Netflix recommends a show, when a bank approves a loan in real time — that’s inference. It happens continuously, often in milliseconds, serving millions of users simultaneously.
The Cost Shift
Inference now dominates enterprise AI spending. Menlo Ventures found that in 2025, enterprises spent $18 billion on inference (foundation model APIs) versus $4 billion on training. As of 2026, 44% of organizations allocate 76–100% of their AI budget to inference rather than training. Per-token inference costs have dropped 1,000x since late 2022 — but total spending surged 320% because cheaper inference unlocked vastly more use cases.
Training vs. Inference Economics
Training is a one-time capital investment. You do it once (or periodically to update the model). It’s expensive but bounded.

Inference is an ongoing operational cost. It runs 24/7 in production. It scales with usage. By 2026, inference is projected to account for two-thirds of all AI compute, up from one-third in 2023.
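The capex/opex contrast is easy to put in a spreadsheet. The dollar figures below are invented for illustration, not real vendor pricing; the point is the shape of the curve, not the numbers.

```python
# Back-of-envelope budget sketch: one-time training cost vs. an
# inference cost that scales with usage. All figures are illustrative.
training_cost = 500_000            # one-time, capex-like

cost_per_1k_requests = 0.40        # ongoing, per-use, opex-like
requests_per_month = 50_000_000

monthly_inference = requests_per_month / 1_000 * cost_per_1k_requests
yearly_inference = monthly_inference * 12

# monthly_inference is 20,000; yearly_inference is 240,000. At this
# volume, two years of serving rivals the original training bill, and
# the inference line grows with every new user while training does not.
```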
Key insight: Most executive attention goes to training costs (“how much to build the model?”). But inference costs are what show up on the monthly bill. Understanding this distinction is critical for AI budgeting and vendor negotiations.
Evaluation: How Do You Know It Works?
Measuring model performance before deployment
The Holdout Test
Before deploying a model, you test it on data it has never seen. During training, a portion of the data (typically 20–30%) is set aside as a “test set.” The model is evaluated on this held-out data to measure how well it generalizes. If it performs well on training data but poorly on test data, it has memorized rather than learned — a problem called overfitting.
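Here is the holdout idea in miniature. The "model" below is deliberately the worst case described above: it memorizes its training set and knows nothing else. The toy data and split scheme are invented for illustration.

```python
# Holdout evaluation sketch: hold out 20% of labeled examples and
# compare performance on seen vs. unseen data.
examples = [(i, i % 2) for i in range(100)]   # toy (input, label) pairs
test = examples[::5]                          # every 5th pair held out (20%)
train = [pair for pair in examples if pair not in test]

memorized = dict(train)                       # "training" = pure memorization

def accuracy(dataset):
    correct = sum(memorized.get(x, -1) == y for x, y in dataset)
    return correct / len(dataset)

train_acc = accuracy(train)   # 1.0: flawless on data it has seen
test_acc = accuracy(test)     # 0.0: useless on held-out data
```

The gap between the two numbers is the diagnostic: a large drop from training accuracy to test accuracy is the signature of overfitting.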
Key Metrics
Accuracy — What percentage of predictions are correct?
Precision — Of the items flagged as positive, how many actually are? (Matters when false alarms are costly.)
Recall — Of all actual positives, how many did the model catch? (Matters when missing a case is dangerous.)
F1 Score — The balance between precision and recall.
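All four metrics come from the same four counts: true and false positives and negatives. A quick sketch with invented counts shows how they relate.

```python
# The four metrics above, computed from a small confusion matrix.
# The counts are invented for illustration.
tp, fp, fn, tn = 80, 10, 20, 890   # true/false positives and negatives

accuracy = (tp + tn) / (tp + fp + fn + tn)        # 0.97 overall
precision = tp / (tp + fp)                        # ~0.89 of flags are real
recall = tp / (tp + fn)                           # 0.80 of positives caught
f1 = 2 * precision * recall / (precision + recall)  # ~0.84 balance
```

Note that the same model scores differently on each metric; which number matters depends on whether false alarms or missed cases are the expensive mistake.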
The Business Metric Gap
Technical metrics (accuracy, F1) don’t always translate to business value. A fraud model with 99% accuracy sounds impressive — but if only 0.1% of transactions are fraudulent, a model that simply says “not fraud” every time achieves 99.9% accuracy while catching nothing. The right metric depends on the business context — what’s the cost of a false positive vs. a false negative?
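The do-nothing fraud model from the paragraph above takes three lines to build, which is exactly why the trap is so common:

```python
# The accuracy trap on imbalanced data: 1,000 transactions, one of
# them fraudulent, and a "model" that always answers "not fraud".
labels = [1] + [0] * 999          # 0.1% fraud, as in the text
predictions = [0] * 1000          # predict "legitimate" every time

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
caught = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))

# accuracy is 0.999, yet caught is 0: a 99.9%-accurate model that
# never catches a single fraudulent transaction. Its recall is zero.
```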
Why it matters: Always ask: “What metric are we optimizing for, and does it align with the business outcome we care about?” A model can score perfectly on technical benchmarks and still fail to deliver business value.
Failure Modes: What Goes Wrong
The three ways ML models fail
Overfitting
The model is too complex for the data. It memorizes the training examples, including their noise and quirks, rather than learning the underlying pattern. It performs brilliantly on training data and poorly on anything new. This is the most common failure mode in practice — the model looks great in the lab and fails in production.
Underfitting
The model is too simple for the problem. It can’t capture the real patterns in the data. It performs poorly on both training and test data. This usually means the wrong model architecture was chosen, or the features provided don’t contain enough information to make accurate predictions.
Data Drift
The model was trained on data that no longer reflects reality. Customer behavior changes. Market conditions shift. A fraud model trained on pre-pandemic transaction patterns may fail on post-pandemic data because spending habits changed fundamentally. Models don’t automatically adapt — they need to be retrained on current data.
Critical for leaders: AI models degrade over time. They are not “set and forget” assets. Budget for ongoing monitoring, evaluation, and retraining. The organizations that treat models as living systems — not one-time deployments — are the ones that sustain value from AI.
The Four Types of Machine Learning
A framework for the chapters ahead
Supervised Learning
Learn from labeled examples. You provide inputs paired with correct outputs. The model learns the mapping. Used for classification (spam/not spam) and regression (price prediction). The most widely deployed type in enterprise AI. Covered in Chapter 5.
Unsupervised Learning
Find structure in unlabeled data. No correct answers are provided. The model discovers patterns on its own — grouping similar customers, detecting anomalies, reducing data complexity. Used for segmentation, anomaly detection, and exploratory analysis. Covered in Chapter 6.
Reinforcement Learning
Learn by trial and error. An agent takes actions in an environment and receives rewards or penalties. It learns strategies that maximize long-term reward. This is how AlphaGo learned to play Go and how ChatGPT was refined through RLHF (reinforcement learning from human feedback). Powerful but complex to implement.
Self-Supervised Learning
Learn from the data itself. The model creates its own training signal — for example, predicting the next word in a sentence. This is how large language models like GPT are trained: mask a word, predict it, check, repeat — across trillions of words. It eliminates the need for manual labeling, enabling training at unprecedented scale.
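"The model creates its own training signal" can be shown with the simplest possible next-word predictor, a bigram counter. This is a drastic simplification of how LLMs actually work, and the corpus is invented for illustration, but the key property is real: every word in the text becomes a labeled example, with the following word as its label, and no human annotates anything.

```python
from collections import Counter, defaultdict

# Self-supervised learning in miniature: the text supplies its own
# labels. Each word is a training example whose "correct answer" is
# simply the word that follows it.
corpus = ("the model predicts the next word and the next word "
          "becomes the label").split()

follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1        # no human labeling needed

def predict_next(word):
    # Return the most frequently observed continuation.
    return follows[word].most_common(1)[0][0]
```

Because the labels are free, the only limit on training data is how much text exists, which is what makes the trillion-word scale of LLM training possible.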
Rule of thumb: Supervised learning is the safest bet for most business problems. Unsupervised learning is valuable for exploration. Reinforcement learning is powerful but niche. Self-supervised learning is what powers the generative AI revolution.