Ch 2 — Matrices & Transformations

Instagram filters for data — how neural networks see through new lenses
Linear Algebra
Chapter roadmap: Filter → Multiply → Transpose → Inverse → Determinant → y = Wx + b → Vision
A Matrix Is an Instagram Filter
Transforming every pixel at once
The Analogy
When you apply an Instagram filter, every single pixel gets transformed — colors shift, brightness changes, contrast adjusts. A matrix does the same thing to data: it takes an input vector and transforms it into a new vector. Rotate, stretch, shrink, skew — all of these are matrix operations. A neural network layer is just a very sophisticated “filter” applied to your data.
Key insight: Every neural network layer does y = Wx + b. The weight matrix W is literally a “filter” that transforms input data into a new representation — looking at the data through a new lens.
What Is a Matrix?
A matrix is a grid of numbers arranged in rows and columns. A 2×3 matrix has 2 rows and 3 columns:
# A 2×3 matrix
M = [[1, 2, 3],
     [4, 5, 6]]

# In NumPy
import numpy as np
M = np.array([[1, 2, 3],
              [4, 5, 6]])
M.shape  # (2, 3) → 2 rows, 3 columns
Real World
Instagram filter: transforms every pixel’s RGB values
In AI
Weight matrix: transforms every input feature into a new representation
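The y = Wx + b "filter" from the key insight can be sketched in a few lines of NumPy. All numbers here are invented purely for illustration:

```python
import numpy as np

# A toy "filter": W maps a 3-feature input to a 2-feature view.
W = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0]])   # weight matrix: 2 outputs × 3 inputs
b = np.array([0.5, -1.0])        # bias: one offset per output
x = np.array([1.0, 2.0, 3.0])    # input vector

y = W @ x + b                    # the layer: y = Wx + b
print(y)                         # → [7.5, 8.0]
```

Every input vector that passes through this W is transformed the same way, just as a filter treats every photo the same way.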
Matrix Multiplication — The Core Operation
Rows meet columns: the most important operation in AI
The Analogy
Imagine a stack of recipe cards. Each row of the matrix is one dish’s recipe (how much of each ingredient it needs), and the input vector is your pantry (how much of each ingredient you have). Multiplying the matrix by the vector scores every dish at once: each row “asks a question” about the input by taking a dot product with it.
Key insight: Matrix multiplication is just many dot products at once. Each row of the matrix computes one dot product with the input. A neural network layer with 512 neurons = 512 dot products = one matrix multiply.
Worked Example
# Matrix × Vector
# W is 2×3, x is 3×1 → result is 2×1
W = np.array([[1, 2, 3],
              [4, 5, 6]])
x = np.array([7, 8, 9])
y = W @ x
# Row 1: 1×7 + 2×8 + 3×9 = 7+16+27 = 50
# Row 2: 4×7 + 5×8 + 6×9 = 28+40+54 = 122
# y = [50, 122]

# Rule: (m×n) @ (n×p) → (m×p)
# Inner dimensions must match!
Shape rule: (m×n) @ (n×p) = (m×p). The inner dimensions (n) must match. The result has the outer dimensions (m×p).
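One way to convince yourself that a matrix multiply really is "many dot products at once" is to compute both sides and compare (a quick sanity check, not library code):

```python
import numpy as np

W = np.array([[1, 2, 3],
              [4, 5, 6]])
x = np.array([7, 8, 9])

# One matrix multiply...
y = W @ x                                    # [50, 122]

# ...equals one dot product per row of W.
dots = np.array([np.dot(row, x) for row in W])

print(np.array_equal(y, dots))               # → True
```

A 512-neuron layer is the same picture with 512 rows: NumPy (and the GPU) just does all the dot products in one fused operation.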
Transpose — Flipping Rows and Columns
Mirror the matrix along its diagonal
The Analogy
Imagine a spreadsheet where rows are students and columns are subjects. The transpose flips it: now rows are subjects and columns are students. Same data, different perspective. It’s like rotating a table 90° — what was a row becomes a column.
Key insight: In PyTorch, W.T (transpose) appears everywhere. The linear layer computes x @ W.T + b: the weight matrix stores one neuron per row, so it must be transposed before the shapes line up for the multiplication. Transpose is the “adapter plug.”
Worked Example
A = np.array([[1, 2, 3],
              [4, 5, 6]])   # A is 2×3
A_T = A.T
# A.T is 3×2:
# [[1, 4],
#  [2, 5],
#  [3, 6]]

# Key property: (AB)ᵀ = BᵀAᵀ
# Transpose reverses multiplication order!

# PyTorch linear layer under the hood:
# output = input @ weight.T + bias
Real World
Flip a spreadsheet: students×subjects becomes subjects×students
In AI
Transpose adapts weight matrix shape for matrix multiplication
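The reversal property (AB)ᵀ = BᵀAᵀ from the worked example is easy to verify numerically; the shapes below are chosen just for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))

left = (A @ B).T      # transpose of the product: shape (4, 2)
right = B.T @ A.T     # transposes in reversed order: shape (4, 2)

print(np.allclose(left, right))   # → True
```

Note that Aᵀ @ Bᵀ would not even have compatible shapes here, (3×2) @ (4×3), which is exactly why the order has to reverse.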
Inverse — Undoing a Transformation
The “Ctrl+Z” of linear algebra
The Analogy
If a matrix is a filter that transforms your photo, the inverse matrix is the “undo” button that restores the original. Apply the filter, then apply the inverse — you get back to where you started. Mathematically: A × A⁻¹ = I (the identity matrix, which changes nothing).
Key insight: Not every transformation can be undone. If a filter crushes 3D data into 2D (like a shadow), you can’t recover the depth. Matrices without inverses are called singular — they destroy information. This is why neural networks need careful architecture: you don’t want layers that throw away useful signal.
Worked Example
A = np.array([[2, 1],
              [5, 3]])
A_inv = np.linalg.inv(A)
# [[ 3, -1],
#  [-5,  2]]

# Verify: A × A⁻¹ = Identity
A @ A_inv
# [[1, 0],
#  [0, 1]]  ← identity matrix!

# Singular matrix (no inverse):
B = np.array([[1, 2],
              [2, 4]])
# Row 2 = 2 × Row 1 → dependent → no inverse
Identity matrix I: Diagonal of 1s, zeros elsewhere. A × I = A. It’s the “do nothing” transformation — like a filter that leaves the photo unchanged.
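The "do nothing" behavior of the identity matrix can be checked directly with np.eye, from either side of the multiplication:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [5.0, 3.0]])
I = np.eye(2)    # 2×2 identity: 1s on the diagonal, 0s elsewhere

# Multiplying by I changes nothing, left or right.
print(np.array_equal(A @ I, A))   # → True
print(np.array_equal(I @ A, A))   # → True
```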
Determinant & Rank
How much does the transformation stretch or squash space?
The Analogy
The determinant tells you how much a matrix scales area (or volume). Imagine a 1×1 square. After the transformation, it becomes a parallelogram. The determinant is the area of that parallelogram. If det = 2, the matrix doubles all areas. If det = 0, the matrix collapses space (like squashing a box flat). The rank tells you how many dimensions survive the transformation.
Key insight: A matrix with rank less than its size is “losing dimensions” — it’s compressing data. This is exactly what happens in autoencoders and dimensionality reduction: a low-rank transformation keeps only the most important dimensions.
Worked Example
# Determinant: scaling factor of area
A = np.array([[2, 0],
              [0, 3]])
np.linalg.det(A)  # 6.0 → areas scale by 6×

# Singular matrix: det = 0 (collapses space)
B = np.array([[1, 2],
              [2, 4]])
np.linalg.det(B)  # 0.0 → squashed flat!

# Rank: how many independent dimensions
np.linalg.matrix_rank(A)  # 2 (full rank)
np.linalg.matrix_rank(B)  # 1 (lost a dimension)
Real World
det = 0: a shadow (3D→2D) loses depth information forever
In AI
Low-rank layers compress data; autoencoders exploit this deliberately
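As a linear stand-in for what autoencoders do, the classic way to build a low-rank compression is a truncated SVD: keep only the top-k directions of a matrix. The data below is random, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))          # 6 samples, 4 features

U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2                                    # keep only the top 2 directions
X_low = (U[:, :k] * s[:k]) @ Vt[:k, :]   # best rank-2 approximation of X

print(np.linalg.matrix_rank(X))          # 4 (full rank)
print(np.linalg.matrix_rank(X_low))      # 2 (dimensions deliberately lost)
```

An autoencoder learns a nonlinear version of this squeeze, but the intuition is the same: the transformation keeps only the dimensions that matter.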
Linear Transformations — Rotate, Scale, Shear
Every matrix is a geometric operation
The Analogy
Every matrix corresponds to a geometric transformation. A rotation matrix spins things. A scaling matrix stretches or shrinks. A shear matrix tilts (like italicizing text). The beauty: combining transformations = multiplying matrices. Rotate then scale? Multiply the rotation matrix by the scaling matrix.
Key insight: A deep neural network with 10 layers is 10 matrix multiplications in a row. Each layer rotates, scales, and reshapes the data in a new way. By the final layer, the data has been transformed so many times that originally tangled classes become linearly separable.
Worked Example
# 90° rotation matrix
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
# R ≈ [[0, -1], [1, 0]]

# Apply to point (1, 0) → (0, 1)
R @ np.array([1, 0])  # ≈ [0, 1]: rotated 90°!

# Scaling matrix: stretch x by 2, y by 3
S = np.array([[2, 0],
              [0, 3]])

# Combine: rotate THEN scale = S @ R
combined = S @ R  # one matrix does both!
Key insight: Matrix multiplication is NOT commutative: A×B ≠ B×A. Rotate-then-scale gives a different result than scale-then-rotate. Order matters in neural networks too — layer order changes the learned representation.
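Non-commutativity is easy to see with the rotation and scaling matrices from the worked example: applying them in the two possible orders sends the same point to different places.

```python
import numpy as np

theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotate 90°
S = np.array([[2.0, 0.0],
              [0.0, 3.0]])                        # stretch x by 2, y by 3

p = np.array([1.0, 0.0])

print(S @ R @ p)   # rotate first, then scale: ≈ [0, 3]
print(R @ S @ p)   # scale first, then rotate: ≈ [0, 2]
```

Reading right to left: in S @ R @ p, the point is rotated onto the y-axis and then stretched by 3; in R @ S @ p, it is stretched along x by 2 first and then rotated, landing at a different height.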
y = Wx + b — The Neural Network Layer
The most important equation in AI
The Analogy
A neural network layer is a committee of experts. Each neuron (row of W) is one expert who looks at the input from a different angle. The bias b is each expert’s personal baseline opinion. The output y is the committee’s collective assessment. With 512 neurons, you have 512 experts each computing a dot product with the input.
Key insight: Without activation functions, stacking layers is pointless — multiplying matrices just gives another matrix. That’s why ReLU, sigmoid, etc. exist: they add the non-linearity that lets networks learn curves, not just straight lines.
Worked Example
import torch
import torch.nn as nn

# Linear layer: 3 inputs → 2 outputs
layer = nn.Linear(3, 2)
# layer.weight.shape = (2, 3)  ← W
# layer.bias.shape = (2,)      ← b

x = torch.tensor([1.0, 2.0, 3.0])
y = layer(x)  # = x @ W.T + b

# Under the hood:
# y[0] = w[0,0]×1 + w[0,1]×2 + w[0,2]×3 + b[0]
# y[1] = w[1,0]×1 + w[1,1]×2 + w[1,2]×3 + b[1]
Real World
512 experts each score the input from their angle, plus personal bias
In AI
512 neurons: W is (512×input_dim), b is (512,), output is (512,)
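The earlier claim that stacking linear layers without activations is pointless can be checked numerically: two weight matrices applied in sequence behave exactly like their single product (random matrices, for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))   # layer 1: 3 → 4
W2 = rng.standard_normal((2, 4))   # layer 2: 4 → 2
x = rng.standard_normal(3)

two_layers = W2 @ (W1 @ x)         # apply layer 1, then layer 2
one_layer = (W2 @ W1) @ x          # a single equivalent matrix

print(np.allclose(two_layers, one_layer))   # → True

# A nonlinearity in between breaks the collapse:
relu = lambda z: np.maximum(z, 0.0)
nonlinear = W2 @ relu(W1 @ x)      # no single matrix reproduces this map
```

This is why every matrix multiply in a network is followed by an activation: without it, a 10-layer network is secretly a 1-layer network.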
Matrices in Computer Vision
How images get transformed, layer by layer
The Analogy
A digital image is already a matrix — rows of pixels, each pixel a vector of RGB values. Image augmentation (rotate, flip, crop) uses matrix transformations. Convolutional layers apply small filter matrices that slide across the image, detecting edges, textures, and shapes. Each layer sees the image through a different “lens.”
Why it matters for AI: GPUs are essentially matrix multiplication machines. NVIDIA’s Tensor Cores can multiply 4×4 matrices in a single clock cycle. The reason AI exploded in the 2010s isn’t just better algorithms — it’s that GPUs made massive matrix multiplications fast enough to be practical.
In Practice
import torch

# Image as a matrix: (3, 224, 224)
# 3 channels (RGB) × 224 rows × 224 cols
img = torch.randn(3, 224, 224)

# A conv layer: small matrix filters
conv = torch.nn.Conv2d(3, 64, kernel_size=3)
# 64 filters, each 3×3×3 = 27 weights
# Slides across image → 64 feature maps

features = conv(img.unsqueeze(0))
# Output: (1, 64, 222, 222)
# 64 "views" of the image
Real World
Instagram: one filter transforms all pixels at once
In AI
Conv layer: 64 tiny filter matrices slide across the image, each detecting different patterns
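To make the "filters detect edges" claim concrete, here is a hand-set 3×3 vertical-edge kernel applied with torch.nn.functional.conv2d. Real conv layers learn their kernels from data; this kernel and tiny image are invented for illustration:

```python
import torch
import torch.nn.functional as F

# One hand-set filter: responds to dark→bright transitions left to right.
kernel = torch.tensor([[[[-1., 0., 1.],
                         [-1., 0., 1.],
                         [-1., 0., 1.]]]])   # shape (1, 1, 3, 3)

# A 5×5 one-channel image: dark left columns, bright right columns.
img = torch.zeros(1, 1, 5, 5)
img[..., 3:] = 1.0

edges = F.conv2d(img, kernel)    # valid convolution → (1, 1, 3, 3)
print(edges.squeeze())
# Nonzero responses only where the kernel straddles the dark→bright edge.
```

A learned conv layer is just 64 (or 512) of these little matrices, each tuned by gradient descent to answer a different question about its patch of the image.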