The Analogy
Every matrix corresponds to a geometric transformation. A rotation matrix spins things. A scaling matrix stretches or shrinks. A shear matrix tilts (like italicizing text). The beauty: combining transformations = multiplying matrices. Rotate then scale? Multiply the scaling matrix by the rotation matrix (the transformation applied first sits rightmost in the product).
Key insight: A deep neural network with 10 layers performs 10 matrix multiplications in a row (with a nonlinearity after each; without the nonlinearities, the 10 matrices would collapse into a single one). Each layer rotates, scales, and reshapes the data in a new way. By the final layer, the data has been transformed so many times that originally tangled classes become linearly separable.
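A minimal sketch of the "layers = chained matrix multiplications" idea. The names (`layers`, `combined`) and the random 2×2 matrices are illustrative choices, not from the text; the point is that applying ten matrices one after another gives the same result as pre-multiplying them into one matrix.

```python
import numpy as np
from functools import reduce

# Ten "layers", each just a random 2x2 matrix (illustrative only).
rng = np.random.default_rng(0)
layers = [rng.standard_normal((2, 2)) for _ in range(10)]

x = np.array([1.0, 0.0])

# Apply layer by layer...
h = x
for W in layers:
    h = W @ h

# ...or collapse all ten into one matrix first.
# reduce builds layer10 @ ... @ layer2 @ layer1.
combined = reduce(lambda A, B: B @ A, layers)

assert np.allclose(h, combined @ x)  # same result either way
```

This is exactly why real networks put a nonlinearity between layers: a purely linear 10-layer stack is equivalent to a single matrix and gains no expressive power from depth.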
Worked Example
import numpy as np

# 90° rotation matrix
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
# R ≈ [[0, -1], [1, 0]]

# Apply to the point (1, 0) → (0, 1)
R @ np.array([1, 0])  # ≈ [0, 1]: rotated 90°!

# Scaling matrix: stretch x by 2, y by 3
S = np.array([[2, 0], [0, 3]])

# Combine: rotate THEN scale = S @ R (the rightmost matrix acts first)
combined = S @ R  # one matrix does both!
Key insight: Matrix multiplication is NOT commutative: in general, AB ≠ BA. Rotate-then-scale gives a different result than scale-then-rotate. Order matters in neural networks too; swapping layers changes the learned representation.
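The non-commutativity claim can be checked directly with the R and S from the worked example. Applied to the point (1, 0), the two orderings land in visibly different places:

```python
import numpy as np

theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # 90° rotation
S = np.array([[2, 0], [0, 3]])                   # stretch x by 2, y by 3

p = np.array([1.0, 0.0])

rotate_then_scale = S @ R @ p  # (1,0) → (0,1) → ≈ (0, 3)
scale_then_rotate = R @ S @ p  # (1,0) → (2,0) → ≈ (0, 2)

assert not np.allclose(rotate_then_scale, scale_then_rotate)
```

Rotating first sends the point onto the y-axis, where the ×3 stretch applies; scaling first stretches along x by 2, and the rotation then carries that to (0, 2).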