The Analogy
PyTorch’s autograd is like having a robotic accountant who watches you cook, records every step, and automatically calculates exactly how much each ingredient contributed to the final taste. You just cook (write the forward pass). The accountant handles all the blame-tracing (backward pass) automatically.
Why it matters for AI: Before autograd, researchers had to manually derive and implement gradients for every new architecture. PyTorch’s dynamic computational graph means you can use Python control flow (if/else, loops) and autograd still works. This is why PyTorch became the dominant research framework.
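To make the control-flow point concrete, here is a minimal toy sketch (not from the original text; the function and values are invented for illustration). Autograd records whichever operations actually execute, so loops and branches work transparently:

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

def forward(w):
    out = w
    # Plain Python loop: the graph is rebuilt from the ops that run each call.
    for _ in range(2):
        out = out * w        # out = w**3 after the loop
    if out > 0:
        out = out * 2        # ordinary Python branch, taken since out > 0
    return out

loss = forward(w)            # 2 * w**3 = 54
loss.backward()
print(w.grad)                # d(2*w**3)/dw = 6*w**2 = 54 -> tensor(54.)
```

Because the graph is traced at runtime, changing the loop count or branch condition between calls requires no special graph-editing API.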
In Practice
import torch
# Same 2-layer example, but PyTorch does it
x = torch.tensor(2.0)
y = torch.tensor(1.0)
w1 = torch.tensor(0.5, requires_grad=True)
w2 = torch.tensor(-0.3, requires_grad=True)
# Forward (PyTorch records the graph)
z1 = w1 * x
a1 = torch.relu(z1)
z2 = w2 * a1
L = (z2 - y) ** 2
# Backward (one call does everything!)
L.backward()
print(w1.grad)  # tensor(1.5600) — same as manual!
print(w2.grad)  # tensor(-2.6000) — same as manual!
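These gradients exist to drive weight updates. A minimal sketch of that next step, with an assumed learning rate of 0.1 chosen only for illustration, looks like this (note that PyTorch accumulates gradients, so they must be zeroed between steps):

```python
import torch

x = torch.tensor(2.0)
y = torch.tensor(1.0)
w1 = torch.tensor(0.5, requires_grad=True)
w2 = torch.tensor(-0.3, requires_grad=True)

lr = 0.1  # assumed learning rate, for illustration only

for step in range(3):
    # Forward pass: same 2-layer computation as above
    a1 = torch.relu(w1 * x)
    L = (w2 * a1 - y) ** 2

    L.backward()
    with torch.no_grad():      # update weights without recording to the graph
        w1 -= lr * w1.grad
        w2 -= lr * w2.grad
    w1.grad.zero_()            # grads accumulate across backward() calls;
    w2.grad.zero_()            # reset them between steps

print(L.item())  # loss shrinks from the initial 1.69
```

In real code this update loop is usually replaced by an optimizer such as `torch.optim.SGD`, which handles the `no_grad` update and `zero_grad` bookkeeping.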
Source: Paszke et al. (2017) “Automatic differentiation in PyTorch” introduced the dynamic computational graph approach. PyTorch builds a new graph each forward pass, enabling Python-native control flow.