Ch 8 — RNNs & Sequences

Recurrence math, LSTM gates, GRU, BPTT, and the road to attention
Under the Hood
Zone A: Vanilla RNN — Forward Pass & BPTT (Steps 1–2)
Step 1. RNN Forward: hₜ = tanh(Wₕhₜ₋₁ + Wₓxₜ + b)
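
A minimal NumPy sketch of the forward recurrence, assuming a single layer and toy sizes (all names and dimensions here are illustrative, not from the chapter):

```python
import numpy as np

rng = np.random.default_rng(0)
H, X, T = 4, 3, 5                    # hidden size, input size, sequence length

W_h = rng.normal(0, 0.5, (H, H))     # recurrent weights W_h
W_x = rng.normal(0, 0.5, (H, X))     # input weights W_x
b = np.zeros(H)

xs = rng.normal(size=(T, X))         # a toy input sequence x_1..x_T
h = np.zeros(H)                      # h_0

for x in xs:                         # h_t = tanh(W_h h_{t-1} + W_x x_t + b)
    h = np.tanh(W_h @ h + W_x @ x + b)
print(h)                             # final hidden state h_T
```
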
Step 2. BPTT: gradients through time
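
A sketch of why BPTT vanishes: stepping the gradient from hₜ back to hₜ₋₁ multiplies it by Wₕᵀ and by tanh′ = 1 − hₜ², and that product typically shrinks geometrically over the unrolled steps (weights and sizes below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
H, X, T = 4, 2, 30
W_h = rng.normal(0, 0.5, (H, H))
W_x = rng.normal(0, 0.5, (H, X))

# Forward pass, keeping every h_t for the backward pass.
hs, h = [np.zeros(H)], np.zeros(H)
for x in rng.normal(size=(T, X)):
    h = np.tanh(W_h @ h + W_x @ x)
    hs.append(h)

# BPTT: one Jacobian W_h^T * diag(1 - h_t^2) per unrolled step.
g = np.ones(H)                       # pretend dLoss/dh_T = 1
for t in range(T, 0, -1):
    g = W_h.T @ (g * (1.0 - hs[t] ** 2))
    if t % 10 == 1:
        print(f"|dL/dh_{t-1}| = {np.linalg.norm(g):.2e}")
# The norm collapses toward zero: the vanishing gradient.
```
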
↓ Vanishing gradient → need gating
Zone B: LSTM — Gates & Cell State (Steps 3–5)
Step 3. Forget Gate: fₜ = σ(Wᶠ[hₜ₋₁, xₜ] + bᶠ) (sketch after Step 4)
Step 4. Input & Cell: Cₜ = fₜ ⊙ Cₜ₋₁ + iₜ ⊙ gₜ
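
A sketch of one LSTM step covering Steps 3 and 4, with one weight matrix per gate acting on the concatenation [hₜ₋₁, xₜ] (all names and sizes are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
H, X = 4, 3
hx = np.concatenate([rng.normal(size=H), rng.normal(size=X)])  # [h_{t-1}, x_t]
C_prev = rng.normal(size=H)                                    # C_{t-1}

W_f, W_i, W_g = (rng.normal(0, 0.5, (H, H + X)) for _ in range(3))
b_f, b_i, b_g = np.zeros(H), np.zeros(H), np.zeros(H)

f = sigmoid(W_f @ hx + b_f)   # Step 3: forget gate, what to keep of C_{t-1}
i = sigmoid(W_i @ hx + b_i)   # input gate: how much new content to write
g = np.tanh(W_g @ hx + b_g)   # candidate cell content
C = f * C_prev + i * g        # Step 4: C_t = f ⊙ C_{t-1} + i ⊙ g
print(C)
```
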
Step 5. Output & Gradient: hₜ = oₜ ⊙ tanh(Cₜ); the additive cell update opens a gradient highway
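
The highway is the backward path along C: per step, ∂Cₜ/∂Cₜ₋₁ = diag(fₜ), an elementwise scaling with no repeated Wᵀ product, so the gradient survives as long as the forget gate stays near 1. A toy check (the high forget bias is a common initialization trick, assumed here, not taken from the chapter):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
H, T = 4, 30
g = np.ones(H)                             # pretend dLoss/dC_T = 1
for _ in range(T):
    f = sigmoid(rng.normal(size=H) + 2.0)  # high forget bias keeps f near 1
    g = g * f                              # gradient scales only by f per step
print(np.linalg.norm(g))                   # shrinks slowly: the highway holds
```
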
↓ Simplify 3 gates → 2 gates
Zone C: GRU — Simplified Gating (Steps 6–7)
Step 6. GRU Equations: reset + update gates
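
A sketch of one GRU step under the standard formulation (biases omitted, names and sizes illustrative): the reset gate r masks the old state inside the candidate, and the update gate z interpolates between old and new:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(4)
H, X = 4, 3
h_prev, x = rng.normal(size=H), rng.normal(size=X)
W_z, W_r, W_c = (rng.normal(0, 0.5, (H, H + X)) for _ in range(3))

hx = np.concatenate([h_prev, x])
z = sigmoid(W_z @ hx)                                     # update gate
r = sigmoid(W_r @ hx)                                     # reset gate
h_cand = np.tanh(W_c @ np.concatenate([r * h_prev, x]))   # candidate state
h = (1 - z) * h_prev + z * h_cand                         # old vs. new mix
print(h)
```
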
Step 7. Parameter Count: LSTM vs GRU vs RNN
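
Per layer, each gate or candidate block costs h·(h + x) weights plus h biases; a vanilla RNN has one such block, a GRU three, an LSTM four. A quick count at illustrative sizes:

```python
# Parameters per recurrent layer, hidden size h, input size x.
def params(h, x, blocks):
    return blocks * (h * (h + x) + h)

h, x = 256, 128
for name, blocks in [("RNN", 1), ("GRU", 3), ("LSTM", 4)]:
    print(f"{name:5s}{params(h, x, blocks):>10,}")
# RNN      98,560   GRU     295,680   LSTM    394,240
```
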
↓ Sequence → sequence translation
Zone D: Seq2Seq & Bahdanau Attention (Steps 8–9)
Step 8. Encoder-Decoder: context vector bottleneck
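
A sketch of the bottleneck, assuming the simplest setup where the decoder is seeded from the encoder's final state (the decoder even reuses the encoder weights here, purely for brevity):

```python
import numpy as np

rng = np.random.default_rng(5)
H, X, T_src = 4, 3, 6
W_h = rng.normal(0, 0.5, (H, H))
W_x = rng.normal(0, 0.5, (H, X))

# Encoder: fold the whole source sequence into one fixed-size vector.
h = np.zeros(H)
for x in rng.normal(size=(T_src, X)):
    h = np.tanh(W_h @ h + W_x @ x)
context = h                        # the bottleneck: all the decoder sees

# Decoder: another recurrence conditioned only on that one vector.
s = context.copy()
for _ in range(3):                 # emit a few target-side states
    s = np.tanh(W_h @ s)
    print(s)
```
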
Step 9. Attention Scores: alignment weights
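
A sketch of Bahdanau-style additive scoring, eⱼ = vᵀ tanh(Wₐs + Uₐhⱼ), softmaxed into alignment weights that mix the encoder states into a fresh context at every decoder step (the names Wₐ, Uₐ, v follow the usual convention; sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
H, T_src = 4, 6
hs = rng.normal(size=(T_src, H))     # encoder states h_1..h_T
s = rng.normal(size=H)               # current decoder state

W_a, U_a = rng.normal(0, 0.5, (H, H)), rng.normal(0, 0.5, (H, H))
v = rng.normal(size=H)

e = np.array([v @ np.tanh(W_a @ s + U_a @ h_j) for h_j in hs])  # scores
alpha = np.exp(e - e.max())
alpha /= alpha.sum()                 # softmax -> alignment weights
context = alpha @ hs                 # weighted sum of encoder states
print(alpha, context)
```
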
↓ Practical implementation
Zone E: Practical Considerations & Modern Alternatives (Step 10)
Step 10. Bidirectional & SSMs: stacking, Mamba, complexity
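
A sketch of the bidirectional idea: run two independent RNNs, one over the sequence and one over its reverse, and concatenate their states so every position sees both left and right context (all sizes illustrative; stacking would feed this (T, 2H) output into the next layer):

```python
import numpy as np

rng = np.random.default_rng(7)
H, X, T = 4, 3, 5
xs = rng.normal(size=(T, X))

def run(seq, W_h, W_x):
    h, out = np.zeros(H), []
    for x in seq:
        h = np.tanh(W_h @ h + W_x @ x)
        out.append(h)
    return np.stack(out)

# Separate parameters per direction; align and concatenate per time step.
fwd = run(xs, rng.normal(0, 0.5, (H, H)), rng.normal(0, 0.5, (H, X)))
bwd = run(xs[::-1], rng.normal(0, 0.5, (H, H)), rng.normal(0, 0.5, (H, X)))[::-1]
h_bi = np.concatenate([fwd, bwd], axis=1)   # shape (T, 2H)
print(h_bi.shape)
```
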