Ch 8 — RNNs & Sequences

Recurrence math, LSTM gates, GRU, BPTT, and the road to attention
Under the Hood
Zone A: Vanilla RNN — Forward Pass & BPTT (Steps 1–2)
Step 1. RNN Forward: hₜ = tanh(Wₕhₜ₋₁ + Wₓxₜ + b)
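
A minimal NumPy sketch of the forward recurrence, assuming a single layer and toy sizes (all names and dimensions here are illustrative, not from the chapter):

```python
import numpy as np

rng = np.random.default_rng(0)
H, X, T = 4, 3, 5                    # hidden size, input size, sequence length

W_h = rng.normal(0, 0.5, (H, H))     # recurrent weights W_h
W_x = rng.normal(0, 0.5, (H, X))     # input weights W_x
b = np.zeros(H)

xs = rng.normal(size=(T, X))         # a toy input sequence x_1..x_T
h = np.zeros(H)                      # h_0

for x in xs:                         # h_t = tanh(W_h h_{t-1} + W_x x_t + b)
    h = np.tanh(W_h @ h + W_x @ x + b)
print(h)                             # final hidden state h_T
```
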
Step 2. BPTT: gradients through time
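
A sketch of why BPTT vanishes: stepping the gradient from hₜ back to hₜ₋₁ multiplies it by Wₕᵀ and by tanh′ = 1 − hₜ², and that product typically shrinks geometrically over the unrolled steps (weights and sizes below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
H, X, T = 4, 2, 30
W_h = rng.normal(0, 0.5, (H, H))
W_x = rng.normal(0, 0.5, (H, X))

# Forward pass, keeping every h_t for the backward pass.
hs, h = [np.zeros(H)], np.zeros(H)
for x in rng.normal(size=(T, X)):
    h = np.tanh(W_h @ h + W_x @ x)
    hs.append(h)

# BPTT: one Jacobian W_h^T * diag(1 - h_t^2) per unrolled step.
g = np.ones(H)                       # pretend dLoss/dh_T = 1
for t in range(T, 0, -1):
    g = W_h.T @ (g * (1.0 - hs[t] ** 2))
    if t % 10 == 1:
        print(f"|dL/dh_{t-1}| = {np.linalg.norm(g):.2e}")
# The norm collapses toward zero: the vanishing gradient.
```
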
↓ Vanishing gradient → need gating
Zone B: LSTM — Gates & Cell State (Steps 3–5)
Step 3. Forget Gate: fₜ = σ(Wᶠ[hₜ₋₁, xₜ] + bᶠ) (sketch after Step 4)
Step 4. Input & Cell: Cₜ = fₜ ⊙ Cₜ₋₁ + iₜ ⊙ gₜ
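
A sketch of one LSTM step covering Steps 3 and 4, with one weight matrix per gate acting on the concatenation [hₜ₋₁, xₜ] (all names and sizes are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
H, X = 4, 3
hx = np.concatenate([rng.normal(size=H), rng.normal(size=X)])  # [h_{t-1}, x_t]
C_prev = rng.normal(size=H)                                    # C_{t-1}

W_f, W_i, W_g = (rng.normal(0, 0.5, (H, H + X)) for _ in range(3))
b_f, b_i, b_g = np.zeros(H), np.zeros(H), np.zeros(H)

f = sigmoid(W_f @ hx + b_f)   # Step 3: forget gate, what to keep of C_{t-1}
i = sigmoid(W_i @ hx + b_i)   # input gate: how much new content to write
g = np.tanh(W_g @ hx + b_g)   # candidate cell content
C = f * C_prev + i * g        # Step 4: C_t = f ⊙ C_{t-1} + i ⊙ g
print(C)
```
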
Step 5. Output & Gradient: hₜ = oₜ ⊙ tanh(Cₜ); the additive cell update opens a gradient highway
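
The highway is the backward path along C: per step, ∂Cₜ/∂Cₜ₋₁ = diag(fₜ), an elementwise scaling with no repeated Wᵀ product, so the gradient survives as long as the forget gate stays near 1. A toy check (the high forget bias is a common initialization trick, assumed here, not taken from the chapter):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
H, T = 4, 30
g = np.ones(H)                             # pretend dLoss/dC_T = 1
for _ in range(T):
    f = sigmoid(rng.normal(size=H) + 2.0)  # high forget bias keeps f near 1
    g = g * f                              # gradient scales only by f per step
print(np.linalg.norm(g))                   # shrinks slowly: the highway holds
```
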
↓ Simplify 3 gates → 2 gates
Zone C: GRU — Simplified Gating (Steps 6–7)
Step 6. GRU Equations: reset + update gates
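
A sketch of one GRU step under the standard formulation (biases omitted, names and sizes illustrative): the reset gate r masks the old state inside the candidate, and the update gate z interpolates between old and new:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(4)
H, X = 4, 3
h_prev, x = rng.normal(size=H), rng.normal(size=X)
W_z, W_r, W_c = (rng.normal(0, 0.5, (H, H + X)) for _ in range(3))

hx = np.concatenate([h_prev, x])
z = sigmoid(W_z @ hx)                                     # update gate
r = sigmoid(W_r @ hx)                                     # reset gate
h_cand = np.tanh(W_c @ np.concatenate([r * h_prev, x]))   # candidate state
h = (1 - z) * h_prev + z * h_cand                         # old vs. new mix
print(h)
```
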
Step 7. Parameter Count: LSTM vs GRU vs RNN
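
Per layer, each gate or candidate block costs h·(h + x) weights plus h biases; a vanilla RNN has one such block, a GRU three, an LSTM four. A quick count at illustrative sizes:

```python
# Parameters per recurrent layer, hidden size h, input size x.
def params(h, x, blocks):
    return blocks * (h * (h + x) + h)

h, x = 256, 128
for name, blocks in [("RNN", 1), ("GRU", 3), ("LSTM", 4)]:
    print(f"{name:5s}{params(h, x, blocks):>10,}")
# RNN      98,560   GRU     295,680   LSTM    394,240
```
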
↓ Sequence → sequence translation
Zone D: Seq2Seq & Bahdanau Attention (Steps 8–9)
Step 8. Encoder-Decoder: context vector bottleneck
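
A sketch of the bottleneck, assuming the simplest setup where the decoder is seeded from the encoder's final state (the decoder even reuses the encoder weights here, purely for brevity):

```python
import numpy as np

rng = np.random.default_rng(5)
H, X, T_src = 4, 3, 6
W_h = rng.normal(0, 0.5, (H, H))
W_x = rng.normal(0, 0.5, (H, X))

# Encoder: fold the whole source sequence into one fixed-size vector.
h = np.zeros(H)
for x in rng.normal(size=(T_src, X)):
    h = np.tanh(W_h @ h + W_x @ x)
context = h                        # the bottleneck: all the decoder sees

# Decoder: another recurrence conditioned only on that one vector.
s = context.copy()
for _ in range(3):                 # emit a few target-side states
    s = np.tanh(W_h @ s)
    print(s)
```
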
Step 9. Attention Scores: alignment weights
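
A sketch of Bahdanau-style additive scoring, eⱼ = vᵀ tanh(Wₐs + Uₐhⱼ), softmaxed into alignment weights that mix the encoder states into a fresh context at every decoder step (the names Wₐ, Uₐ, v follow the usual convention; sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
H, T_src = 4, 6
hs = rng.normal(size=(T_src, H))     # encoder states h_1..h_T
s = rng.normal(size=H)               # current decoder state

W_a, U_a = rng.normal(0, 0.5, (H, H)), rng.normal(0, 0.5, (H, H))
v = rng.normal(size=H)

e = np.array([v @ np.tanh(W_a @ s + U_a @ h_j) for h_j in hs])  # scores
alpha = np.exp(e - e.max())
alpha /= alpha.sum()                 # softmax -> alignment weights
context = alpha @ hs                 # weighted sum of encoder states
print(alpha, context)
```
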
↓ Practical implementation
Zone E: Practical Considerations & Modern Alternatives (Step 10)
Step 10. Bidirectional & SSMs: stacking, Mamba, complexity
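
A sketch of the bidirectional idea: run two independent RNNs, one over the sequence and one over its reverse, and concatenate their states so every position sees both left and right context (all sizes illustrative; stacking would feed this (T, 2H) output into the next layer):

```python
import numpy as np

rng = np.random.default_rng(7)
H, X, T = 4, 3, 5
xs = rng.normal(size=(T, X))

def run(seq, W_h, W_x):
    h, out = np.zeros(H), []
    for x in seq:
        h = np.tanh(W_h @ h + W_x @ x)
        out.append(h)
    return np.stack(out)

# Separate parameters per direction; align and concatenate per time step.
fwd = run(xs, rng.normal(0, 0.5, (H, H)), rng.normal(0, 0.5, (H, X)))
bwd = run(xs[::-1], rng.normal(0, 0.5, (H, H)), rng.normal(0, 0.5, (H, X)))[::-1]
h_bi = np.concatenate([fwd, bwd], axis=1)   # shape (T, 2H)
print(h_bi.shape)
```
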