Why Bidirectional?
A standard RNN only sees past context. But in many tasks, future context matters too. In “He went to the bank to deposit money,” the word “deposit” (which comes after “bank”) disambiguates that “bank” means a financial institution. A Bidirectional RNN (BiRNN) runs two separate RNNs: one forward (left to right) and one backward (right to left). Their hidden states are concatenated at each position, giving each token access to the full sentence context.
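The two-pass mechanism can be sketched by hand before using PyTorch's built-in flag: run one RNN left to right, run a second RNN over the reversed sequence, re-align the backward outputs, and concatenate per position. The module sizes and variable names below are illustrative, not from the original.

```python
import torch
import torch.nn as nn

# Two independent RNNs, one per direction (toy sizes for illustration).
fwd = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
bwd = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(1, 5, 8)                  # (batch, seq_len, features)
h_fwd, _ = fwd(x)                         # left-to-right pass
h_bwd, _ = bwd(torch.flip(x, dims=[1]))   # right-to-left pass
h_bwd = torch.flip(h_bwd, dims=[1])       # re-align to original positions

h = torch.cat([h_fwd, h_bwd], dim=-1)     # concatenate per token
print(h.shape)  # torch.Size([1, 5, 32])  = forward 16 + backward 16
```

Each position in `h` now carries a summary of everything to its left (from `h_fwd`) and everything to its right (from `h_bwd`).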
Key insight: Bidirectional LSTMs (BiLSTMs) were the dominant architecture for NLP from roughly 2015 to 2018, used in named entity recognition, POS tagging, and as the backbone of ELMo (Peters et al., 2018). BERT later achieved the same bidirectional context in transformers via masked language modeling over full self-attention.
BiLSTM in PyTorch
import torch.nn as nn

bilstm = nn.LSTM(
    input_size=128,
    hidden_size=256,
    num_layers=2,
    bidirectional=True,  # ← key flag
    batch_first=True,
)
# Output hidden size = 256 × 2 = 512
# (forward 256 + backward 256)
# BiRNNs need the full sequence up front,
# so they're unsuitable for autoregressive generation.
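To confirm the doubled output size, a quick forward pass (with an assumed batch of 4 sequences of length 10) shows the concatenated dimension:

```python
import torch
import torch.nn as nn

bilstm = nn.LSTM(input_size=128, hidden_size=256,
                 num_layers=2, bidirectional=True, batch_first=True)

x = torch.randn(4, 10, 128)      # (batch, seq_len, input_size)
out, (h_n, c_n) = bilstm(x)
print(out.shape)   # torch.Size([4, 10, 512]) — forward 256 + backward 256
print(h_n.shape)   # torch.Size([4, 4, 256]) — num_layers × 2 directions
```

Note that `out` stacks the two directions in its last dimension, while `h_n` keeps them as separate layer entries (`num_layers * num_directions` along dim 0).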