LSTM Cell: Input, Forget, and Output Gates
Learners will diagram the LSTM cell, trace information flow through each gate, and implement an LSTM text classifier using nn.LSTM.
Why RNNs Need Gating Mechanisms
Vanilla RNNs struggle with long-range dependencies because gradients vanish during backpropagation through time: at every step the gradient is multiplied by the recurrent weights and the tanh derivative, so it can shrink roughly exponentially with the number of steps. The hidden state h_t = tanh(W_h * h_{t-1} + W_x * x_t) is also fully rewritten with each new input, so the network tends to forget information from many steps back.
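The shrinking gradient can be observed directly. The following is a minimal sketch, not a benchmark: the sizes, the random weights, and the helper grad_norm_wrt_h0 are all illustrative assumptions. W_h is deliberately rescaled so its largest singular value is 0.9, which forces the gradient with respect to the initial hidden state to decay as the sequence gets longer.

```python
import torch

torch.manual_seed(0)
hidden_size, input_size = 16, 10

# Hypothetical weights for illustration only. W_h is rescaled so that its
# largest singular value is 0.9, so each step can only shrink the gradient.
W_h = torch.randn(hidden_size, hidden_size)
W_h = 0.9 * W_h / torch.linalg.matrix_norm(W_h, ord=2)
W_x = 0.1 * torch.randn(hidden_size, input_size)

def grad_norm_wrt_h0(num_steps):
    """Norm of d(sum of h_T) / d(h_0) after unrolling num_steps RNN steps."""
    h0 = torch.zeros(hidden_size, requires_grad=True)
    h = h0
    for _ in range(num_steps):
        x_t = torch.randn(input_size)          # random input at this step
        h = torch.tanh(W_h @ h + W_x @ x_t)    # vanilla RNN update
    (grad,) = torch.autograd.grad(h.sum(), h0)
    return grad.norm().item()

short_norm = grad_norm_wrt_h0(1)
long_norm = grad_norm_wrt_h0(50)
print(f"1 step: {short_norm:.4f}   50 steps: {long_norm:.2e}")
```

With weights whose largest singular value exceeds 1 the gradient may instead grow, which is the related exploding-gradient problem; the sketch only covers the vanishing case described above.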
The Long Short-Term Memory (LSTM) was designed in 1997 by Hochreiter and Schmidhuber to solve this problem. It introduces a cell state — a separate memory conveyor belt — alongside gating mechanisms that explicitly control what information is added, removed, or passed forward.
The LSTM Cell State Concept
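The LSTM carries two states from one timestep to the next: the cell state C_t and the hidden state h_t. The cell state runs like a conveyor belt through the entire sequence and is changed only by a few elementwise linear operations that the gates control: scaling by the forget gate and adding gated new content. Because this update is additive rather than a full rewrite through a nonlinearity, gradients flow more easily through time.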
The hidden state h_t is the output at each timestep, computed from the cell state. Think of C_t as long-term memory and h_t as working memory that gets passed to the next layer and returned as output.
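To make the roles of C_t and h_t concrete, here is a sketch of a single LSTM step written out by hand, using the standard LSTM equations with the input, forget, and output gates. The function name manual_lstm_step and the tensor sizes are illustrative assumptions. The weights are borrowed from PyTorch's nn.LSTMCell, which stacks the four projections in the order input, forget, candidate, output, so the result can be compared against the built-in module.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size, batch = 10, 32, 4
cell = nn.LSTMCell(input_size, hidden_size)

def manual_lstm_step(x_t, h_prev, c_prev, cell):
    """One LSTM step spelled out gate by gate (illustrative helper)."""
    # All four projections are computed at once, then split into chunks.
    gates = (x_t @ cell.weight_ih.T + cell.bias_ih
             + h_prev @ cell.weight_hh.T + cell.bias_hh)
    i, f, g, o = gates.chunk(4, dim=-1)
    i = torch.sigmoid(i)   # input gate: how much new content to write
    f = torch.sigmoid(f)   # forget gate: how much old cell state to keep
    g = torch.tanh(g)      # candidate content to add to the cell state
    o = torch.sigmoid(o)   # output gate: how much of the cell state to expose
    c_t = f * c_prev + i * g       # additive cell-state update
    h_t = o * torch.tanh(c_t)      # hidden state computed from the cell state
    return h_t, c_t

x_t = torch.randn(batch, input_size)
h_prev = torch.randn(batch, hidden_size)
c_prev = torch.randn(batch, hidden_size)

with torch.no_grad():
    h_manual, c_manual = manual_lstm_step(x_t, h_prev, c_prev, cell)
    h_ref, c_ref = cell(x_t, (h_prev, c_prev))

print("matches nn.LSTMCell:", torch.allclose(h_manual, h_ref, atol=1e-5))
```

Note that the cell state is updated only by an elementwise multiply and an add, while h_t is derived from it through the output gate.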
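import torch
import torch.nn as nn
# An LSTM processes sequences and returns (output, (h_n, c_n))
lstm = nn.LSTM(input_size=10, hidden_size=32, batch_first=True)
x = torch.randn(4, 7, 10)  # dummy batch: (batch, seq_len, input_size)
output, (h_n, c_n) = lstm(x)
print('output:', tuple(output.shape))  # h_t at every timestep: (4, 7, 32)
print('h_n, c_n:', tuple(h_n.shape), tuple(c_n.shape))  # final states: (1, 4, 32) each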
print('LSTM parameters:', sum(p.numel() for p in lstm.parameters()))