ReLU and Its Leaky & GELU Cousins
The default activation and modern variants.
Meet ReLU
ReLU (rectified linear unit) is the go-to default activation: it passes positive values through unchanged and turns every negative one into zero, i.e. max(0, x). Simple and fast. ⚡
import torch
import torch.nn.functional as F
x = torch.tensor([-2.0, 0.0, 3.0])  # example input
y = F.relu(x)  # max(0, x), elementwise
Why It Caught On
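ReLU is cheap to compute, and its gradient is a clean 1 for positive inputs (and 0 for negative ones). Because a gradient of 1 does not shrink as it passes back through active units, signals keep flowing and deep nets train quickly.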
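To make the "gradient of 1" point concrete, here is a minimal plain-Python sketch (no PyTorch needed). The helper names `relu`, `relu_grad`, and `chain_grad` are made up for this illustration, and setting the gradient to 0 exactly at x = 0 is a convention assumed here. It applies the chain rule through a stack of ReLUs and shows that the gradient for a positive input does not shrink with depth.

```python
def relu(x: float) -> float:
    """max(0, x) for a single number."""
    return max(0.0, x)


def relu_grad(x: float) -> float:
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise.

    The value at exactly x == 0 is a convention; 0 is assumed here.
    """
    return 1.0 if x > 0 else 0.0


def chain_grad(x: float, depth: int) -> float:
    """Gradient of `depth` nested ReLU calls with respect to x (chain rule)."""
    grad = 1.0
    for _ in range(depth):
        grad *= relu_grad(x)  # local derivative at this layer's input
        x = relu(x)           # forward pass to the next layer
    return grad


print(chain_grad(2.5, depth=20))   # positive input: prints 1.0
print(chain_grad(-2.5, depth=20))  # negative input: prints 0.0
```

Real networks also multiply by weight matrices between activations, so this only isolates the activation's own contribution to the gradient.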