Answer
Computes gradients of the loss w.r.t. each weight using the chain rule, propagating errors backward through the network. For each layer: ∂L/∂W = ∂L/∂output × ∂output/∂W. Weights updated via gradient descent: W -= lr × ∂L/∂W. Core of all neural network training.