Easy: RNN, LSTM, GRU
A vanilla RNN processes a sequence of 100 words and must produce a single classification output. During backpropagation through time, you observe that the norm of the loss gradient with respect to the hidden state at time step 1 is about 10⁻¹⁵, while the corresponding norm at time step 100 is about 1.0. What is this problem called, and what mathematical property causes it?
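The observation in the question can be reproduced numerically. The following is a minimal sketch, not a training setup: it assumes a tanh RNN of the form h_t = tanh(W h_{t-1} + x_t), a hidden size of 4, random inputs that are already projected into the hidden space, and a recurrent matrix W chosen as 0.7 times a cyclic permutation matrix. All of these choices, and the function name `gradient_norms`, are illustrative assumptions. The code runs the forward pass, then backpropagates a unit-norm gradient from step 100 to step 1 and records the norm at every step.

```python
import math
import random


def gradient_norms(steps=100, hidden=4, scale=0.7, seed=0):
    """Return ||dL/dh_t|| for t = 1..steps in a toy tanh RNN (illustrative only)."""
    rng = random.Random(seed)

    # Assumed recurrent matrix: scale * cyclic permutation (largest singular value = scale).
    W = [[scale if j == (i + 1) % hidden else 0.0 for j in range(hidden)]
         for i in range(hidden)]

    # Forward pass: h_t = tanh(W h_{t-1} + x_t), keeping every hidden state.
    h = [0.0] * hidden
    states = []
    for _ in range(steps):
        x = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
        pre = [sum(W[i][j] * h[j] for j in range(hidden)) + x[i]
               for i in range(hidden)]
        h = [math.tanh(p) for p in pre]
        states.append(h)

    def norm(v):
        return math.sqrt(sum(c * c for c in v))

    # Backward pass: start from a unit-norm gradient at the last step.
    g = [1.0 / math.sqrt(hidden)] * hidden
    norms = [0.0] * steps
    norms[steps - 1] = norm(g)
    for t in range(steps - 1, 0, -1):
        # states[t] holds h_{t+1} (1-indexed); g holds dL/dh_{t+1}.
        d = [(1.0 - states[t][i] ** 2) * g[i] for i in range(hidden)]
        # dL/dh_t = W^T diag(1 - h_{t+1}^2) dL/dh_{t+1}
        g = [sum(W[i][j] * d[i] for i in range(hidden)) for j in range(hidden)]
        norms[t - 1] = norm(g)
    return norms


norms = gradient_norms()
print(f"step 100: {norms[-1]:.3e}")
print(f"step 1:   {norms[0]:.3e}")
```

Under these assumptions, each backward step multiplies the gradient norm by a factor of at most 0.7, so after 99 steps the norm at step 1 is at most 0.7⁹⁹, roughly 5 × 10⁻¹⁶. Raising `scale` above 1 changes the picture, which is a useful experiment when thinking about the answer.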