RNN, LSTM, GRU | Deep Learning

Easy: RNN, LSTM, GRU

A vanilla RNN processes a sequence of 100 words and must produce a single classification output. You observe that the norm of the loss gradient with respect to the hidden state at step 1 is about 10⁻¹⁵, while at step 100 it is ~1.0. What is this problem called, and what mathematical property causes it?
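The observation in the question can be reproduced with a small experiment. The sketch below is only an illustration under stated assumptions, not the setup behind the numbers quoted above: it uses a toy tanh RNN with a hypothetical hidden size of 8, random scalar inputs, a loss that depends only on the final hidden state, and recurrent weights rescaled so their Frobenius norm is 0.7. The function name `gradient_norms` and all parameters are made up for this example. It backpropagates a unit-norm gradient from step 100 to step 1 and records its norm at every step.

```python
import math
import random


def gradient_norms(seq_len=100, hidden=8, frob_norm=0.7, seed=0):
    """Return ||dL/dh_t|| for t = 1..seq_len in a toy tanh RNN.

    Assumption: the loss depends only on the final hidden state, and the
    gradient arriving at that state has unit norm.
    """
    rng = random.Random(seed)

    # Random recurrent weights, rescaled to a fixed Frobenius norm.
    # Assumption: frob_norm < 1, which also bounds the largest singular
    # value of W below 1.
    G = [[rng.gauss(0.0, 1.0) for _ in range(hidden)] for _ in range(hidden)]
    g_norm = math.sqrt(sum(v * v for row in G for v in row))
    W = [[frob_norm * v / g_norm for v in row] for row in G]
    U = [rng.gauss(0.0, 1.0) for _ in range(hidden)]  # weights for a scalar input

    # Forward pass: h_t = tanh(W h_{t-1} + U x_t), storing every hidden state.
    h = [0.0] * hidden
    states = []
    for _ in range(seq_len):
        x = rng.gauss(0.0, 1.0)
        h = [
            math.tanh(sum(W[i][j] * h[j] for j in range(hidden)) + U[i] * x)
            for i in range(hidden)
        ]
        states.append(h)

    # Backward pass, starting from a unit-norm gradient at the last step.
    grad = [1.0 / math.sqrt(hidden)] * hidden
    norms = [0.0] * seq_len
    norms[-1] = math.sqrt(sum(g * g for g in grad))
    for t in range(seq_len - 1, 0, -1):
        # Chain rule through one step:
        # dL/dh_{t-1} = W^T (dL/dh_t * (1 - h_t^2)), where 1 - h_t^2 is tanh'.
        delta = [grad[i] * (1.0 - states[t][i] ** 2) for i in range(hidden)]
        grad = [sum(W[i][j] * delta[i] for i in range(hidden)) for j in range(hidden)]
        norms[t - 1] = math.sqrt(sum(g * g for g in grad))
    return norms


if __name__ == "__main__":
    norms = gradient_norms()
    print(f"step 100: {norms[-1]:.3e}")
    print(f"step   1: {norms[0]:.3e}")
```

Under these assumptions each backward step multiplies the gradient by a matrix whose largest singular value is at most 0.7, so the norm at step 1 is bounded above by 0.7^99, roughly 5 × 10⁻¹⁶, while the norm at step 100 stays at 1. Raising `frob_norm` above 1 removes that bound, which is a quick way to see how sensitive the result is to the scale of the recurrent weights.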
