Easy: Optimizers
You train a neural network using SGD with learning rate 0.1 and observe very noisy loss curves — the loss zigzags up and down across consecutive batches. Switching to SGD with Momentum (β=0.9) smooths the curve. What mathematical operation does momentum perform that causes this smoothing?
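As a hint toward the answer, here is a minimal sketch of the accumulation step that momentum performs, assuming the normalized (exponential moving average) form of the update. The gradient values are made up for illustration, and the helper names `ema_of_gradients` and `max_swing` are hypothetical. The classic form v = βv + g differs from this one only by a constant factor of 1/(1 − β) when the running value starts at zero.

```python
def ema_of_gradients(grads, beta=0.9):
    """Running exponentially weighted average of past gradients.

    avg_t = beta * avg_{t-1} + (1 - beta) * g_t
    """
    avg = 0.0
    history = []
    for g in grads:
        avg = beta * avg + (1.0 - beta) * g
        history.append(avg)
    return history


def max_swing(values):
    """Largest change between two consecutive entries."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))


# Hypothetical 1-D gradients: a true slope of 1.0 plus batch noise that
# alternates between +5 and -5, mimicking a zigzagging loss curve.
raw = [1.0 + (5.0 if i % 2 == 0 else -5.0) for i in range(200)]
smoothed = ema_of_gradients(raw, beta=0.9)

# Compare step-to-step swings once the average has warmed up.
raw_swing = max_swing(raw[100:])
smooth_swing = max_swing(smoothed[100:])
print(f"raw swing: {raw_swing:.2f}, smoothed swing: {smooth_swing:.2f}")
```

Because each new gradient contributes only a fraction (1 − β) to the running value, noise that flips sign from batch to batch largely cancels, while the consistent component of the gradient accumulates.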