738. Small Gradients and Slow Training
medium

A practitioner notices the training loss decreases but very slowly after many epochs. The gradient norms are consistently small but nonzero. What is the most likely cause?