738. Small Gradients and Slow Training
medium
A practitioner notices the training loss decreases but very slowly after many epochs. The gradient norms are consistently small but nonzero. What is the most likely cause?
A practitioner notices the training loss decreases but very slowly after many epochs. The gradient norms are consistently small but nonzero. What is the most likely cause?