StackedML

738. Small Gradients and Slow Training

medium

A practitioner notices the training loss decreases but very slowly after many epochs. The gradient norms are consistently small but nonzero. What is the most likely cause?

← Back to Questions