742. Sparse Gradient Signal
easy
A model has many parameters but only a few receive meaningful gradients during training. What is the likely consequence?
A model has many parameters but only a few receive meaningful gradients during training. What is the likely consequence?