338. Gradient Flow and Layer Learning
easy

Gradient flow through a network determines which layers learn during training. In early deep networks, why did only the last few layers learn effectively?
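
One way to explore this question is to measure the gradient reaching each layer's weight. The sketch below is illustrative only and rests on assumptions not stated in the question: a chain of 8 scalar sigmoid units, every weight fixed at 1.0, and a squared-error loss. The names `layer_gradients`, `weights`, `x`, and `y` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_gradients(weights, x, y):
    """Return |dLoss/dw_k| for each layer of a chain of scalar sigmoid units."""
    # Forward pass: keep every activation; acts[0] is the input.
    acts = [x]
    for w in weights:
        acts.append(sigmoid(w * acts[-1]))

    # Backward pass for loss = 0.5 * (output - y)^2.
    grads = [0.0] * len(weights)
    upstream = acts[-1] - y  # dLoss/d(output)
    for k in reversed(range(len(weights))):
        a_out = acts[k + 1]
        # Sigmoid derivative a_out * (1 - a_out) is at most 0.25.
        delta = upstream * a_out * (1.0 - a_out)
        grads[k] = abs(delta * acts[k])  # dLoss/dw_k
        upstream = delta * weights[k]  # signal handed to the layer below
    return grads


# Assumed setup: 8 layers, all weights 1.0, input 1.0, target 0.0.
weights = [1.0] * 8
grads = layer_gradients(weights, x=1.0, y=0.0)
for k, g in enumerate(grads, start=1):
    print(f"layer {k}: |grad| = {g:.3e}")
```

Under these assumptions, each backward step multiplies the signal by a sigmoid derivative of at most 0.25, so the printed magnitudes grow from the first layer to the last. Other activations, weight scales, or architectures can behave differently.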