333. Gradient Descent in High-Dimensional Loss Surfaces
medium
Why does gradient descent in a high-dimensional space almost always find a useful solution despite the non-convexity of deep network loss surfaces?