StackedML
719. SGD and Saddle Point Escape
Difficulty: medium
How does SGD noise help escape saddle points compared to batch gradient descent?
A. SGD's adaptive learning rate automatically increases near saddle points to overcome the zero-gradient condition
B. SGD's noisy gradient estimates introduce perturbations that can push parameters along descent directions at saddle points
C. SGD's smaller batch size increases the gradient magnitude near saddle points, accelerating escape from flat regions
D. SGD's momentum term accumulates velocity across iterations, allowing it to roll through saddle points without stopping
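The mechanism the question probes can be illustrated numerically. Below is a minimal sketch (not from the question itself) that mimics SGD's stochasticity by adding Gaussian noise to exact gradients on the toy saddle f(x, y) = x² − y². Starting on the saddle's stable manifold (y = 0), noiseless gradient descent never leaves it, while the noisy variant gets kicked off and the −2y term amplifies the escape. The function names and noise model here are illustrative assumptions, not a real SGD implementation over minibatches.

```python
import numpy as np

def grad(p):
    # f(x, y) = x**2 - y**2 has a saddle point at the origin:
    # the gradient (2x, -2y) vanishes at (0, 0).
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

def run(noise_std, steps=100, lr=0.05, seed=0):
    # noise_std = 0 is exact (batch) gradient descent;
    # noise_std > 0 stands in for SGD's noisy gradient estimates.
    rng = np.random.default_rng(seed)
    p = np.array([0.5, 0.0])  # start on the saddle's stable manifold (y = 0)
    for _ in range(steps):
        g = grad(p) + rng.normal(0.0, noise_std, size=2)
        p = p - lr * g
    return p

batch = run(noise_std=0.0)  # y stays exactly 0: stuck approaching the saddle
sgd = run(noise_std=0.1)    # noise perturbs y; -2y then drives it away

print("batch GD:", batch)
print("noisy GD:", sgd)
```

Running this shows the noiseless iterate's y-coordinate remains exactly zero, whereas the noisy iterate's |y| grows once any perturbation lands on the descent direction, which is the behavior option-style reasoning about SGD and saddle points refers to.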