StackedML
266. Epsilon in Adam (easy)
The default hyperparameters for Adam are β₁=0.9, β₂=0.999, ε=1e-8. What does the ε term prevent?
A. Overfitting by regularizing the effective learning rate toward zero for parameters with small gradients
B. Gradient explosion by capping the second moment estimate at a maximum value during training
C. Division by zero when the second moment estimate is very small, ensuring numerical stability
D. Gradient vanishing by adding a constant floor to all gradient magnitudes during optimization
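To see exactly where ε enters the update, here is a minimal NumPy sketch of a single Adam step (the function name and scalar state layout are illustrative, not from any particular library). ε sits in the denominator next to the square root of the bias-corrected second moment, so the division stays finite even when that estimate is near zero:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; eps keeps the denominator nonzero when v_hat is ~0."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)              # bias correction for initialization at 0
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # eps averts 0/0 here
    return theta, m, v

# With an all-zero gradient, v_hat is exactly 0; without eps the update
# would be 0/0 (NaN). With eps it is simply 0 and theta stays finite.
theta, m, v = adam_step(np.array([1.0]), np.zeros(1), np.zeros(1), np.zeros(1), t=1)
```

Note that ε is added after the square root in the standard formulation, so it directly bounds the effective step size at lr · m̂ / ε in the limit of a vanishing second moment.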