StackedML
Vanishing / exploding gradients
6. Activation Function for Vanishing Gradients (easy)
Which activation function most directly addresses the vanishing gradient problem in deep networks?
A. Sigmoid, because it compresses gradients into a stable range that prevents exponential shrinkage during backpropagation
B. ReLU, because it has a gradient of exactly 1 for positive inputs, preventing gradient attenuation through active neurons
C. Softmax, because it normalizes gradients across all output neurons to prevent any single path from dominating
D. Tanh, because it is zero-centered and produces larger gradients than sigmoid at most input values
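The contrast the options hinge on can be checked numerically: backpropagation multiplies one local activation derivative per layer, sigmoid's derivative is bounded above by 0.25, and ReLU's derivative is exactly 1 wherever the input is positive. A minimal NumPy sketch (the 50-layer depth and the sampled pre-activations are illustrative assumptions, not part of the question) shows how quickly a chain of sigmoid derivatives shrinks the gradient while an all-active ReLU path leaves it untouched:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid: s * (1 - s), with a maximum of 0.25 at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return (x > 0).astype(float)

# Hypothetical setup: one pre-activation per layer for a depth-50 network,
# drawn near zero, where sigmoid's derivative is at its largest.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=50)

# Backprop multiplies the per-layer derivatives together.
sig_product = np.prod(sigmoid_grad(x))        # at most 0.25**50 ~ 1e-30
relu_product = np.prod(relu_grad(np.abs(x)))  # all inputs positive -> every factor is 1

print(f"sigmoid gradient product over 50 layers: {sig_product:.3e}")
print(f"ReLU gradient product (all-active path): {relu_product:.1f}")
```

Even in sigmoid's best case the product decays exponentially with depth, whereas the ReLU product stays at 1 as long as every neuron on the path is active (inactive neurons zero the gradient instead, the "dying ReLU" caveat).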