StackedML

84. Chain Rule in Backpropagation

Difficulty: easy

In a neural network, the loss L is a function of the output ŷ, which is a function of the pre-activation z, which is a function of the weight w. How is dL/dw computed?
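
The dependency chain in the question (w determines z, z determines ŷ, ŷ determines L) means the chain rule gives dL/dw = (dL/dŷ) · (dŷ/dz) · (dz/dw). The sketch below illustrates this for a single neuron. The question does not name an activation or a loss, so the sigmoid activation, the squared-error loss, the bias b, and all numeric values here are assumptions chosen only for illustration. The function names are hypothetical as well. The chain-rule result is compared against a finite-difference estimate as a sanity check.

```python
import math


def sigmoid(z):
    """Assumed activation: y_hat = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, x, b):
    """Pre-activation z = w*x + b, then output y_hat = sigmoid(z)."""
    z = w * x + b
    y_hat = sigmoid(z)
    return z, y_hat


def loss(y_hat, y):
    """Assumed loss: L = 0.5 * (y_hat - y)^2."""
    return 0.5 * (y_hat - y) ** 2


def grad_w_chain_rule(w, x, b, y):
    """dL/dw as the product of the three local derivatives."""
    _, y_hat = forward(w, x, b)
    dL_dyhat = y_hat - y              # derivative of the loss w.r.t. the output
    dyhat_dz = y_hat * (1.0 - y_hat)  # derivative of sigmoid w.r.t. z
    dz_dw = x                         # derivative of w*x + b w.r.t. w
    return dL_dyhat * dyhat_dz * dz_dw


def grad_w_numeric(w, x, b, y, eps=1e-6):
    """Central finite-difference estimate of dL/dw, used only as a check."""
    _, y_plus = forward(w + eps, x, b)
    _, y_minus = forward(w - eps, x, b)
    return (loss(y_plus, y) - loss(y_minus, y)) / (2.0 * eps)


# Illustrative values, not taken from the question.
analytic = grad_w_chain_rule(w=0.5, x=2.0, b=0.1, y=1.0)
numeric = grad_w_numeric(w=0.5, x=2.0, b=0.1, y=1.0)
print(analytic, numeric)
```

With a different activation or loss, only the corresponding local derivative changes; the product structure stays the same.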
