84. Chain Rule in Backpropagation
easy
In a neural network, the loss L is a function of the output ŷ, which is a function of the pre-activation z, which is a function of the weight w. How is dL/dw computed?
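The dependency chain w -> z -> ŷ -> L described in the question means the chain rule gives dL/dw = (dL/dŷ) · (dŷ/dz) · (dz/dw). The sketch below illustrates this with one scalar neuron. The question does not specify the functions involved, so the linear pre-activation z = w·x + b, the sigmoid activation, the squared-error loss, and all names and input values here are assumptions chosen for illustration only.

```python
import math


def sigmoid(z):
    """Assumed activation: y_hat = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, x, b, y):
    """Assumed forward pass: z = w*x + b, y_hat = sigmoid(z), L = 0.5*(y_hat - y)^2."""
    z = w * x + b
    y_hat = sigmoid(z)
    return 0.5 * (y_hat - y) ** 2


def grad_w(w, x, b, y):
    """dL/dw as the product of the three local derivatives along w -> z -> y_hat -> L."""
    z = w * x + b
    y_hat = sigmoid(z)
    dL_dyhat = y_hat - y               # derivative of 0.5*(y_hat - y)^2 w.r.t. y_hat
    dyhat_dz = y_hat * (1.0 - y_hat)   # derivative of sigmoid w.r.t. z
    dz_dw = x                          # derivative of w*x + b w.r.t. w
    return dL_dyhat * dyhat_dz * dz_dw


def numeric_grad_w(w, x, b, y, eps=1e-6):
    """Central finite difference, used only to sanity-check the analytic result."""
    return (loss(w + eps, x, b, y) - loss(w - eps, x, b, y)) / (2 * eps)


# Hypothetical values for a single training example.
w, x, b, y = 0.5, 2.0, -0.3, 1.0
analytic = grad_w(w, x, b, y)
numeric = numeric_grad_w(w, x, b, y)
print(analytic, numeric)
```

The two printed values should agree closely. With a different activation or loss, only the corresponding local derivative changes; the product structure stays the same.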