84. Chain Rule in Backpropagation
easy

In a neural network, the loss L is a function of the output ŷ, which is a function of the pre-activation z, which is a function of the weight w. How is dL/dw computed?
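The chain rule multiplies the local derivatives along the path from L back to w: dL/dw = (dL/dŷ) · (dŷ/dz) · (dz/dw). The sketch below shows this for a single neuron. It assumes specifics the question does not give: the pre-activation is z = w·x + b, the activation is a sigmoid, and the loss is squared error, L = ½(ŷ − y)². The helper names are hypothetical. It then checks the chain-rule result against a central finite-difference estimate.

```python
import math

# Assumed setup (for illustration only): z = w*x + b, y_hat = sigmoid(z),
# L = 0.5 * (y_hat - y)**2. Function names are hypothetical helpers.

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def loss(w: float, x: float, b: float, y: float) -> float:
    """Forward pass: weight -> pre-activation -> output -> loss."""
    y_hat = sigmoid(w * x + b)
    return 0.5 * (y_hat - y) ** 2

def grad_w_chain_rule(w: float, x: float, b: float, y: float) -> float:
    """dL/dw as the product of the three local derivatives."""
    z = w * x + b
    y_hat = sigmoid(z)
    dL_dyhat = y_hat - y              # d/dy_hat of 0.5*(y_hat - y)^2
    dyhat_dz = y_hat * (1.0 - y_hat)  # sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
    dz_dw = x                         # d/dw of w*x + b
    return dL_dyhat * dyhat_dz * dz_dw

def grad_w_numeric(w: float, x: float, b: float, y: float, eps: float = 1e-6) -> float:
    """Central finite difference, used only to sanity-check the chain rule."""
    return (loss(w + eps, x, b, y) - loss(w - eps, x, b, y)) / (2 * eps)

if __name__ == "__main__":
    w, x, b, y = 0.5, 2.0, -0.3, 1.0
    print(f"chain rule:  {grad_w_chain_rule(w, x, b, y):.6f}")
    print(f"finite diff: {grad_w_numeric(w, x, b, y):.6f}")
```

The two printed values should agree to several decimal places. Backpropagation applies this same product of local derivatives layer by layer. It reuses dL/dz from later layers instead of recomputing it for every weight.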