An adaptive optimizer like Adam adjusts the learning rate per parameter. What information does it use to do this?