493. Masked Self-Attention in Decoder
medium

The transformer decoder uses masked self-attention during training. Why is masking necessary?
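
For reference, here is a minimal sketch of what the mask does mechanically. It assumes a single attention head whose raw scores are already computed. The function name `causal_attention_weights` and the uniform toy scores are hypothetical, chosen only for illustration. Each query position i keeps its scores for key positions 0..i and has the later ones set to negative infinity before the softmax.

```python
import math

def causal_attention_weights(scores):
    """Row-wise softmax over scores with future positions masked out.

    scores[i][j] is the raw attention score of query position i
    for key position j (hypothetical input layout).
    """
    n = len(scores)
    weights = []
    for i in range(n):
        # Position i may only attend to positions 0..i; later ones get -inf.
        masked = [scores[i][j] if j <= i else -math.inf for j in range(n)]
        row_max = max(masked)  # finite, because the diagonal is never masked
        exps = [math.exp(s - row_max) for s in masked]  # exp(-inf) == 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Toy example: 3 tokens with uniform raw scores (illustrative values only).
scores = [[0.0] * 3 for _ in range(3)]
weights = causal_attention_weights(scores)
```

With uniform scores, row i spreads its weight evenly over positions 0..i and assigns exactly zero weight to every later position, so no row depends on tokens that come after it.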