60. BERT vs GPT Attention Direction
hard

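BERT uses bidirectional attention (each token attends to tokens on both its left and right), while GPT uses unidirectional (causal) attention (each token attends only to itself and earlier tokens). Which kind of task is each architecture best suited to?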
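
To make the two attention directions concrete, here is a minimal sketch of the masks involved. It is illustrative only: the function names (bidirectional_mask, causal_mask, show) are hypothetical and do not come from any BERT or GPT codebase, and padding masks are ignored. A 1 at row i, column j means position i may attend to position j.

```python
def bidirectional_mask(n):
    """Every position may attend to every position (BERT-style)."""
    return [[1] * n for _ in range(n)]


def causal_mask(n):
    """Position i may attend only to positions j <= i (GPT-style)."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def show(mask):
    """Render a mask as rows of space-separated 0/1 values."""
    return "\n".join(" ".join(str(v) for v in row) for row in mask)


if __name__ == "__main__":
    print("bidirectional:")
    print(show(bidirectional_mask(4)))
    print("causal:")
    print(show(causal_mask(4)))
```

The causal mask is lower-triangular, so no position can see tokens to its right; the bidirectional mask is all ones, so every position sees the whole sequence.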