BERT uses bidirectional attention, while GPT uses unidirectional (causal) attention. Which kinds of tasks is each architecture best suited to?
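
To make the distinction in the question concrete, here is a minimal sketch of the two masking patterns in plain Python. It is an illustration under stated assumptions, not the actual BERT or GPT implementation: the helper names `attention_mask` and `masked_softmax` are hypothetical, the scores are made-up values, and real models apply the mask inside multi-head scaled dot-product attention.

```python
import math

def attention_mask(seq_len, causal):
    """Build a seq_len x seq_len mask; True means query i may attend to key j.

    Hypothetical helper for illustration only.
    """
    # Bidirectional: every position sees every position.
    # Causal: position i sees only positions j <= i (itself and the past).
    return [[(j <= i) or not causal for j in range(seq_len)]
            for i in range(seq_len)]

def masked_softmax(scores, mask_row):
    """Softmax over one row of scores, giving zero weight to masked-out keys."""
    visible = [s for s, m in zip(scores, mask_row) if m]
    peak = max(visible)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if m else 0.0
            for s, m in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up, uniform attention scores for the query at position 1 of 4 tokens.
scores = [0.0, 0.0, 0.0, 0.0]

bidirectional = attention_mask(4, causal=False)
causal = attention_mask(4, causal=True)

print(masked_softmax(scores, bidirectional[1]))  # [0.25, 0.25, 0.25, 0.25]
print(masked_softmax(scores, causal[1]))         # [0.5, 0.5, 0.0, 0.0]
```

With the bidirectional mask, the token at position 1 spreads its attention over all four tokens, including those to its right. With the causal mask, it can only draw on itself and the token before it. That difference in visible context is what the question turns on.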