StackedML
30. Attention Mechanism Computation (easy)
What does the attention mechanism compute for each token in a transformer?
A. A weighted sum of position embeddings scaled by the token's embedding magnitude at the current position in the sequence representation

B. A learned gating signal that determines how much of the previous hidden state to retain at the current step

C. A weighted sum of value vectors from all positions, where weights reflect how relevant each position is to the current one

D. A normalized dot product between the current token and a learned query vector representing the target class