Concept
What is the attention mechanism?
Tap to reveal answer
Answer
Self-attention lets each token attend to all others by computing weighted sums of values based on query-key dot products. Enables capturing long-range dependencies regardless of distance. Multi-head attention runs this in parallel with different learned projections.