Attention Scaling (Easy) | AI Code Lab

Compute the scaled dot-product score between a query vector Q and a key vector K. This is the operation attention applies to each query-key pair before softmax.

score(Q, K) = Q · K / √d_k

Here d_k is the number of elements in Q; K must have the same number of elements.

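Without scaling, dot products grow in magnitude as d_k increases: for independent components with zero mean and unit variance, the variance of Q · K is d_k. Large scores push softmax toward a near one-hot ("peaky") output, where its gradients are close to zero.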
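
The effect of scaling on softmax can be illustrated with a small sketch. The raw scores and the choice d_k = 64 below are made-up values for illustration, and `softmax` is a hypothetical helper, not part of the exercise.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw dot products for one query against three keys.
d_k = 64
raw = [16.0, 8.0, 0.0]
scaled = [s / math.sqrt(d_k) for s in raw]  # [2.0, 1.0, 0.0]

peaky = softmax(raw)      # almost all probability mass on the first key
smooth = softmax(scaled)  # mass is spread across the keys

print(max(peaky), max(smooth))
```

With these values the unscaled distribution puts more than 99% of its mass on one key, while the scaled one keeps it below 70%.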

attention_score([1,0,1,0],[0,1,0,1]) → 0.0
attention_score([1,1],[1,1]) → 1.41421

Round to **5 decimal places**.
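
One possible solution, as a minimal sketch in plain Python. The length and emptiness checks are assumptions; the task statement does not say how invalid input should be handled.

```python
import math

def attention_score(q, k):
    """Return (q . k) / sqrt(d_k), rounded to 5 decimal places."""
    # Assumption: mismatched or empty inputs are rejected with ValueError.
    if len(q) != len(k):
        raise ValueError("q and k must have the same length")
    if not q:
        raise ValueError("q and k must be non-empty")

    d_k = len(q)
    dot = sum(qi * ki for qi, ki in zip(q, k))
    return round(dot / math.sqrt(d_k), 5)

print(attention_score([1, 0, 1, 0], [0, 1, 0, 1]))  # 0.0
print(attention_score([1, 1], [1, 1]))              # 1.41421
```

Rounding happens once, after the division, so intermediate precision is not lost.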