Gradient clipping prevents exploding gradients by rescaling any gradient vector whose norm exceeds a threshold. It is a standard technique in RNN and Transformer training.
1. Compute L2 norm: `||g|| = sqrt(Σg_i²)`
2. If `||g|| > max_norm`: scale gradient by `max_norm / ||g||`; otherwise return it unchanged
gradient_clip([3.0,4.0], max_norm=2.0) # ||g|| = 5.0, scale = 2/5 = 0.4 → [1.2, 1.6]
Round each component of the result to **5 decimal places**.
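The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference solution: the name and argument order follow the example call `gradient_clip([3.0,4.0], max_norm=2.0)`, while representing the gradient as a flat list of floats and returning a new list are assumptions.

```python
import math


def gradient_clip(grad, max_norm):
    """Clip a gradient vector by its L2 norm (sketch; assumes a flat list of floats)."""
    # Step 1: L2 norm, ||g|| = sqrt(sum of squared components)
    norm = math.sqrt(sum(g * g for g in grad))

    # Step 2: rescale only when the norm exceeds the threshold.
    # A zero vector has norm 0, so it never reaches the division.
    if norm > max_norm:
        scale = max_norm / norm
        grad = [g * scale for g in grad]

    # Round each component to 5 decimal places, as the problem asks.
    return [round(g, 5) for g in grad]
```

Clipping changes only the length of the vector, not its direction, which is why every component is multiplied by the same factor.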
Test Cases (2 visible · 1 hidden)
Case 1: Classic 3-4-5 triangle
Input: gradient_clip([3.0,4.0],2.0)
Expected: [1.2, 1.6]
Case 2: Norm < max, no clipping
Input: gradient_clip([0.5,0.5],2.0)
Expected: [0.5, 0.5]