Fix typos in big-bird.md #2392

Open · wants to merge 1 commit into base: main
6 changes: 3 additions & 3 deletions big-bird.md
@@ -179,11 +179,11 @@ Random attention is ensuring that each query token will attend a few random tokens.

```python
# r1, r2, r are some random indices; Note: r1, r2, r3 are different for each row 👇
- Q[1] x [Q[r1], Q[r2], ......, Q[r]]
+ Q[1] x [K[r1], K[r2], ......, K[r]]
.
.
.
- Q[n-2] x [Q[r1], Q[r2], ......, Q[r]]
+ Q[n-2] x [K[r1], K[r2], ......, K[r]]

# leaving 0th & (n-1)th token since they are already global
```
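
To make the pattern in this snippet concrete, here is a minimal NumPy sketch (the function name, the shapes, and the use of `np.random.default_rng` are illustrative assumptions, not BigBird's actual implementation): each query row draws its own fresh set of `r` key indices, and the 0th and (n-1)th tokens are skipped because global attention already covers them.

```python
# Minimal NumPy sketch of random attention; illustrative, not BigBird's code.
import numpy as np

def random_attention_scores(Q, K, r, seed=0):
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    scores = {}
    for i in range(1, n - 1):                        # leave 0th & (n-1)th to global
        idx = rng.choice(n, size=r, replace=False)   # fresh indices per row
        scores[i] = Q[i] @ K[idx].T                  # r raw scores for this query
    return scores

Q = np.random.randn(8, 16)   # 8 tokens, head dim 16 (assumed for illustration)
K = np.random.randn(8, 16)
print(random_attention_scores(Q, K, r=3)[1].shape)   # -> (3,)
```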
@@ -209,7 +209,7 @@ Attention score for \\(\mathbf{q}_{1}\\) represented by \\(a_1\\) where \\(a_1=S

---

- For calculating attention score for tokens in seconcd block, we are gathering the first three blocks, the last block, and the fifth block. Then we can compute \\(a_2 = Softmax(q_2 * concat(k_1, k_2, k_3, k_5, k_7))\\).
+ For calculating attention score for tokens in second block, we are gathering the first three blocks, the last block, and the fifth block. Then we can compute \\(a_2 = Softmax(q_2 * concat(k_1, k_2, k_3, k_5, k_7))\\).

![BigBird block sparse attention](assets/18_big_bird/q2.png)
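
To make the gather concrete, a hedged NumPy sketch follows (the block size, the total of 7 blocks, and the `softmax` helper are assumptions for illustration, not the blog's code): concatenate key blocks 1, 2, 3 (the window around block 2), block 5 (random), and block 7 (the last, global block), then softmax the second block's query scores against the gathered keys.

```python
# Illustrative NumPy sketch of the a_2 gather described above.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

b, d = 4, 16                            # assumed block size and head dim
K_blocks = np.random.randn(7, b, d)     # 7 key blocks, 1-based in the text
q2 = np.random.randn(b, d)              # queries of the second block

# gather key blocks 1, 2, 3 (window), 5 (random), 7 (last / global)
gathered = np.concatenate([K_blocks[i - 1] for i in (1, 2, 3, 5, 7)], axis=0)
a2 = softmax(q2 @ gathered.T)           # shape (b, 5 * b)
print(a2.shape)                         # -> (4, 20)
```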
