Fix typos in big-bird.md #2392

Open · wants to merge 1 commit into base: main
6 changes: 3 additions & 3 deletions big-bird.md
@@ -179,11 +179,11 @@ Random attention is ensuring that each query token will attend a few random tokens.

```python
# r1, r2, r are some random indices; Note: r1, r2, r3 are different for each row 👇
- Q[1] x [Q[r1], Q[r2], ......, Q[r]]
+ Q[1] x [K[r1], K[r2], ......, K[r]]
.
.
.
- Q[n-2] x [Q[r1], Q[r2], ......, Q[r]]
+ Q[n-2] x [K[r1], K[r2], ......, K[r]]

# leaving 0th & (n-1)th token since they are already global
```
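
To make the pattern in this snippet concrete, here is a minimal NumPy sketch (the function name, the shapes, and the use of `np.random.default_rng` are illustrative assumptions, not BigBird's actual implementation): each query row draws its own fresh set of `r` key indices, and the 0th and (n-1)th tokens are skipped because global attention already covers them.

```python
# Minimal NumPy sketch of random attention; illustrative, not BigBird's code.
import numpy as np

def random_attention_scores(Q, K, r, seed=0):
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    scores = {}
    for i in range(1, n - 1):                        # leave 0th & (n-1)th to global
        idx = rng.choice(n, size=r, replace=False)   # fresh indices per row
        scores[i] = Q[i] @ K[idx].T                  # r raw scores for this query
    return scores

Q = np.random.randn(8, 16)   # 8 tokens, head dim 16 (assumed for illustration)
K = np.random.randn(8, 16)
print(random_attention_scores(Q, K, r=3)[1].shape)   # -> (3,)
```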
@@ -209,7 +209,7 @@ Attention score for \\(\mathbf{q}_{1}\\) represented by \\(a_1\\) where \\(a_1=S

---

- For calculating attention score for tokens in seconcd block, we are gathering the first three blocks, the last block, and the fifth block. Then we can compute \\(a_2 = Softmax(q_2 * concat(k_1, k_2, k_3, k_5, k_7))\\).
+ For calculating attention score for tokens in second block, we are gathering the first three blocks, the last block, and the fifth block. Then we can compute \\(a_2 = Softmax(q_2 * concat(k_1, k_2, k_3, k_5, k_7))\\).

![BigBird block sparse attention](assets/18_big_bird/q2.png)
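
To make the gather concrete, a hedged NumPy sketch follows (the block size, the total of 7 blocks, and the `softmax` helper are assumptions for illustration, not the blog's code): concatenate key blocks 1, 2, 3 (the window around block 2), block 5 (random), and block 7 (the last, global block), then softmax the second block's query scores against the gathered keys.

```python
# Illustrative NumPy sketch of the a_2 gather described above.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

b, d = 4, 16                            # assumed block size and head dim
K_blocks = np.random.randn(7, b, d)     # 7 key blocks, 1-based in the text
q2 = np.random.randn(b, d)              # queries of the second block

# gather key blocks 1, 2, 3 (window), 5 (random), 7 (last / global)
gathered = np.concatenate([K_blocks[i - 1] for i in (1, 2, 3, 5, 7)], axis=0)
a2 = softmax(q2 @ gathered.T)           # shape (b, 5 * b)
print(a2.shape)                         # -> (4, 20)
```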
