
[Feature Request] Let ttnn.transformer.scaled_dot_product_attention support dropout_p (dropout probability) #16022

Open
jdh8 opened this issue Dec 13, 2024 · 0 comments
Labels
feature-request External feature request

Comments

jdh8 (Contributor) commented Dec 13, 2024

Is your feature request related to a problem? Please describe.
I am trying to lower aten._scaled_dot_product_flash_attention to ttnn.transformer.scaled_dot_product_attention. The related issues are:

Describe the solution you'd like
Add a floating-point parameter dropout_p to ttnn.transformer.scaled_dot_product_attention. Its behavior should match the dropout_p parameter of torch.nn.functional.scaled_dot_product_attention, i.e. dropout applied to the attention weights after the softmax, with callers passing a nonzero probability only during training.
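For reference, torch.nn.functional.scaled_dot_product_attention already accepts dropout_p today. A minimal sketch of the behavior being requested follows; the ttnn call with a dropout_p keyword is hypothetical and shown only as the proposed shape of the API, not the current one:

```python
import torch

# (batch, heads, seq_len, head_dim)
q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)

# Existing PyTorch behavior this request asks ttnn to match:
# dropout is applied to the post-softmax attention weights whenever
# dropout_p > 0; callers typically pass `dropout_p if training else 0.0`.
out = torch.nn.functional.scaled_dot_product_attention(q, k, v, dropout_p=0.1)

# Hypothetical ttnn call once the parameter exists (sketch, not current API):
# tt_out = ttnn.transformer.scaled_dot_product_attention(
#     tt_q, tt_k, tt_v, is_causal=True, dropout_p=0.1
# )
```

With dropout_p=0.0 this reduces to the existing behavior, so the parameter could default to 0.0 without affecting current callers.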

Describe alternatives you've considered
Split out the final matrix-multiplication step so that a dropout op can be inserted between the softmax and the final matmul with V? (See the sketch below.)
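A reference decomposition, written in plain PyTorch for clarity, showing where the dropout op would sit if the final matmul were split out. Mapping this onto ttnn ops is an assumption and would depend on a suitable ttnn dropout primitive being available at that point in the fused kernel:

```python
import math
import torch
import torch.nn.functional as F

def sdpa_with_dropout(q, k, v, dropout_p=0.0, training=True):
    """Decomposed scaled dot-product attention matching the semantics of
    torch.nn.functional.scaled_dot_product_attention(..., dropout_p=...)."""
    scale = 1.0 / math.sqrt(q.shape[-1])
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    weights = torch.softmax(scores, dim=-1)
    # The split proposed above: dropout is applied to the attention weights,
    # between the softmax and the final matmul with V.
    weights = F.dropout(weights, p=dropout_p, training=training)
    return torch.matmul(weights, v)
```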
