I compare the Flash Attention implementation with a pure PyTorch implementation of the attention algorithm:
- Flash Attention CUDA time total: 9.864ms
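The "CUDA time total" figure above is the kind of summary reported by `torch.profiler`. Below is a minimal sketch of how such a number can be collected; the `flash_attention` import is a placeholder for this repo's compiled extension, and the tensor shapes are illustrative assumptions, not the actual benchmark configuration.

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Hypothetical import: the name of this repo's compiled CUDA extension is an assumption.
# from flash_attention import flash_attention

def torch_attention(q, k, v):
    # Plain PyTorch attention used as the timing baseline: softmax(QK^T / sqrt(d)) V.
    scale = q.shape[-1] ** -0.5
    return torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1) @ v

# Illustrative shapes: (batch, heads, sequence length, head dimension).
q = torch.randn(32, 8, 1024, 64, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    torch_attention(q, k, v)
    # flash_attention(q, k, v)  # time the CUDA kernel the same way

# The printed table ends with a "Self CUDA time total" summary line.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```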
The CUDA implementation is compared against a correct, pure PyTorch implementation and the official PyTorch implementation of the attention mechanism in `main.py`. The tests confirm that the Flash Attention implementation is correct.
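The correctness check described above might look roughly like the sketch below. It is not the repo's actual test code: the `flash_attention` import, the tensor shapes, and the tolerance are assumptions, and the official PyTorch path is `torch.nn.functional.scaled_dot_product_attention`.

```python
import math
import torch
import torch.nn.functional as F

# Hypothetical import: the name of this repo's compiled CUDA extension is an assumption.
# from flash_attention import flash_attention

def torch_attention(q, k, v):
    # Reference "pure PyTorch" attention: softmax(QK^T / sqrt(d)) V.
    scale = 1.0 / math.sqrt(q.shape[-1])
    return torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1) @ v

# Illustrative shapes: (batch, heads, sequence length, head dimension).
q, k, v = (torch.randn(4, 8, 256, 64, device="cuda") for _ in range(3))

out_torch = torch_attention(q, k, v)
out_sdpa = F.scaled_dot_product_attention(q, k, v)   # official PyTorch attention
# out_flash = flash_attention(q, k, v)                # CUDA kernel under test

# "All close test": the implementations should agree within a small tolerance.
print("All close test:", torch.allclose(out_torch, out_sdpa, atol=1e-3))
print("Average value in torch-attention:", out_torch.mean().item())
print("Average value in scaled_dot_prod-attention:", out_sdpa.mean().item())
```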
Sample results:
All close test: True
Average value in torch-attention: 0.0013573728501796722
Average value in scaled_dot_prod-attention: 0.0013573728501796722
Average value in flash-attention: 0.0013573728501796722
Run the tests: `python3 -m test.main`