Optimize ompGemm_m multiply kernel #327

Merged 45 commits on May 22, 2024
Changes from 40 commits
Commits
19d26e8
Add ompGemm_m kernel parallelised over j in both min and max kernel. …
tkoskela Dec 19, 2023
04b83f1
Merge branch 'tk-ci-test-multiply-kernels' into tk-omp-experiments
tkoskela Dec 19, 2023
6c4fd5d
Add multiply kernel with parallelisation over j in both min and max k…
tkoskela Dec 19, 2023
0242279
Add kernel where we thread over both i and j with collapse(2)
tkoskela Dec 19, 2023
914166a
Change schedule to runtime for testing. Fix omp end bug
tkoskela Dec 19, 2023
b7822b8
Add OpenAcc multiply kernel for testing
tkoskela Jan 10, 2024
4d19c11
Merge branch 'develop' into tk-omp-experiments
tkoskela Jan 10, 2024
7cf301d
Copy A matrix in array syntax
tkoskela Jan 16, 2024
c94cf89
Do b copy using array syntax outside parallel loop
tkoskela Jan 16, 2024
2e65616
Remove commented out code
tkoskela Jan 16, 2024
8680442
Version of acc kernel that compiles but doesn't work.
tkoskela Jan 19, 2024
9930b94
Use allocatables instead of pointers
tkoskela Jan 19, 2024
06ed04c
Shorten loop to copy C
tkoskela Jan 25, 2024
41d45a6
Remove unnecessary initializations
tkoskela Jan 25, 2024
dfb30e6
Avoid loop carrier dependencies by precomputing indices
tkoskela Jan 25, 2024
90e9121
Copy back to C with a logical mask
tkoskela Jan 25, 2024
1f29e90
Use pointer to A instead of data copy
tkoskela Jan 30, 2024
d50b006
Use a pointer to B instead of data copy
tkoskela Jan 30, 2024
704d3c5
Clean up and reorder things
tkoskela Jan 30, 2024
1f5a3a7
Less indices. Still too many indices
tkoskela Jan 31, 2024
b07653d
Ignore more files in tests and benchmarks
tkoskela Jan 31, 2024
453ae68
Move index precomputation out of parallel loop
tkoskela Jan 31, 2024
d0f0db7
Add syste.make file for myriad
tkoskela Feb 2, 2024
ef65a38
Merge branch 'tk-add-system.make.myriad' into tk-optimize-multiply-ke…
tkoskela Feb 2, 2024
411bdbf
Add XC_COMPFLAGS to compiler with libxc v4
tkoskela Feb 5, 2024
1a94b96
Fix copy c loop. Add comments
tkoskela Feb 8, 2024
80c6b53
Use right multiply kernel on myriad
tkoskela Feb 8, 2024
2da61e5
WIP: refactor m_kern_min
tkoskela Feb 13, 2024
a076735
Test ompGemm_m in main workflow
tkoskela Feb 19, 2024
0018828
Refactor index computations out of m_kern_min and m_kern_max
tkoskela Feb 21, 2024
c94c274
Cleanup + array syntax copies of B and C
tkoskela Mar 1, 2024
6fe131f
Merge branch 'develop' into tk-optimize-multiply-kernel
tkoskela Mar 14, 2024
c35452e
remove extra kernel
tkoskela Mar 18, 2024
daff6b3
remove experimental kernels
tkoskela Mar 21, 2024
f534039
Clean up code and comments
tkoskela Mar 21, 2024
d4e6225
Merge branch 'develop' into tk-optimize-multiply-kernel
tkoskela Mar 21, 2024
0acdb7f
Merge branch 'tk-optimize-multiply-kernel' of github.com:OrderN/CONQU…
tkoskela Mar 21, 2024
1bea12a
Revert inefficient vectorized copies, add explaining comment
tkoskela Mar 21, 2024
6cecd58
Add missing variable declaration
tkoskela Mar 21, 2024
4fddbce
Rename myriad make file to new convention
tkoskela Mar 21, 2024
843db36
Remove barriers in multiply_module
tkoskela Apr 19, 2024
950f2f2
Update to myriad makefile
tkoskela Apr 23, 2024
744d635
Remove unused targets and commented out pointers
tkoskela May 3, 2024
4347802
Merge branch 'develop' into tk-optimize-multiply-kernel
tkoskela May 13, 2024
cdd509e
Revert back to loop-based implementation of c copy
tkoskela May 17, 2024
2 changes: 0 additions & 2 deletions .github/workflows/makefile.yml
@@ -70,8 +70,6 @@ jobs:
   multiply_kernel: ompDojk
 - test_all_multiply_kernels: false
   multiply_kernel: ompGemm
-- test_all_multiply_kernels: false
-  multiply_kernel: ompGemm_m

 steps:
 - uses: actions/checkout@v3
8 changes: 8 additions & 0 deletions benchmarks/.gitignore
@@ -0,0 +1,8 @@
+__pycache__
+Conquest_o*
+Conquest_w*
+*.i00*
+*.bib
+*.dat
+*.log
+fort.*
2 changes: 1 addition & 1 deletion src/Makefile
@@ -102,7 +102,7 @@ initial_read_module.o:initial_read_module.f90 datestamp.o
 # for GCC13, possibly only on Mac. It doesn't need any other
 # compiler flags (libraries or communications) so should be OK like this
 pseudo_tm_info.o:pseudo_tm_info.f90
-	$(FC) -c $<
+	$(FC) $(XC_COMPFLAGS) -c $<

 #datestamp.f90: $(COMMENT)
 #	$(ECHOSTR) "module datestamp\n" > datestamp.f90
20 changes: 10 additions & 10 deletions src/matrix_module.f90
@@ -121,16 +121,16 @@ module matrix_module
   type matrix_halo
      integer :: np_in_halo ! Partns in halo
      integer :: ni_in_halo ! Atoms in halo
-     integer,pointer :: nh_part(:) ! No of halo atoms in halo part
-     integer,pointer :: j_beg(:) ! accumulator of nh_part
-     integer,pointer :: lab_hcell(:) ! sim cell part (CC) = halo part
-     integer,pointer :: lab_hcover(:) ! CS part (CC) = halo part
-     integer,pointer :: j_seq(:) ! Part seq of halo atom
-     integer,pointer :: i_h2d(:) ! halo->neigh trans for halo atom
-     integer,pointer :: i_halo(:) ! halo seq of atom in CS
-     integer,pointer :: i_hbeg(:) ! Where CS part starts in i_halo
-     integer,pointer :: ndimi(:)
-     integer,pointer :: ndimj(:)
+     integer,allocatable :: nh_part(:) ! No of halo atoms in halo part
+     integer,allocatable :: j_beg(:) ! accumulator of nh_part
+     integer,allocatable :: lab_hcell(:) ! sim cell part (CC) = halo part
+     integer,allocatable :: lab_hcover(:) ! CS part (CC) = halo part
+     integer,allocatable :: j_seq(:) ! Part seq of halo atom
+     integer,allocatable :: i_h2d(:) ! halo->neigh trans for halo atom
+     integer,allocatable :: i_halo(:) ! halo seq of atom in CS
+     integer,allocatable :: i_hbeg(:) ! Where CS part starts in i_halo
+     integer,allocatable :: ndimi(:)
+     integer,allocatable :: ndimj(:)
   ! This type's values for maxima
   integer :: mx_part,mx_halo
   end type matrix_halo
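
Note on the pointer-to-allocatable change above: the diff does not state its motivation, so the following is only a general illustration of how allocatable components behave in a derived type. It is a minimal sketch, assuming a hypothetical type `halo_sketch` with made-up values; it is not the CONQUEST `matrix_halo` type or any code from this PR. Allocatable components cannot alias other arrays, and intrinsic assignment of the derived type copies their contents rather than sharing a target.

```fortran
! Hypothetical sketch: halo_sketch and its values are illustrative only,
! not the CONQUEST matrix_halo type.
program allocatable_component_demo
  implicit none

  type halo_sketch
     integer :: ni_in_halo = 0
     integer, allocatable :: ndimi(:)   ! allocatable instead of pointer
  end type halo_sketch

  type(halo_sketch) :: halo, halo_copy

  halo%ni_in_halo = 3
  allocate(halo%ndimi(halo%ni_in_halo))
  halo%ndimi = [4, 9, 13]

  ! Intrinsic assignment allocates halo_copy%ndimi and copies the values;
  ! with a pointer component it would only copy the association.
  halo_copy = halo
  halo_copy%ndimi(1) = -1

  ! The original is unchanged because the two components do not share storage.
  if (halo%ndimi(1) /= 4) error stop "copy aliased the original"
  if (.not. allocated(halo_copy%ndimi)) error stop "component not allocated"

  print *, halo%ndimi
  print *, halo_copy%ndimi

  ! Explicit cleanup of both components.
  deallocate(halo%ndimi, halo_copy%ndimi)
end program allocatable_component_demo
```

One practical consequence for code that previously used pointer components: association checks written with `associated(...)` need to become `allocated(...)`, as in the second check above.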