Release v4.0.0
support for batch adaptive randomized approximation and column pivoted QR decomposition
support for CUDA 10
support for MAGMA versions > 2.3
stefanozampini committed Sep 8, 2020
1 parent 06e05a1 commit fc8ad5a
Showing 109 changed files with 5,038 additions and 859 deletions.
184 changes: 0 additions & 184 deletions Jenkinsfile

This file was deleted.

12 changes: 8 additions & 4 deletions Makefile
@@ -1,8 +1,12 @@
-.PHONY: all clean
+.PHONY: all testing lib clean

-all:
-	(cd src && make -j)
-	(cd testing && make -j)
+all: lib
+
+tests:
+	(cd testing && make -j$(KBLAS_MAKE_NP))
+
+lib:
+	(cd src && make -j$(KBLAS_MAKE_NP))

clean:
	rm -f -v ./lib/*.a
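
The new `lib` and `tests` targets pass `-j$(KBLAS_MAKE_NP)` to the sub-make. The following is a minimal sketch of how that variable expands, assuming GNU make is installed. The temporary Makefile is a hypothetical stand-in that only echoes the command instead of running the real build.

```shell
# Hypothetical stand-in for the top-level Makefile: the recipe echoes the
# sub-make command so the expansion of KBLAS_MAKE_NP can be inspected.
tmp=$(mktemp -d)
printf 'lib:\n\t@echo "cd src && make -j$(KBLAS_MAKE_NP)"\n' > "$tmp/Makefile"

# With the variable set on the command line, the job count is appended to -j.
with_np=$(make -s -C "$tmp" lib KBLAS_MAKE_NP=4)

# With the variable unset, the flag collapses to a bare -j.
without_np=$(make -s -C "$tmp" lib)

echo "$with_np"
echo "$without_np"
rm -rf "$tmp"
```

When `KBLAS_MAKE_NP` is unset the flag is a bare `-j`, which GNU make treats as no limit on parallel jobs. On the real tree, an invocation such as `make lib KBLAS_MAKE_NP=4` would presumably cap the build at four jobs.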
10 changes: 6 additions & 4 deletions README.md
@@ -21,6 +21,10 @@ KBLAS provides highly optimized routines from various levels of BLAS and LAPACK,
6. Batch General: (⎐⇟ ⚭ ⚬= ✼) GESVJ, GERSVD, GEQRF.
7. Batch Tile low-rank GEMM (⎏ ⎐ ⇞ ⚬ =).
8. GPU-Resident POTRF kernel (⎐ ⇞ ⚬).
+9. Batch Tall-and-Skinny QR (⇞ ⎐ ⚬ = | ✼) TSQR.
+10. Batch Adaptive Randomized Approximation (⇞ ⎐ ⚬ |) ARA.
+11. Batch column pivoted QR (⇞ ⎏ ⚬ = ✼) GEQP2.
+12. Batch small pivoted Cholesky (⇞ ⎏ ⚬ = ✼) PSTRF.

⇟ Standard precisions: s/d/c/z.
⇞ Real precisions: s/d.
@@ -29,6 +33,7 @@ KBLAS provides highly optimized routines from various levels of BLAS and LAPACK,
⚬ Single-GPU support.
⚭ Multi-GPU support.
= Uniform batch sizes.
+| Non-uniform batch sizes.
✼ Non-strided and strided variants.


@@ -64,10 +69,6 @@ To build KBLAS, please follow these instructions:

make

-5. Build local documentation (optional)
-
-make docs
-

Testing
=======
@@ -106,6 +107,7 @@ International Euro-Par Conference on Parallel and Distributed Computing*, 2013.
9. A. Abdelfattah, J. Dongarra, D. Keyes, and H. Ltaief, Optimizing Memory-Bound SyMV Kernel on GPU Hardware Accelerators, *10th
International Conference High Performance Computing for Computational Science - VECPAR*, DOI: http://dx.doi.org/10.1007/978-3-642-38718-0_10, 2012.

+10. W. H. Boukaram, G. Turkiyyah, H. Ltaief, D. Keyes, Batched QR and SVD algorithms on GPUs with applications in hierarchical matrix compression, *Parallel Computing*, 74:19-33, 2018.

Handout
=======