Evaluation of fast (and eventually distributed) vector processing for dot products. A batch of 2048-element, 32-bit floating-point vectors is processed against a single query vector.
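
As a rough illustration of that workload, the sketch below scores every vector of a batch against one query vector. The function name `score_batch` and the contiguous, row-major storage layout are assumptions made for this example only; they are not taken from the project sources.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the project): computes one dot product
// per stored vector. `batch` is assumed to hold the vectors back to back
// (row-major), each with the same length as `query`.
std::vector<float> score_batch(const std::vector<float>& batch,
                               const std::vector<float>& query) {
    const std::size_t dim = query.size();
    const std::size_t count = (dim == 0) ? 0 : batch.size() / dim;

    std::vector<float> scores(count, 0.0f);
    for (std::size_t v = 0; v < count; ++v) {
        const float* row = batch.data() + v * dim;  // start of vector v
        float sum = 0.0f;
        for (std::size_t i = 0; i < dim; ++i) {
            sum += row[i] * query[i];
        }
        scores[v] = sum;
    }
    return scores;
}
```

The inner loop is the part that the different operation types replace; the outer loop over the batch stays the same.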
Different operation types are implemented for evaluation:
- plain, naive for-loop
- 8-fold unrolled for-loop
- hand-tuned AVX optimized for-loop
- hand-tuned SSE4.2 optimized for-loop
- OpenMP SIMD optimized for-loop
The following options configure the CMake build. To enable an option `OPTION`, set `-DOPTION=ON`; to disable it, use `-DOPTION=OFF`.
To build in release mode with link-time optimizations enabled, call e.g.
cmake -DCMAKE_BUILD_TYPE=Release \
-DFSTM_ENABLE_LTO=ON \
..
Option | Default | Description |
---|---|---|
FSTM_ENABLE_LTO | ON | Enables link-time optimization (if available). |
FSTM_ENABLE_CCACHE | ON | Enables ccache support when building (if available). |
Option | Default | Description |
---|---|---|
FSTM_WITH_PROFILER | OFF | Builds with performance profiler support. |
FSTM_WITH_TCMALLOC | ON | Builds with tcmalloc support. |
Option | Default | Description |
---|---|---|
FSTM_WITH_FAST_MATH | OFF | Enables fast math optimizations (if available). |
FSTM_WITH_OPENMP | ON | Enables OpenMP support (if available). |
FSTM_WITH_SIMD_AVX2 | OFF | Builds with AVX2 support |
FSTM_WITH_SIMD_AVX | OFF | Builds with AVX support |
FSTM_WITH_SIMD_SSE42 | ON | Builds with SSE 4.2 support |
Option | Default | Description |
---|---|---|
FSTM_BUILD_TESTS | ON | Builds unit tests. |
Unit tests are provided by means of googletest.
To enable CTest testing functionality and build the unit test project,
configure CMake with the `-DFSTM_BUILD_TESTS=1` option, e.g. using
mkdir build && cd build
cmake -DFSTM_BUILD_TESTS=1 ..
make && make test
Support for Gperftools is available in the project. To profile with its CPU profiler, enable support in CMake (`-DFSTM_WITH_PROFILER=ON`), then run the program with the following environment variables set:
CPUPROFILE=/tmp/firestorm.prof
CPUPROFILE_FREQUENCY=1000
This creates the file specified in `CPUPROFILE`, containing the sampling information. To display the profiling data, call
pprof --web firestorm /tmp/firestorm.prof
Or, if `kcachegrind` is available:
pprof --callgrind firestorm /tmp/firestorm.prof > firestorm.callgrind
kcachegrind firestorm.callgrind
If the file is empty, the application likely didn't exit normally. More interesting information can be found here.
In order to profile the call graph with Valgrind and KCachegrind, run the application using
valgrind --tool=callgrind ./firestorm
This creates an output file, e.g. `callgrind.out.18360`, where the numeric suffix is the process ID.
You can then visualize the results using that file with
kcachegrind callgrind.out.18360
Read here for further information.