FLASH (Fast LSH Algorithm for Similarity Search Accelerated with HPC) is a library for large scale approximate nearest neighbor search of sparse vectors. It is currently available in C++ for CPU parallel computing and supports OpenCL enabled GPGPU computing. See our paper for theoretical and benchmarking details.
The current version of the soft is tested on 64-bit machines running Ubuntu 16.04, with CPU and at least 1 GPGPU installed. The compiler needs to support C++11 and OpenMP.
An active installation of OpenCL 1.1 or OpenCL 2.0 is required to support the GPGPU capability. CPUs running OpenCL is supported but not recommended as the performance will be sub-optimal. OpenCL on Nvidia graphics cards requires the installation of CUDA.
Then install clinfo (using apt-get) to verify the number of OpenCL platforms and devices. These information will be used in the configurations.
Navigate to the FLASH directory. Configure the system by editing the following section of LSHReservoir_config.h guided by the comments:
// Comment out if not using GPU.
#define USE_GPU
// Comment out if using OpenCL 1.XX. Does not matter if not usng GPU.
#define OPENCL_2XX
#define CL_GPU_PLATFORM 0 // Does not matter if not usng GPU.
#define CL_CPU_PLATFORM 1
#define CL_GPU_DEVICE 0 // Does not matter if not usng GPU.
#define CL_CPU_DEVICE 0
Fill in CL_GPU_PLATFORM / CL_GPU_PLATFORM according to the order that the platforms appear in the output of clinfo. If multiple devices exist, fill in CL_GPU_DEVICE / CL_CPU_DEVICE to choose the desired device according to their order in the output of clinfo.
Save and close the file. Type in terminal:
make
The compilation is complete if no errors appear.
The current example code in the main() function verifies one of the results presented in our paper on the Webspam dataset. Download the dataset from libsvm, and rename the libsvm-format file as trigram.svm. Download the groundtruths from this link. Place the dataset and groundtruth files in the FLASH directory. Run the program from the terminal:
./runme
The test program builds multiple hash tables for the dataset and query 10,000 test vectors followed by quality evaluations. The program will run with console outputs, indicating the progress and performance. The parameters, such as L, K and R can be edited in main.cpp. A re-compilation is required after changing the parameters. Please note that the time for parsing the dataset from disk might take about 5-10 minutes.
A basic documentation of the API generated by doxygen is available as doc.pdf in the working directory.
- Yiqiu Wang designed and implemented the CPU and GPU FLASH.
- Anshumali Shrivastava invented and is author of the DOPH hash codes. He is the main contributor to the theoretical aspect of our work.
- Heejung Ryu actively participated in the process and contributed to the testing and comparison of FLASH with similar works.
This project is licensed under Apache License. See LICENSE for more details.
- Rice Universith Sketching and Hashing Lab (RUSH Lab) provided the computing platform for testing.