Improves the efficiency of K-means clustering with OpenMP, MPI, and a hybrid of the two.

DW1209/Accelerating-K-Means-Clustering-with-Parallel-Implementation

K-Means Clustering Based on OMP, MPI, and Hybrid

Description

generate.py

         NUMS   MAXIMUM  FILENAME
default  10000  1000000  data.txt
usage: generate.py [-h] [-n NUMS] [-m MAXIMUM] [-f FILENAME]

Randomly generate 2d coordinates and store in the inputs directory.

optional arguments:
  -h, --help                        show this help message and exit
  -n NUMS, --nums NUMS              generate <NUMS> points
  -m MAXIMUM, --maximum MAXIMUM     set the 2d coordinate to range between 0 and <MAXIMUM>
  -f FILENAME, --filename FILENAME  store the data in the inputs directory and named <FILENAME>
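
The options above can be sketched roughly as follows. This is a minimal illustration, not the repository's generate.py: the flags and defaults come from the usage text, while the helper names (`build_parser`, `generate_points`, `format_points`), the integer coordinates, and the one-pair-per-line layout are assumptions.

```python
import argparse
import random


def build_parser():
    # Flags and defaults mirror the usage text; everything else is illustrative.
    parser = argparse.ArgumentParser(
        description="Randomly generate 2d coordinates and store in the inputs directory."
    )
    parser.add_argument("-n", "--nums", type=int, default=10000,
                        help="generate <NUMS> points")
    parser.add_argument("-m", "--maximum", type=int, default=1000000,
                        help="set the 2d coordinate to range between 0 and <MAXIMUM>")
    parser.add_argument("-f", "--filename", default="data.txt",
                        help="store the data in the inputs directory and named <FILENAME>")
    return parser


def generate_points(nums, maximum, seed=None):
    # Assumption: integer coordinates drawn uniformly from [0, maximum].
    rng = random.Random(seed)
    return [(rng.randint(0, maximum), rng.randint(0, maximum)) for _ in range(nums)]


def format_points(points):
    # Assumption: one "x y" pair per line; the real file layout may differ.
    return "\n".join(f"{x} {y}" for x, y in points) + "\n"
```

Calling `build_parser().parse_args(["-n", "100"])` would yield `nums=100` with the other options left at their defaults.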

kmeans

         CLUSTERS  FILENAME  THREADS  OUTPUT
default  3         data.txt  4        true
usage: ./kmeans [-h] [-c CLUSTERS] [-f FILENAME] [-t THREADS] [-n] [--] cmd

optional arguments:
  -h --help                     show this help message and exit
  -c --clusters <CLUSTERS>      classify the data into <CLUSTERS> groups
  -f --filename <FILENAME>      <FILENAME> in the inputs directory
  -t --threads  <THREADS>       specify the number of omp threads
  -n --no-output                disable writing the final result to the outputs directory
  --                            separate the arguments for kmeans from the command
  cmd                           only "serial", "omp", "mpi", and "hybrid" are available
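
For readers unfamiliar with what the "serial" command computes, here is a minimal sketch of standard K-Means (Lloyd's algorithm) on 2d points. It is written in Python for readability and is not the repository's kmeans source; the function names, the random initialisation, and the stopping rule are assumptions.

```python
import random


def assign(points, centroids):
    # Label each point with the index of its nearest centroid
    # (squared Euclidean distance, ties go to the lower index).
    labels = []
    for x, y in points:
        dists = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def update(points, labels, centroids):
    # Move each centroid to the mean of its points; keep it in place if it has none.
    sums = [[0.0, 0.0, 0] for _ in centroids]
    for (x, y), label in zip(points, labels):
        sums[label][0] += x
        sums[label][1] += y
        sums[label][2] += 1
    return [
        (sx / n, sy / n) if n else centroids[i]
        for i, (sx, sy, n) in enumerate(sums)
    ]


def kmeans(points, k=3, max_iters=100, seed=0):
    # Assumption: initial centroids are k randomly sampled input points.
    rng = random.Random(seed)
    centroids = [(float(x), float(y)) for x, y in rng.sample(points, k)]
    labels = assign(points, centroids)
    for _ in range(max_iters):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
    return centroids, labels
```

`k=3` matches the default CLUSTERS value; the parallel variants would be expected to produce the same clustering while dividing the assignment work among threads or processes.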

draw.py

         CLUSTERS  FILENAME
default  3         data.txt
usage: draw.py [-h] [-c CLUSTERS] [-f FILENAME]

Draw the scatterplots before and after K-Means Clustering.

optional arguments:
  -h, --help                        show this help message and exit
  -c CLUSTERS, --clusters CLUSTERS  classify the data into <CLUSTERS> groups
  -f FILENAME, --filename FILENAME  <FILENAME> from the inputs directory

Execution

Basic

Install the required Python packages and build the kmeans executable.

$ pip3 install -r requirements.txt; make 

Randomly generate 2d coordinates and store them in the inputs directory.

$ python3 generate.py [-n NUMS] [-m MAXIMUM] [-f FILENAME]

Run K-Means Clustering.

# command line for the serial and omp methods
$ ./kmeans [-c CLUSTERS] [-f FILENAME] [-t THREADS] [-n] [--] cmd 

# command line for the mpi method
$ mpirun -np <PROCESSES> --hostfile <HOSTFILE> --bind-to <TYPE> \
  ./kmeans [-c CLUSTERS] [-f FILENAME] [-t THREADS] [-n] mpi

# command line for the hybrid method
$ mpirun -np <PROCESSES> -x OMP_NUM_THREADS=<THREADS> --hostfile <HOSTFILE> --bind-to <TYPE> \
  ./kmeans [-c CLUSTERS] [-f FILENAME] [-t THREADS] [-n] hybrid
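
When scripting benchmark runs, the three command shapes above can be assembled programmatically. The helper below is a hypothetical sketch (`build_command` is not part of the repository); it only uses the flags shown in the commands above and returns an argument list suitable for `subprocess.run`.

```python
def build_command(method, processes=None, threads=4, hostfile=None,
                  bind_to=None, clusters=3, filename="data.txt", output=True):
    # Hypothetical helper: mirrors the command lines shown in this README.
    if method not in ("serial", "omp", "mpi", "hybrid"):
        raise ValueError(f"unknown method: {method}")

    kmeans = ["./kmeans", "-c", str(clusters), "-f", filename, "-t", str(threads)]
    if not output:
        kmeans.append("-n")  # disable writing results to the outputs directory
    kmeans.append(method)

    if method in ("serial", "omp"):
        return kmeans  # run directly, no MPI launcher

    if processes is None:
        raise ValueError("mpi and hybrid runs need a process count")
    launcher = ["mpirun", "-np", str(processes)]
    if method == "hybrid":
        # Export the thread count to every MPI process.
        launcher += ["-x", f"OMP_NUM_THREADS={threads}"]
    if hostfile:
        launcher += ["--hostfile", hostfile]
    if bind_to:
        launcher += ["--bind-to", bind_to]
    return launcher + kmeans
```

For example, `build_command("omp", clusters=5, output=False)` returns the list for `./kmeans -c 5 -f data.txt -t 4 -n omp`; passing the result through `shlex.join` gives a printable shell command.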

Draw the scatterplots before and after K-Means Clustering if you would like to see the result.

$ python3 draw.py [-c CLUSTERS] [-f FILENAME]

Advanced

A command-line example for the OMP method

  • -c clusters = 5
  • -f input filename = data.txt (default)
  • -t OMP threads = 4 (default)
  • --no-output = true
$ ./kmeans -c 5 --no-output omp

A command-line example for the MPI method

  • -np total number of MPI processes = 4
  • --hostfile provide a hostfile to use
  • --bind-to core bind processes to cores
$ mpirun -np 4 --hostfile <HOSTFILE> --bind-to core ./kmeans --no-output mpi

A command-line example for the Hybrid method

  • OMP_PROC_BIND enable binding of OMP threads
  • OMP_NUM_THREADS number of OMP threads = 4
  • -np total number of MPI processes = 4
  • -pernode one process per node
  • --hostfile provide a hostfile to use (should have at least 4 hosts)
  • --bind-to none do not bind processes
$ export OMP_PROC_BIND=true; export OMP_NUM_THREADS=4;
$ mpirun -np 4 -pernode --hostfile <HOSTFILE> --bind-to none ./kmeans --no-output hybrid
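
The hybrid example above runs 4 processes with 4 threads each, i.e. 16 workers in total. A common way to parallelise one K-Means iteration is to give each worker a slice of the points, let it accumulate per-cluster partial sums, and then add the partial sums together (in an MPI program this role is typically played by a sum reduction such as MPI_Allreduce). Whether this repository partitions the data exactly this way is an assumption; the sketch below simulates the idea in plain Python, and all function names are illustrative.

```python
def split(points, parts):
    # Contiguous, nearly equal chunks: the first len(points) % parts chunks
    # receive one extra point.
    base, extra = divmod(len(points), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(points[start:start + size])
        start += size
    return chunks


def partial_sums(chunk, centroids):
    # Work of one worker: per-cluster [sum_x, sum_y, count] over its own points.
    sums = [[0.0, 0.0, 0] for _ in centroids]
    for x, y in chunk:
        dists = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
        j = dists.index(min(dists))
        sums[j][0] += x
        sums[j][1] += y
        sums[j][2] += 1
    return sums


def reduce_sums(all_sums):
    # Element-wise sum across workers, standing in for a sum reduction.
    total = [[0.0, 0.0, 0] for _ in all_sums[0]]
    for sums in all_sums:
        for j, (sx, sy, n) in enumerate(sums):
            total[j][0] += sx
            total[j][1] += sy
            total[j][2] += n
    return total


def parallel_step(points, centroids, workers):
    # One centroid update computed from per-worker partial sums.
    total = reduce_sums([partial_sums(c, centroids) for c in split(points, workers)])
    return [
        (sx / n, sy / n) if n else centroids[i]
        for i, (sx, sy, n) in enumerate(total)
    ]
```

With integer coordinates the partial sums are exact, so `parallel_step(points, centroids, 16)` gives the same centroids as `parallel_step(points, centroids, 1)`; only the division of work changes.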
