
GPU (CUDA)-accelerated filters using 2D convolution for high-resolution images.


tgautam03/xFilters


xFilters

Convolution is a popular array operation used in signal processing, digital recording, image/video processing, and computer vision. This repository provides a 2D convolution algorithm, written from scratch in C++ (for the CPU) and CUDA C++ (for the GPU), that can be used to apply filters to high-resolution images.

Tested on NVIDIA RTX 3090 using Ubuntu 24.04.1 LTS with nvidia-driver-560 and CUDA 12.6.

Images are first converted to grayscale, and then the filter is applied.
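The README does not say which grayscale formula is used. The following is a minimal sketch, assuming 8-bit interleaved RGB input and the common ITU-R BT.601 luma weights. The function name rgb_to_grayscale is hypothetical, and xFilters may use different weights or a different pixel type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert an interleaved 8-bit RGB image (R, G, B, R, G, B, ...) to a
// single-channel float image.
// Assumption: ITU-R BT.601 luma weights (0.299, 0.587, 0.114).
std::vector<float> rgb_to_grayscale(const std::vector<std::uint8_t>& rgb,
                                    int width, int height) {
    assert(rgb.size() == static_cast<std::size_t>(width) * height * 3);
    std::vector<float> gray(static_cast<std::size_t>(width) * height);
    for (std::size_t i = 0; i < gray.size(); ++i) {
        const float r = rgb[3 * i + 0];
        const float g = rgb[3 * i + 1];
        const float b = rgb[3 * i + 2];
        // Weighted sum of the three channels; the weights add up to 1,
        // so the result stays in [0, 255].
        gray[i] = 0.299f * r + 0.587f * g + 0.114f * b;
    }
    return gray;
}
```

The filter is then applied to this single channel, so every convolution variant below works on one value per pixel.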

Table of contents

  1. Naive 2D convolution on a CPU.
  2. Naive 2D convolution on a GPU.
  3. 2D convolution on a GPU using constant memory for the filter matrix.
  4. 2D convolution on a GPU using constant memory for the filter matrix and tiling for shared memory usage.
  5. Naive 2D convolution on a GPU (using pinned memory).
  6. 2D convolution on a GPU using constant memory for the filter matrix (using pinned memory).
  7. 2D convolution on a GPU using constant memory for the filter matrix and tiling for shared memory usage (using pinned memory).
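For reference, here is a minimal sketch of item 1 (naive convolution on the CPU) for a single-channel float image. The function name conv2d_cpu, the row-major layout, the square (2r+1) × (2r+1) filter, and the zero padding at the borders are assumptions, not necessarily what xFilters does. Like most image-processing code, it does not flip the filter (strictly speaking, this is cross-correlation). For symmetric filters such as a Gaussian, the two give the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive 2D convolution of a row-major single-channel image with a square
// (2r+1) x (2r+1) filter. Neighbours outside the image count as zero
// (zero padding); this boundary rule is an assumption.
std::vector<float> conv2d_cpu(const std::vector<float>& in, int width, int height,
                              const std::vector<float>& filter, int r) {
    const int k = 2 * r + 1;
    assert(in.size() == static_cast<std::size_t>(width) * height);
    assert(filter.size() == static_cast<std::size_t>(k) * k);
    std::vector<float> out(in.size(), 0.0f);
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            float acc = 0.0f;
            // Visit every filter tap centred on (row, col).
            for (int fr = 0; fr < k; ++fr) {
                for (int fc = 0; fc < k; ++fc) {
                    const int in_row = row - r + fr;
                    const int in_col = col - r + fc;
                    if (in_row >= 0 && in_row < height && in_col >= 0 && in_col < width) {
                        // One multiply and one add per in-bounds tap.
                        acc += filter[fr * k + fc] * in[in_row * width + in_col];
                    }
                }
            }
            out[row * width + col] = acc;
        }
    }
    return out;
}
```

A naive GPU version typically parallelizes the two outer loops by giving each thread one output pixel, while each thread still runs the two inner loops over the filter taps.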

Example Run

CPU/GPU Filter

  1. In a terminal, run: make filters_cpu (CPU version) or make filters_gpu (GPU version)

  2. You will be asked to enter the path to the image, for example data/8k.jpg.

  3. You will be asked to type the filter name. Supported filters are as follows:

    Sharpen
    High-pass (edge detection)
    Low-pass
    Gaussian (image blurring)
    Derivative of Gaussian (edge detection)
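The README names the filters but not their coefficients. Sharpen, high-pass, and low-pass filters are usually small fixed matrices, whereas Gaussian and derivative-of-Gaussian filters are often generated from a standard deviation. The following is a sketch of that generation. The function names and the radius r and sigma parameters are hypothetical, not xFilters' actual values.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// (2r+1) x (2r+1) Gaussian kernel, normalised so that its taps sum to 1
// (a blur then preserves the overall brightness of the image).
std::vector<float> gaussian_kernel(int r, float sigma) {
    assert(r >= 0 && sigma > 0.0f);
    const int k = 2 * r + 1;
    std::vector<float> g(k * k);
    float sum = 0.0f;
    for (int y = -r; y <= r; ++y)
        for (int x = -r; x <= r; ++x) {
            const float v = std::exp(-(x * x + y * y) / (2.0f * sigma * sigma));
            g[(y + r) * k + (x + r)] = v;
            sum += v;
        }
    for (float& v : g) v /= sum;
    return g;
}

// Horizontal derivative of Gaussian, up to a constant scale:
// dG/dx = -x / sigma^2 * G(x, y). The taps are antisymmetric in x, so they
// sum to zero and flat regions produce no response (hence edge detection).
std::vector<float> dog_x_kernel(int r, float sigma) {
    const int k = 2 * r + 1;
    std::vector<float> g = gaussian_kernel(r, sigma);
    for (int y = -r; y <= r; ++y)
        for (int x = -r; x <= r; ++x)
            g[(y + r) * k + (x + r)] *= -static_cast<float>(x) / (sigma * sigma);
    return g;
}
```

The zero-sum property is what separates the edge-detection filters from the smoothing ones. A filter whose taps sum to 1 keeps flat regions unchanged, while one whose taps sum to 0 maps them to 0.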

Benchmarks

Runtime Overview (time in seconds, 2048 × 1328 image; "---" marks stages that do not apply to the CPU version)

| Stage | CPU | GPU (Naive) | GPU (Constant Memory) | GPU (Constant Memory + Tiling) | GPU (Pinned Memory) | GPU (Constant + Pinned Memory) | GPU (Constant + Pinned Memory + Tiling) |
|---|---|---|---|---|---|---|---|
| Allocating GPU memory | --- | 0.00044032 | 0.000191488 | 0.000313344 | 0.000217088 | 0.000176064 | 0.000154464 |
| Moving input to GPU memory | --- | 0.0028009 | 0.00271984 | 0.00283443 | 0.00265677 | 0.00267555 | 0.0026567 |
| Moving filter to GPU memory | --- | 8.736e-06 | 0.000128704 | 0.0002504 | 9.632e-06 | 0.000199776 | 0.000105152 |
| Kernel execution | 0.0607285 | 5.2029e-05 | 5.16403e-05 | 5.53062e-05 | 4.50765e-05 | 4.3735e-05 | 5.37395e-05 |
| Moving output to CPU memory | --- | 0.00601299 | 0.00601722 | 0.0065999 | 0.00249299 | 0.00250381 | 0.0024945 |
| Total | 0.0607285 | 0.00931497 | 0.00910889 | 0.0100534 | 0.00542156 | 0.00559894 | 0.00546456 |
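Each total in the table is the sum of its five stages, and the FPS figures in the logs below are the reciprocals of the corresponding times (for example, 1 / 0.0607285 s ≈ 16.4667 FPS for the CPU). The following sketch reproduces that arithmetic and shows where the GPU time goes. The struct and function names are hypothetical helpers, not part of the repository.

```cpp
#include <cassert>

// Per-stage times (seconds) for one column of the runtime table.
struct StageTimes {
    double alloc, input_copy, filter_copy, kernel, output_copy;

    // End-to-end time: the sum of all five stages.
    double total() const { return alloc + input_copy + filter_copy + kernel + output_copy; }

    // Fraction of the end-to-end time spent copying data between host and device.
    double transfer_share() const { return (input_copy + filter_copy + output_copy) / total(); }
};

// Frames per second for a stage that takes `seconds` per image.
double fps(double seconds) {
    assert(seconds > 0.0);
    return 1.0 / seconds;
}
```

For the naive GPU column, the kernel accounts for about 0.6% of the end-to-end time and host/device copies for about 95%. Switching to pinned memory mainly shortens the device-to-host copy (about 6.0 ms down to 2.5 ms). This raises the end-to-end speedup over the CPU from about 6.5× to about 11.2×, while the kernel-only speedup stays above 1000×.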

Naive CPU

make 00_cpu_conv2d_benchmark.out 
Loaded image with Width: 2048 and Height: 1328

Applying filter... 
Time for kernel execution (seconds): 0.0607285

--------------------- 
Benchmarking details: 
--------------------- 
FPS (total): 16.4667
GFLOPS (kernel): 1.2432
------------------------------------ 

Naive GPU

make 01_gpu_conv2d_benchmark.out
Loaded image with Width: 2048 and Height: 1328

Allocating GPU memory... 
Time for GPU memory allocation (seconds): 0.00044032

Moving input to GPU memory... 
Time for input data transfer (seconds): 0.0028009

Moving filter to GPU memory... 
Time for filter data transfer (seconds): 8.736e-06

Applying filter... 
Time for kernel execution (seconds): 5.20294e-05

Moving result to CPU memory... 
Time for output data transfer (seconds): 0.00601299

--------------------- 
Benchmarking details: 
--------------------- 
Time (total): 0.00931497
FPS (total): 107.354

Time (kernel): 5.20294e-05
FPS (kernel): 19219.9
GFLOPS (kernel): 1451.05
------------------------------------ 
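In a naive CUDA convolution, each thread usually computes one output pixel at column blockIdx.x * blockDim.x + threadIdx.x and row blockIdx.y * blockDim.y + threadIdx.y. The grid must therefore be rounded up to cover the whole image. The following sketch computes that launch geometry. The block sizes and the names launch_dims and ceil_div are hypothetical; the README does not state which block size xFilters uses.

```cpp
#include <cassert>

// Launch geometry for a one-thread-per-output-pixel kernel.
struct LaunchDims {
    int grid_x, grid_y;  // number of blocks along x (columns) and y (rows)
    long long threads;   // total threads launched, including idle ones
};

// Ceiling division: enough blocks of `block` threads to cover `n` pixels.
constexpr int ceil_div(int n, int block) { return (n + block - 1) / block; }

LaunchDims launch_dims(int width, int height, int block_x, int block_y) {
    assert(width > 0 && height > 0 && block_x > 0 && block_y > 0);
    LaunchDims d{};
    d.grid_x = ceil_div(width, block_x);
    d.grid_y = ceil_div(height, block_y);
    d.threads = static_cast<long long>(d.grid_x) * block_x * d.grid_y * block_y;
    return d;
}
```

For the 2048 × 1328 test image, 32 × 32 blocks give a 64 × 42 grid with 16 surplus rows of threads, which is why the kernel needs a bounds check before writing. With 16 × 16 blocks, the grid (128 × 83) covers the image exactly.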

GPU using constant memory

make 02_gpu_conv2d_constMem_benchmark.out
Loaded image with Width: 2048 and Height: 1328

Allocating GPU memory... 
Time for GPU memory allocation (seconds): 0.000191488

Moving input to GPU memory... 
Time for input data transfer (seconds): 0.00271984

Moving filter to GPU memory... 
Time for filter data transfer (seconds): 0.000128704

Applying filter... 
Time for kernel execution (seconds): 5.16403e-05

Moving result to CPU memory... 
Time for output data transfer (seconds): 0.00601722

--------------------- 
Benchmarking details: 
--------------------- 
Time (total): 0.00910889
FPS (total): 109.783

Time (kernel): 5.16403e-05
FPS (kernel): 19364.7
GFLOPS (kernel): 1461.99
------------------------------------ 
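Constant memory suits the filter because the filter is small and read-only during the kernel. Also, the threads of a warp tend to read the same tap at the same time, and the constant cache can broadcast that value. The catch is that the __constant__ space is limited to 64 KB per device, shared by all constant variables. The following sketch checks how large a float filter can be before it no longer fits. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Total __constant__ memory available on CUDA devices (64 KB).
constexpr std::size_t kConstantMemoryBytes = 64 * 1024;

// Bytes needed to hold a (2r+1) x (2r+1) filter of floats.
constexpr std::size_t filter_bytes(int r) {
    return static_cast<std::size_t>(2 * r + 1) * (2 * r + 1) * sizeof(float);
}

// Largest radius whose float filter still fits, assuming nothing else
// occupies constant memory.
constexpr int max_constant_radius() {
    int r = 0;
    while (filter_bytes(r + 1) <= kConstantMemoryBytes) ++r;
    return r;
}

static_assert(filter_bytes(1) == 36, "a 3x3 float filter occupies 36 bytes");
```

Any practical image filter fits easily (the limit is a 127 × 127 float filter). In the table, however, the kernel times barely differ (5.20e-05 s naive vs 5.16e-05 s with constant memory), which suggests that filter reads are not the bottleneck at this image size.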

GPU using constant memory and tiling

make 03_gpu_conv2d_tiled_benchmark.out 
Loaded image with Width: 2048 and Height: 1328

Allocating GPU memory... 
Time for GPU memory allocation (seconds): 0.000313344

Moving input to GPU memory... 
Time for input data transfer (seconds): 0.00283443

Moving filter to GPU memory... 
Time for filter data transfer (seconds): 0.0002504

Applying filter... 
Time for kernel execution (seconds): 5.53062e-05

Moving result to CPU memory... 
Time for output data transfer (seconds): 0.0065999

--------------------- 
Benchmarking details: 
--------------------- 
Time (total): 0.0100534
FPS (total): 99.469

Time (kernel): 5.53062e-05
FPS (kernel): 18081.1
GFLOPS (kernel): 1365.08
------------------------------------ 
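Tiling loads a block of input pixels into shared memory once, so neighbouring threads reuse it instead of each re-reading its whole neighbourhood from global memory. The following sketch sizes such a tile, assuming each block produces an OUT × OUT output tile and first loads an (OUT + 2r) × (OUT + 2r) input tile (the output tile plus an r-pixel halo). That layout, the tile sizes, and the name plan_tile are assumptions; xFilters may organize its tiles differently.

```cpp
#include <cassert>
#include <cstddef>

// Shared-memory plan for one block of a tiled 2D convolution.
struct TilePlan {
    int in_tile;               // side length of the input tile, halo included
    std::size_t shared_bytes;  // shared memory per block for float pixels
    double reuse;              // naive global loads / tiled global loads
};

TilePlan plan_tile(int out_tile, int r) {
    assert(out_tile > 0 && r >= 0);
    const int k = 2 * r + 1;
    TilePlan p{};
    p.in_tile = out_tile + 2 * r;
    p.shared_bytes = static_cast<std::size_t>(p.in_tile) * p.in_tile * sizeof(float);
    // Naive: each of the out_tile^2 output pixels issues k*k global loads.
    const double naive_loads = static_cast<double>(out_tile) * out_tile * k * k;
    // Tiled: each pixel of the input tile is loaded from global memory once.
    const double tiled_loads = static_cast<double>(p.in_tile) * p.in_tile;
    p.reuse = naive_loads / tiled_loads;
    return p;
}
```

With 16 × 16 output tiles and a 3 × 3 filter, this works out to an 18 × 18 input tile (1296 bytes) and roughly 7× fewer global load requests. The table nevertheless shows no gain from tiling (5.53e-05 s vs 5.20e-05 s). One plausible explanation is that the GPU's L1/L2 caches already serve most of the naive version's repeated reads, while tiling adds halo loads and a synchronization step.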

Naive GPU (pinned memory)

make 04_gpu_conv2d_pinnedMem_benchmark.out
Loaded image with Width: 2048 and Height: 1328

Allocating GPU memory... 
Time for GPU memory allocation (seconds): 0.000217088

Moving input to GPU memory... 
Time for input data transfer (seconds): 0.00265677

Moving filter to GPU memory... 
Time for filter data transfer (seconds): 9.632e-06

Applying filter... 
Time for kernel execution (seconds): 4.50765e-05

Moving result to CPU memory... 
Time for output data transfer (seconds): 0.00249299

--------------------- 
Benchmarking details: 
--------------------- 
Time (total): 0.00542156
FPS (total): 184.449

Time (kernel): 4.50765e-05
FPS (kernel): 22184.5
GFLOPS (kernel): 1674.88
------------------------------------ 
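Pinned (page-locked) host memory, allocated with cudaMallocHost or cudaHostAlloc, can be read and written directly by the GPU's DMA engine. Pageable memory is first staged through a driver-managed pinned buffer. The following sketch turns the logged copy times into effective bandwidth. The helper names are hypothetical, and the 4-bytes-per-pixel output size is an assumption, because the README does not state the pixel type.

```cpp
#include <cassert>
#include <cstddef>

// Effective host<->device bandwidth in GB/s (1 GB = 1e9 bytes).
double effective_gbps(std::size_t bytes, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(bytes) / seconds / 1e9;
}

// Size of a width x height single-channel image.
// Assumption: bytes_per_pixel = 4 (float output) is a guess, not from the README.
std::size_t image_bytes(int width, int height, int bytes_per_pixel) {
    return static_cast<std::size_t>(width) * height * bytes_per_pixel;
}
```

Whatever the actual pixel size, the ratio of the two rates equals the ratio of the logged times. For the output copy this is about 2.4× in favour of pinned memory (0.00601299 s vs 0.00249299 s), while the input copy changes only slightly (0.0028009 s vs 0.00265677 s).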

GPU using constant memory (pinned memory)

make 05_gpu_conv2d_pinnedConstMem_benchmark.out 
Loaded image with Width: 2048 and Height: 1328

Allocating GPU memory... 
Time for GPU memory allocation (seconds): 0.000176064

Moving input to GPU memory... 
Time for input data transfer (seconds): 0.00267555

Moving filter to GPU memory... 
Time for filter data transfer (seconds): 0.000199776

Applying filter... 
Time for kernel execution (seconds): 4.3735e-05

Moving result to CPU memory... 
Time for output data transfer (seconds): 0.00250381

--------------------- 
Benchmarking details: 
--------------------- 
Time (total): 0.00559894
FPS (total): 178.605

Time (kernel): 4.3735e-05
FPS (kernel): 22865
GFLOPS (kernel): 1726.25
------------------------------------ 

GPU using constant memory and tiling (pinned memory)

make 06_gpu_conv2d_pinnedTiled_benchmark.out
Loaded image with Width: 2048 and Height: 1328

Allocating GPU memory... 
Time for GPU memory allocation (seconds): 0.000154464

Moving input to GPU memory... 
Time for input data transfer (seconds): 0.0026567

Moving filter to GPU memory... 
Time for filter data transfer (seconds): 0.000105152

Applying filter... 
Time for kernel execution (seconds): 5.37395e-05

Moving result to CPU memory... 
Time for output data transfer (seconds): 0.0024945

--------------------- 
Benchmarking details: 
--------------------- 
Time (total): 0.00546456
FPS (total): 182.997

Time (kernel): 5.37395e-05
FPS (kernel): 18608.3
GFLOPS (kernel): 1404.88
------------------------------------ 

References