feat: parallel scan extension for CPU #17
+222
−35
Accelerate first-order filter on CPU for small batch sizes.
This PR is a first step toward gradually removing the `numba` code in favor of the native C++ bindings provided by PyTorch. #10 could be merged after this PR.
The empirical benchmark shows the extension is faster when batch size < 2.
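For context, here is a minimal pure-Python sketch of how a first-order filter can be written as a scan. It is not the code in this PR. The recurrence `y[t] = a[t] * y[t-1] + x[t]`, the function names, and the Hillis-Steele style scan are all assumptions made for illustration, and the sign convention and signatures in the repository may differ.

```python
def first_order_sequential(a, x, y0=0.0):
    """Reference loop for the assumed recurrence y[t] = a[t] * y[t-1] + x[t]."""
    y, prev = [], y0
    for a_t, x_t in zip(a, x):
        prev = a_t * prev + x_t
        y.append(prev)
    return y


def combine(left, right):
    """Compose two affine maps y -> a * y + b, applying `left` first."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def first_order_scan(a, x, y0=0.0):
    """Inclusive scan over the affine maps (hypothetical helper, not the PR's API)."""
    elems = list(zip(a, x))
    n, step = len(elems), 1
    while step < n:
        # Within one pass, each position only reads the previous pass's values,
        # so all positions could be updated in parallel.
        elems = [
            combine(elems[i - step], elems[i]) if i >= step else elems[i]
            for i in range(n)
        ]
        step *= 2
    # elems[t] now holds the composed map taking y0 to y[t].
    return [a_t * y0 + b_t for a_t, b_t in elems]
```

Because composing affine maps is associative, the scan needs only about log2(n) passes, each of which is parallel across time steps. A scan performs more total arithmetic than the plain loop, so which one is faster in practice is an empirical question for benchmarks.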
I plan to keep `main` for development purposes, so its version will always be `*.dev`. I will make a separate branch for the stable version.