-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KernelIntrinsics #562
base: vc/pocl
Are you sure you want to change the base?
KernelIntrinsics #562
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## vc/pocl #562 +/- ##
=======================================
Coverage 0.00% 0.00%
=======================================
Files 21 21
Lines 1509 1519 +10
=======================================
- Misses 1509 1519 +10 ☔ View full report in Codecov by Sentry. |
|
||
Returns the unique local work-item ID. | ||
""" | ||
function get_local_id end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So IIUC, backends should implement these like below, right?
function get_local_id()
return (threadIdx().x, threadIdx().y, threadIdx().z)
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah basically, and my goal is to replace the old internal functions the people had to override with definitions based on these functions.
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
2d12b29
to
1298a56
Compare
1298a56
to
1831242
Compare
1831242
to
63244eb
Compare
63244eb
to
af5c7db
Compare
af5c7db
to
ad4c968
Compare
Benchmark Results
Benchmark PlotsA plot of the benchmark results have been uploaded as an artifact to the workflow run for this PR. |
c51072f
to
a0e822e
Compare
ad4c968
to
1c8b7e9
Compare
1c8b7e9
to
55dde0c
Compare
55dde0c
to
f1e98cc
Compare
function get_global_size end | ||
|
||
""" | ||
get_global_id()::@NamedTuple{x::Int32, y::Int32, z::Int32} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be Int32 or Int64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpenCL defines these as Csize_t
So the idea is to decouple the back-ends from KA.jl, instead implementing KernelIntrinsics.jl? What's the advantage; do you envision packages other than KA.jl to build their kernel DSL on top of KernelIntrinsics.jl? |
f1e98cc
to
6841207
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some suggestions could not be made:
- src/pocl/nanoOpenCL.jl
- lines 670-674
6841207
to
6a3e192
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some suggestions could not be made:
- src/pocl/nanoOpenCL.jl
- lines 670-674
I am very unsure if I am able to pull off #558 and as such, I want to remove the implicit My goal is to allow for a gracefull transition of the DSL macro language to something more similar to "just" OpenCL. This would allow us to write performance critical kernels directly and solve the issue of how to write kernels that use barriers correctly. If I can figure out #558 the macro based DSL can stick around; otherwise I will encourage folks to move their kernel to |
6a3e192
to
87af609
Compare
87af609
to
aec6c0a
Compare
aec6c0a
to
e91b33b
Compare
Will KA/KI still be a greatest common denominator of the GPU backends, or are you looking to introduce optional intrinsics? How will the groupreduce API do in terms of portability? |
The intrinsics proposed here are the greatest common denominator. I could see us adding some more intrinsics for reductions, but that is TBD. |
48e3752
to
e565304
Compare
e91b33b
to
7a5e159
Compare
KernelAbstractions currently creates kernels that look like: ``` if __validindex(ctx) # Body end ``` This is problematic due to the convergence requirement on `@synchronize`.
7a5e159
to
0a8301c
Compare
The goal is to allow for kernels to be written without relying on KernelAbstractions macros
cc: @maleadt @pxl-th