Potentially optimize dot4{I,U}8Packed on Metal #7653

Open
robamler wants to merge 2 commits into trunk from packed-vector-format-metal

robamler (Contributor) commented May 1, 2025

Connections

Description
On Metal >= 2.1, emit code for dot4I8Packed and dot4U8Packed that might be easier for the Metal compiler to optimize.
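
For readers unfamiliar with these builtins, here is a minimal Rust sketch of the semantics the emitted code has to preserve. It is an illustration only, not the MSL that naga emits. All function names are hypothetical. The byte-reinterpretation variant is an assumption about the spirit of the change (giving the compiler four packed 8-bit lanes instead of shift-and-mask arithmetic); the exact MSL is not shown in this description.

```rust
/// Signed variant, written with explicit shifts: lane `i` is bits
/// `8*i .. 8*i+7` of each operand, sign-extended to 32 bits.
/// (Hypothetical name; models WGSL `dot4I8Packed`.)
fn dot4_i8_packed_shift(a: u32, b: u32) -> i32 {
    (0..4u32)
        .map(|i| {
            let x = (a >> (8 * i)) as u8 as i8 as i32;
            let y = (b >> (8 * i)) as u8 as i8 as i32;
            x * y
        })
        .sum()
}

/// Same result, but the operands are reinterpreted as four bytes first.
/// This only mirrors the idea of working on packed 8-bit lanes.
fn dot4_i8_packed_bytes(a: u32, b: u32) -> i32 {
    a.to_le_bytes()
        .iter()
        .zip(b.to_le_bytes().iter())
        .map(|(&x, &y)| (x as i8 as i32) * (y as i8 as i32))
        .sum()
}

/// Unsigned variant (models WGSL `dot4U8Packed`): lanes are zero-extended.
fn dot4_u8_packed_bytes(a: u32, b: u32) -> u32 {
    a.to_le_bytes()
        .iter()
        .zip(b.to_le_bytes().iter())
        .map(|(&x, &y)| (x as u32) * (y as u32))
        .sum()
}
```

Neither variant can overflow: the largest magnitude is 4 * 128 * 128 for the signed case and 4 * 255 * 255 for the unsigned case, both well within 32 bits.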

Testing

  • Includes snapshot tests for both Metal < 2.1 (no change) and >= 2.1 (new code gets emitted).

Squash or Rebase?

Needs squashing.

Checklist

  • Run cargo fmt.
  • Run taplo format.
  • Run cargo clippy --tests. If applicable, add:
    • --target wasm32-unknown-unknown
  • Run cargo xtask test to run tests.
  • If this contains user-facing changes, add a CHANGELOG.md entry.

This might allow the Metal compiler to emit faster code (but that's not
confirmed). See
<gpuweb/gpuweb#2677 (comment)>
for the optimization. The limitation to Metal 2.1+ is discussed here:
<gfx-rs#7574 (comment)>.
@robamler robamler force-pushed the packed-vector-format-metal branch from a6825c8 to 8813b38 Compare May 1, 2025 10:53
@robamler

This comment was marked as resolved.

A CI test failed because the latest changes to `put_block` made its
stack usage too large. Factoring out the new code into a separate method
fixes this issue.
@robamler robamler force-pushed the packed-vector-format-metal branch from be668ce to 2a41b95 Compare May 1, 2025 22:12
robamler (Contributor, Author) commented May 1, 2025

I fixed the excessive stack size of put_block by factoring the new code out into a separate method. This PR now needs squashing before it is merged.

The previously failing CI test can be run on a non-mac machine as follows:

cd naga
cargo test --features msl-out test_stack_size
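
The general pattern behind that fix can be sketched as follows. This is a generic illustration, not naga's code: `Stmt`, `put_block_before`, `put_block_after`, and `put_heavy` are hypothetical names, and `#[inline(never)]` is an assumption about how one might keep the helper's locals out of the caller's frame, not something this PR is confirmed to use.

```rust
/// Hypothetical stand-in for a statement kind; not a naga type.
enum Stmt {
    Simple(u32),
    Heavy(u32),
}

/// Before: the scratch buffer is a local of the big function, so it can
/// enlarge the stack frame of every call, including recursive ones.
fn put_block_before(stmt: &Stmt) -> u32 {
    match *stmt {
        Stmt::Simple(x) => x + 1,
        Stmt::Heavy(x) => {
            let mut scratch = [0u32; 256];
            for (i, slot) in scratch.iter_mut().enumerate() {
                *slot = x.wrapping_add(i as u32);
            }
            scratch.iter().fold(0u32, |acc, v| acc.wrapping_add(*v))
        }
    }
}

/// After: the heavy arm only calls a helper, so the buffer lives in the
/// helper's frame and exists only while that arm is executing.
fn put_block_after(stmt: &Stmt) -> u32 {
    match *stmt {
        Stmt::Simple(x) => x + 1,
        Stmt::Heavy(x) => put_heavy(x),
    }
}

/// Assumption: `#[inline(never)]` stops the compiler from merging this
/// frame back into the caller's.
#[inline(never)]
fn put_heavy(x: u32) -> u32 {
    let mut scratch = [0u32; 256];
    for (i, slot) in scratch.iter_mut().enumerate() {
        *slot = x.wrapping_add(i as u32);
    }
    scratch.iter().fold(0u32, |acc, v| acc.wrapping_add(*v))
}
```

Both versions compute the same result; only the placement of the large local changes.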
