[runtime] enable runtime stitching #1329

pilkicTT · 2025-02-26T09:31:50Z

The tt-mlir runtime now leaves every tensor on device and can accept inputs already on device. Use that to avoid unnecessary movements of tensors to and from the device.

For example, when running model inference in a loop, we currently send the weight/constant tensors to the device on every iteration. This is obviously not ideal.
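To make the cost concrete, here is a minimal sketch that counts host-to-device uploads in an inference loop, with and without keeping the weights resident on device. It is a toy model only: `MockDevice`, `DeviceTensor`, `to_device`, `run`, and `count_uploads` are hypothetical names made up for illustration and are not the tt-mlir runtime API.

```python
# Toy model of host-to-device traffic. All names here are hypothetical
# and do NOT correspond to the real tt-mlir runtime API.

class DeviceTensor:
    """Stand-in for a tensor that already lives on device."""
    def __init__(self, data):
        self.data = data


class MockDevice:
    def __init__(self):
        self.uploads = 0  # number of host-to-device transfers

    def to_device(self, host_data):
        self.uploads += 1
        return DeviceTensor(list(host_data))

    def run(self, weights, activations):
        # Inputs still on the host must be uploaded before execution;
        # inputs that are already DeviceTensors are used as-is.
        w = weights if isinstance(weights, DeviceTensor) else self.to_device(weights)
        x = activations if isinstance(activations, DeviceTensor) else self.to_device(activations)
        return DeviceTensor([a * b for a, b in zip(w.data, x.data)])


def count_uploads(cache_weights, iterations=3):
    device = MockDevice()
    host_weights = [1.0, 2.0, 3.0]
    # With caching, the weights are uploaded once, before the loop.
    weights = device.to_device(host_weights) if cache_weights else host_weights
    for step in range(iterations):
        device.run(weights, [float(step)] * 3)
    return device.uploads


naive_uploads = count_uploads(cache_weights=False)
cached_uploads = count_uploads(cache_weights=True)
```

In this toy setup the weight upload is paid once instead of once per iteration, so the saving grows with the number of iterations.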

Another example is a training scenario. We can now reuse the outputs of, say, the forward() program on device as inputs to the backward() program, without first moving the forward() outputs to the host only for the backward() run to move them back to the device.
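The training case can be sketched the same way. The snippet below compares a step that round-trips the forward() output through the host against one that hands the on-device output straight to backward(). `MockRuntime`, `Handle`, `upload`, `download`, and `train_step` are hypothetical illustration names, and the forward/backward math is a placeholder, not anything from tt-mlir.

```python
# Toy illustration of stitching forward() and backward() on device.
# All names are hypothetical; this is NOT the tt-mlir runtime API.

class Handle:
    """Stand-in for a tensor that lives on device."""
    def __init__(self, data):
        self.data = data


class MockRuntime:
    def __init__(self):
        self.host_to_device = 0
        self.device_to_host = 0

    def upload(self, data):
        self.host_to_device += 1
        return Handle(list(data))

    def download(self, handle):
        self.device_to_host += 1
        return list(handle.data)

    def forward(self, x):
        # Placeholder computation: y = 2 * x
        return Handle([2.0 * v for v in x.data])

    def backward(self, y):
        # Placeholder computation: grad = y - 1
        return Handle([v - 1.0 for v in y.data])


def train_step(stitched):
    rt = MockRuntime()
    x = rt.upload([1.0, 2.0])
    y = rt.forward(x)
    if not stitched:
        # Without stitching, the output takes a round trip through the host.
        y = rt.upload(rt.download(y))
    grad = rt.download(rt.backward(y))
    return grad, rt.host_to_device, rt.device_to_host


unstitched = train_step(stitched=False)
stitched = train_step(stitched=True)
```

Both variants compute the same gradient in this sketch; the stitched one simply skips one download and one upload per step.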

pilkicTT added this to the [FFE] Runtime v1 milestone Feb 26, 2025

pilkicTT added the runtime label Feb 26, 2025
