D2M Pass 4: Tensor Layout #2205

Draft · jdesousa-TT wants to merge 12 commits into main from jdesousa/ttir-tensor-layout

Conversation

@jdesousa-TT (Contributor) commented Feb 18, 2025:

Ticket

#1908

Problem description

This is a complex problem. The tensor layout pass will evolve over time to make better and more efficient decisions.

What's changed

This initial iteration of the pass will do four things:

  1. Rewrite the function return to add a reinterpret_layout to_layout from whatever layout the final op produces to the expected output layout of the original function. Insert an empty tensor op to hold this result, and rewrite the function return op.

  2. Rewrite the outputs of each ttir generic op to use the optimal tensor layout. The initial heuristic is as follows:

    • Maximize the grid shape that the tensor will use
    • Minimize the padding needed to achieve this size
    • Never allow entire cores to work on padding alone

    This step also rewrites the worker grid attribute of the generic to match the output grid. (A sketch of the grid heuristic follows this list.)

  3. Rewrite the function operands to their optimal layouts. This does not change the function operands themselves; it only rewrites their uses to reference reinterpret_layout to_layouts, and it allocates the empty tensor ops required to store this transformation.

  4. Rewrite generic block memrefs to match the new tensor layouts.
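
A minimal sketch of the grid heuristic from step 2, assuming illustrative names (pickGridShape, memrefShape, deviceGridShape); the actual pass may choose differently:

#include <cstdint>
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/Support/MathExtras.h"

// Illustrative only. For each tensor dim: shard it across the largest grid
// the device offers (maximize grid, minimize padding), then shrink the grid
// so that no core would hold padding alone.
llvm::SmallVector<int64_t> pickGridShape(llvm::ArrayRef<int64_t> memrefShape,
                                         llvm::ArrayRef<int64_t> deviceGridShape) {
  llvm::SmallVector<int64_t> gridShape;
  for (size_t i = 0; i < memrefShape.size(); i++) {
    int64_t dim = memrefShape[i];
    int64_t gridDim = deviceGridShape[i];
    // Shard size if dim is rounded up to a multiple of the grid dim.
    int64_t shardDim = llvm::divideCeil(dim, gridDim);
    // Number of cores that receive at least one element of real data.
    gridShape.push_back(llvm::divideCeil(dim, shardDim));
  }
  return gridShape;
}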

Checklist

  • New/Existing tests provide coverage for changes

@jdesousa-TT force-pushed the jdesousa/ttir-tensor-layout branch 2 times, most recently from fe34ef2 to d0fd266 on February 19, 2025 01:37

@nsmithtt (Contributor) left a comment:

It's off to a good start! Some comments inline

auto dpsOp = mlir::cast<DestinationStyleOpInterface>(op);
assert(dpsOp.getNumDpsInits() == 1 &&
       "Only one result tensor is supported for now");
dpsOp.getDpsInits()[0].setType(optimal_layout);

Contributor:
I don't think it's legal to call set* operations directly inside of a pattern rewrite. I think all graph modifications must go through the PatternRewriter, so these need to be wrapped in rewriter.modifyOpInPlace.
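
For illustration, that wrapping might look like this (a sketch only; op, dpsOp, and optimal_layout as in the snippet above):

// Route the in-place type update through the rewriter so the driver is
// notified of the modification.
rewriter.modifyOpInPlace(op, [&]() {
  dpsOp.getDpsInits()[0].setType(optimal_layout);
});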

Contributor Author:
Wrapped all uses of the set* operations inside of modifyOpInPlace callbacks.

auto result_encoding =
    mlir::dyn_cast_or_null<MetalLayoutAttr>(result_type.getEncoding());
assert(result_encoding && "Tensor type must have a MetalLayoutAttr encoding");
auto optimal_output_grid = getOptimalGrid(

Contributor:
Probably we need to assert that this layout's grid shape is all 1's since we're assuming here that the shard shape is the fully expanded physical/tiled shape.
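
One possible guard, as a sketch (assuming the layout's grid is reachable via getGrid().getShape(); llvm::all_of is from llvm/ADT/STLExtras.h):

// Sketch only: fail fast if the layout is already sharded, i.e. its grid is
// not all 1's, since the shard shape below is treated as the fully expanded
// physical/tiled shape.
assert(llvm::all_of(result_encoding.getGrid().getShape(),
                    [](int64_t d) { return d == 1; }) &&
       "expected an unsharded (all-ones) grid");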

Contributor Author:
I think the new check on line 82 should take care of this. We fail the rewrite if the grid size isn't 1x1.

mlir::cast<MetalLayoutAttr>(optimal_layout.getEncoding()).getGrid());

return failure(); // need some better way to exit cond. the rewriter than
                  // always returning false!

@nsmithtt (Contributor) commented Feb 19, 2025:
Return success if you want to commit your changes and then recursively run the pattern. Return failure if you don't want to commit (this also signals to the rewrite driver that no update occurred, so no recursion is needed on behalf of this invocation).

Typically you need to test at the top to see whether your change has already been applied, especially in this case where you're not changing the op type, so this pattern will be matched on every recursive step.
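
A rough sketch of that shape (illustrative; names like ttir::GenericOp, getLocalLayout, and device are borrowed from the surrounding snippets, and this is not the PR's actual pattern):

LogicalResult matchAndRewrite(ttir::GenericOp op,
                              PatternRewriter &rewriter) const override {
  auto optimalLayout = getLocalLayout(op->getResult(0), rewriter, device);

  // Already rewritten on an earlier driver iteration: report that nothing
  // changed so the driver stops recursing on this pattern.
  if (op->getResult(0).getType() == optimalLayout)
    return failure();

  // Commit the change through the rewriter and report success so the driver
  // re-runs the patterns on the updated op.
  rewriter.modifyOpInPlace(op, [&]() {
    op->getResult(0).setType(optimalLayout);
  });
  return success();
}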

Contributor Author:
Updated the success/failure logic on most of the passes. Please take another look.

}
if (grid_shape.size() == i + 1) {
continue;
}

Contributor:
I think let's not worry about padding for now; we don't have a way of implementing that kind of gather pattern yet, and we have more fundamental things to get to first. Let's just pick the largest grid that evenly divides memref dim[i].
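
For illustration, one way to express that (a hypothetical helper, not code from this PR):

// Largest grid dim, capped by the device grid, that evenly divides the
// memref dim; falls back to 1 when no larger divisor exists.
int64_t pickEvenGridDim(int64_t dim, int64_t deviceGridDim) {
  for (int64_t g = (dim < deviceGridDim ? dim : deviceGridDim); g > 1; --g) {
    if (dim % g == 0) {
      return g;
    }
  }
  return 1;
}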

Contributor Author:
I'm not sure if this interferes with our current padding support at all. In the case that the grid evenly divides the tensor, it will always pick that largest divisor. The padding here just safeguards us for odd shapes, and will essentially "round up" the tensor size to fit into tile-aligned memrefs that are divisible by the grid. If this isn't what we want at this stage, can you tell me a bit more about how we should handle odd tensor shapes in the interim?
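
(Illustrative numbers for the rounding described above: a dimension of 19 tiles on an 8-core grid gives a shard of ceil(19/8) = 3 tiles, so the grid shrinks to ceil(19/3) = 7 cores; the last core holds 1 real tile plus 2 tiles of padding, and no core works on padding alone.)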

Contributor:
I'm not sure if this interferes with our current padding support at all

I think this is what I'm getting at: we don't really have padding support anywhere else, so it's unclear whether we're ready to lock in on this heuristic.

Nit: we should use camelCase.

It might be written more succinctly:

for (size_t i = 0; i < memrefShape.size(); i++) {
    int64_t dim = memrefShape[i];
    int64_t gridDim = deviceGridShape[i];
    int64_t shardDim = llvm::divideCeil(dim, gridDim);
    gridShape.push_back(llvm::divideCeil(dim, shardDim));
}

Contributor Author:
Just to be clear, should I just assert out for shapes that are not already tile-aligned, and then we can just divide the grid? Even if the tensor is tile-aligned (take something like 19x1 tiles), should the grid shape just remain 1x1 because we can't pad out to a divisor of the grid?

Contributor:
Yeah, we'll have to align up to tile for sure. I was thinking of just picking even divisors for now, which I understand is unfortunate for primes > 8. Maybe it doesn't matter either way; I guess it's fine if we do it this way for now. I think I didn't fully understand how the padding was working here at first, but we should be flexible to change it if we need to in the future.

@jdesousa-TT force-pushed the jdesousa/ttir-tensor-layout branch 4 times, most recently from f8b952c to e6f0d4b on February 24, 2025 18:00
@jdesousa-TT force-pushed the jdesousa/ttir-tensor-layout branch from e6f0d4b to 654de15 on February 24, 2025 19:04

assert(op->getResults().size() == 1 &&
       "Only one result tensor is supported for now");
auto optimal_layout = getLocalLayout(op->getResult(0), rewriter, device);
if (genericOp.getGridAttr() != GridAttr::get(rewriter.getContext()) ||

Contributor:
Is this condition ever true if the latter condition isn't?

    mlir::dyn_cast_or_null<MetalLayoutAttr>(result_type.getEncoding());
assert(result_encoding && "Tensor type must have a MetalLayoutAttr encoding");
auto optimal_output_grid = getOptimalGrid(
    tensor.getContext(), result_encoding.getMemref().getShape(),

Contributor:

I think result_encoding.getMemref().getShape() might be clearer as llvm::divideCeil(resultEncoding.getPhysicalShape(...), 32).
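
As a sketch of that suggestion (the arguments to getPhysicalShape are elided as in the comment above, and its result is assumed to be a range of int64_t):

// Illustrative: derive the tiled shape from the physical shape rather than
// reading it back off the memref; 32 is the tile dimension from the comment.
llvm::SmallVector<int64_t> tiledShape;
for (int64_t d : resultEncoding.getPhysicalShape(/*...*/)) {
  tiledShape.push_back(llvm::divideCeil(d, 32));
}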
