-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move getConsumerPosAlignedToProducerCA to MaxPosCalculator #3766
Conversation
PR Reviewer Guide 🔍(Review updated until commit 612ab0f)Here are some key observations to aid the review process:
|
!test --diff |
// Try to find the aligned position on consumer's domain corresponding to a | ||
// position of producer domain. No checking on actual | ||
// producer-consumer relationship. | ||
int64_t getConsumerPosAlignedToProducerCA( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is moved to MaxPosCalculator
, except for one difference noted below.
IdModel id_model({consumer->definition()}, {}, false); | ||
id_model.buildBroadcastGraph(); | ||
const auto& inlining_graph = id_model.idGraph(IdMappingMode::BROADCAST); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is removed when moved to MaxPosCalculator
. We need an IdModel to determine valid inlining positions. Previously, it created a per-tensor IdModel here. Since it's now moved to MaxPosCalculator
, it doesn't need to create a new IdModel as long as the same instance of MaxPosCalculator
is reused.
Suggested titleMove Description
Changes walkthrough 📝
|
!test --diff |
!test --diff |
Description
Changes walkthrough 📝
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
Moved
getConsumerPosAlignedToProducerCA
toMaxPosCalculator
to reuse anIdModel
built for the entire fusion.No output should be affected, but there are two main reasons of the change, one is efficiency and another is functionality.
TensorView::updateMaxProducerPosition
created a (small) IdModel, which is inefficient since in common casesupdateMaxProducerPosition
is called from routines that already holds an IdModel throughMaxPosCalculator
. Moving the function toMaxPosCalculator
can reuse the same IdModel.Fusion::registerExactMapping
. For example:In this case, there are exact groups of
{i0, i1, i3, i5, i7}
,{i2, i7, i8}
and{b4}
. We could set the loop domain oft2
look liket4
byscheduleLoopDomainsBy({t2}, t4->getLoopDomain())
, which would clonei8
and set the loop domain oft2
as{i3, i9}
, wherei9
is a clone ofi8
and is registered as exact mapped withi8
. This should all be fine as long as we looked at the whole fusion, but if an IdModel was built only fort3 = t1 + t2
, the loop domain oft2
,{i3, i9}
, wouldn't be grouped together with the loop domain oft3
,{i5, i7}
. It would discover the mapping ofi3
andi5
, but it wouldn't know anything betweeni9
andi7
. The registered exact mapping wouldn't help because it's registered with a pair ofi8
andi9
. It is obvious thati8
is mapped withi7
, only when the IdModel analysis also looks at thet4
definition. When an IdModel is created only fort3 = t1 + t2
, the registered mapping would do nothing. This problem can be avoided by using a whole-fusion IdModel.