
Convert aten.sum.dim_IntList to ttnn.sum #264

Open · jdh8 wants to merge 15 commits into main from feature/sum

Conversation

jdh8 (Contributor) commented Sep 25, 2024

Ticket

Problem description

To support aten.sum.dim_IntList, we need ttnn.sum to support a tuple of dims. I'm investigating how to patch the kernel.
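As a first cut, the lowering can call ttnn.sum once per requested dim. A minimal sketch of that idea (the helper name is mine, not code from this PR), using the same g.call_function style as the graph-rewrite code below:

def lower_sum_dim_intlist(g, tensor, dims):
    # Reduce the highest dim first so the remaining dim indices stay
    # valid even if each ttnn.sum collapses the dim it reduces.
    for dim in sorted(dims, reverse=True):
        tensor = g.call_function(ttnn.sum, (tensor, dim))
    return tensor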

What's changed

@jdh8 jdh8 force-pushed the feature/sum branch 2 times, most recently from 680876a to 79ffb55 Compare November 8, 2024 18:00
@jdh8 jdh8 self-assigned this Nov 9, 2024
@jdh8 jdh8 added the conversion label Nov 9, 2024
@jdh8 jdh8 marked this pull request as ready for review November 9, 2024 09:05
@jdh8 jdh8 added this pull request to the merge queue Nov 9, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 9, 2024
jdh8 (Contributor, Author) commented Nov 9, 2024

I have no idea about the failing test case.

@jdh8 jdh8 added the help wanted label Nov 9, 2024
((1024, 640), (0,)),
((14, 2048), (0,)),
((14, 512), (0,)),
((16384, 128), (0,)),
jdh8 (Contributor, Author) commented Nov 10, 2024

Found a mild PCC failure locally for ((16384, 128), (0,)):

expected_pytorch_result = tensor([[  53.5000,  -96.0000, -113.5000, -139.0000,  -72.5000,   39.2500,
         -159.0000,  -13.6875,  -87.0000,  ...187.0000, -130.0000,  -23.7500,  -18.7500,  -25.2500, -132.0000,
          -22.0000,   87.5000]], dtype=torch.bfloat16)
actual_pytorch_result = TorchTensor([[  63.0000, -111.5000, -129.0000, -142.0000,  -74.0000,   59.2500,
              -153.0000,  -14.1875,  -...000, -137.0000,   -5.2188,  -18.7500,  -34.0000, -153.0000,
               -15.6250,  102.5000]], dtype=torch.bfloat16)
pcc = 0.999

    def assert_with_pcc(expected_pytorch_result, actual_pytorch_result, pcc=0.999):
        assert list(expected_pytorch_result.shape) == list(
            actual_pytorch_result.shape
        ), f"list(expected_pytorch_result.shape)={list(expected_pytorch_result.shape)} vs list(actual_pytorch_result.shape)={list(actual_pytorch_result.shape)}"
        pcc_passed, pcc_message = comp_pcc(expected_pytorch_result, actual_pytorch_result, pcc)
>       assert pcc_passed, construct_pcc_assert_message(pcc_message, expected_pytorch_result, actual_pytorch_result)
E       AssertionError: 0.9923741703805607

tests/utils.py:219: AssertionError


PCC accuracy declines as tensors grow because rounding differences accumulate over the reduction. Would it make sense to lower the threshold for tensors where a dimension exceeds 10k?
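For context, PCC here is the Pearson correlation coefficient between the expected and actual outputs. A minimal sketch of the metric, assuming comp_pcc computes something equivalent:

import torch

def pearson_cc(expected: torch.Tensor, actual: torch.Tensor) -> float:
    # Flatten and upcast so the metric itself adds no rounding error.
    x = expected.flatten().to(torch.float64)
    y = actual.flatten().to(torch.float64)
    x, y = x - x.mean(), y - y.mean()
    return float((x @ y) / (x.norm() * y.norm()))

With bfloat16's roughly 8-bit mantissa, naive summation error grows with the reduction length, so values just below 0.999 on 16384-element reductions are plausibly precision loss rather than a logic bug.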

((14, 2048), (0,)),
((14, 512), (0,)),
((16384, 128), (0,)),
((16384, 32), (0,)),
jdh8 (Contributor, Author) commented Nov 10, 2024

Found a mild PCC failure locally for ((16384, 32), (0,)):

expected_pytorch_result = tensor([[-156.0000, -105.0000, -136.0000,   71.0000,  -59.5000, -163.0000,
          -26.1250,  -26.2500,  -67.0000,  ... 10.8750, -237.0000, -108.0000,  -38.2500, -107.5000,   47.2500,
          -41.2500,  -85.0000]], dtype=torch.bfloat16)
actual_pytorch_result = TorchTensor([[-185.0000, -127.5000, -161.0000,   93.5000,  -63.2500, -199.0000,
               -40.0000,  -44.2500,  -...000, -280.0000, -116.5000,  -36.0000, -147.0000,   76.0000,
               -37.7500, -104.0000]], dtype=torch.bfloat16)
pcc = 0.999

    def assert_with_pcc(expected_pytorch_result, actual_pytorch_result, pcc=0.999):
        assert list(expected_pytorch_result.shape) == list(
            actual_pytorch_result.shape
        ), f"list(expected_pytorch_result.shape)={list(expected_pytorch_result.shape)} vs list(actual_pytorch_result.shape)={list(actual_pytorch_result.shape)}"
        pcc_passed, pcc_message = comp_pcc(expected_pytorch_result, actual_pytorch_result, pcc)
>       assert pcc_passed, construct_pcc_assert_message(pcc_message, expected_pytorch_result, actual_pytorch_result)
E       AssertionError: 0.993693118840562

tests/utils.py:219: AssertionError

# Workaround: ttnn.sum cannot reduce dimension 0 directly, so insert a
# leading dimension, reduce dimension 1 instead, then squeeze it away.
tensor = g.call_function(ttnn.to_layout, (tensor, TtnnRowMajorLayout()))
tensor = g.call_function(ttnn.unsqueeze, (tensor, 0))
tensor = g.call_function(ttnn.sum, (tensor, 1))
tensor = g.call_function(ttnn.squeeze, (tensor, 0))
jdh8 (Contributor, Author) commented:

Summing along dimension 0 seems to be unsupported, so I added this workaround.

@jdh8 jdh8 added this pull request to the merge queue Nov 12, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 12, 2024
jdh8 added 11 commits December 27, 2024 19:43
Our current use cases reduce at most 2 dimensions, so I'm using this naive
algorithm. If the loop becomes a bottleneck, consider the following algorithm,
which produces O(1) ops (see the sketch after this list):

1. Transpose the tensor to group together dimensions to reduce
2. Reshape the tensor to merge dimensions to reduce
3. Reduce the only dimension
4. Unsqueeze/reshape if keepdims=True
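
For illustration, a sketch of that O(1)-op algorithm in plain PyTorch (sum_dims_merged is a hypothetical helper, not code from this PR; a ttnn lowering would emit the analogous transpose/reshape/sum graph calls):

import torch

def sum_dims_merged(t: torch.Tensor, dims, keepdim=False):
    dims = sorted(d % t.ndim for d in dims)
    kept = [d for d in range(t.ndim) if d not in dims]
    t = t.permute(kept + dims)        # 1. group the dims to reduce at the end
    t = t.reshape([t.shape[i] for i in range(len(kept))] + [-1])  # 2. merge them
    t = t.sum(-1)                     # 3. reduce the single merged dimension
    if keepdim:                       # 4. reinsert size-1 dims if keepdims=True
        for d in dims:
            t = t.unsqueeze(d)
    return t

x = torch.randn(2, 3, 4)
assert torch.allclose(sum_dims_merged(x, (0, 2)), x.sum(dim=(0, 2)))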
…eezing at arbitrary axis is unsupported yet
I doubt this makes a difference in Python, though
@jdh8 jdh8 enabled auto-merge December 27, 2024 20:08
@jdh8 jdh8 added this pull request to the merge queue Dec 27, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 27, 2024
@jdh8 jdh8 removed the help wanted label Dec 27, 2024
@jdh8 jdh8 added this pull request to the merge queue Dec 27, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 27, 2024
@jdh8 jdh8 added this pull request to the merge queue Dec 27, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 27, 2024
Successfully merging this pull request may close these issues: aten.sum.dim_IntList
3 participants