Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "Abolish get_available_num_host_mem_channels" #400

Merged
merged 1 commit into from
Dec 13, 2024

Conversation

joelsmithTT
Copy link
Contributor

Reverts commit 3210bd9 from #393.

Issue

Metal CI failure tracked in tenstorrent/tt-metal#15675

Description

The reverted commit removed logic that allowed applications to request more hugepages than available to UMD. Previously, UMD would issue a warning in such cases. However, this created a potential safety issue since applications had no visibility into partial hugepage allocation (e.g., requesting 4 pages but receiving only 2).

This situation could lead to:

  • Host software segfaults when accessing unmapped pages
  • More critically, device software could potentially corrupt host physical address space by writing to nonexistent pages

While the original change (making excessive hugepage requests a fatal error) improved safety, particularly in conjunction with IOMMU enablement, it caused failures in Metal CI tests that (possibly unintentionally?) request more hugepages than available. This revert is a temporary measure until the Metal CI tests can be updated.

List of the changes

  • Revert 3210bd9
  • Update comment documentation

Testing

CI

API Changes

There are no API changes in this PR.

@joelsmithTT joelsmithTT self-assigned this Dec 13, 2024
@joelsmithTT joelsmithTT requested a review from broskoTT December 13, 2024 02:44
@broskoTT broskoTT merged commit 7da6017 into main Dec 13, 2024
24 checks passed
@broskoTT broskoTT deleted the joelsmith/restore-broken-hugepage-behavior branch December 13, 2024 08:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants