Record min/max of integral tensor in ET #191

Closed


shengfukevin
Contributor

Summary:
X-link: pytorch/pytorch#143088

In et-replay, random data is used to run the operators. However, this does not work well for ops that use indices to access tensors, such as embedding ops, which use indices to look up the embedding table. If random data is used for these index ops, et-replay usually runs into invalid memory access errors.

To fix this, ET provides an environment variable, "ENABLE_PYTORCH_EXECUTION_TRACE_INTEGRAL_TENSOR_RANGE". When it is set, ET captures the min/max values of each flattened integral tensor. In et_replay, the recorded min/max is then used to generate random tensors within that range, which eliminates the invalid memory accesses.

Differential Revision: D66666931
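
To make the replay side concrete, here is a minimal sketch of generating a range-constrained random tensor from a recorded (min, max) pair. The helper `make_integral_tensor` and its signature are illustrative assumptions, not the actual et_replay API; only the idea of sampling within the recorded range comes from this PR.

```python
import torch

def make_integral_tensor(shape, dtype, tensor_range=None, device="cpu"):
    # Hypothetical helper: if the trace recorded a (min, max) pair for this
    # tensor, draw values from that closed interval so index ops stay in bounds.
    if tensor_range is not None:
        lo, hi = tensor_range
        # torch.randint's upper bound is exclusive, so add 1 to include hi.
        return torch.randint(lo, hi + 1, shape, dtype=dtype, device=device)
    # No recorded range: fall back to unconstrained random data.
    return torch.randint(0, torch.iinfo(dtype).max, shape, dtype=dtype, device=device)

# Example: indices for a 1000-row embedding table, recorded range (0, 999).
idx = make_integral_tensor((8, 16), torch.int64, tensor_range=(0, 999))
emb = torch.nn.Embedding(1000, 64)
out = emb(idx)  # indices stay within [0, 999], so no invalid memory access
```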

@facebook-github-bot added the CLA Signed label Dec 12, 2024
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D66666931

shengfukevin added a commit to shengfukevin/pytorch that referenced this pull request Dec 12, 2024

Test Plan: buck2 run mode/opt caffe2/test:test_profiler_cuda -- profiler.test_execution_trace.TestExecutionTraceCUDA.test_execution_trace_record_integral_tensor_range_cuda

Differential Revision: D66666931
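
For reference, a trace with tensor ranges can be captured along these lines. `ExecutionTraceObserver` is PyTorch's standard execution-trace collector; the output filename is arbitrary, and the assumption here (per the summary above) is that the environment variable only needs to be set before the trace is recorded.

```python
import os

# Assumption from the summary: set before the trace is captured.
os.environ["ENABLE_PYTORCH_EXECUTION_TRACE_INTEGRAL_TENSOR_RANGE"] = "1"

import torch
from torch.profiler import ExecutionTraceObserver

et = ExecutionTraceObserver()
et.register_callback("et_with_ranges.json")  # output file for the trace

emb = torch.nn.Embedding(1000, 64)
idx = torch.randint(0, 1000, (8, 16))

et.start()
out = emb(idx)  # ET records min/max of the integral idx tensor
et.stop()
et.unregister_callback()
```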

shengfukevin added a commit to shengfukevin/param that referenced this pull request Dec 17, 2024

Reviewed By: sanrise

Differential Revision: D66666931

@facebook-github-bot
Contributor

This pull request has been merged in 827ac1f.

Labels
CLA Signed, fb-exported, Merged