Run performance test non-alternately #2394

Closed
HDCharles wants to merge 4 commits from the 088_torchbench_torchao_updates branch

Conversation

HDCharles
Contributor

Summary:
By default, performance tests (speedup experiments) run the baseline and the test backend alternately.

However, this does not work for the torchao backend, which quantizes the model in place: once the model has been quantized, the "baseline" iterations also run with the torchao backend.

Add a new experiment, "latency_experiment", to run performance tests non-alternately: first run the baseline for a few iterations, then run the test backend.
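
As a rough illustration of the non-alternating measurement (a minimal sketch, not the actual benchmark harness; `apply_backend`, the iteration count, and the CUDA-event timing are assumptions):

```python
import torch

def latency_experiment(model, example_inputs, apply_backend, n_iters=10):
    # Time n_iters calls with CUDA events; returns ms per iteration.
    def time_iters(fn):
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        torch.cuda.synchronize()
        start.record()
        for _ in range(n_iters):
            fn()
        end.record()
        torch.cuda.synchronize()
        return start.elapsed_time(end) / n_iters

    # The baseline must finish all of its iterations before the backend
    # is applied: backends like torchao quantize the model in place, so
    # an alternating schedule would time the quantized model on both sides.
    baseline_ms = time_iters(lambda: model(*example_inputs))
    apply_backend(model)  # e.g. in-place quantization (hypothetical hook)
    backend_ms = time_iters(lambda: model(*example_inputs))
    return baseline_ms / backend_ms  # speedup; ~1.0x for noquant
```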

Other changes:

Add torch.compiler.cudagraph_mark_step_begin() to avoid the slowdown flagged by the "Unable to hit fast path of CUDAGraphs because of pending, uninvoked backwards" message.
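
For context, a minimal sketch of where that call sits in a compiled inference loop (the toy model, dtype, and iteration count are placeholders, not the benchmark's actual code):

```python
import torch

model = torch.nn.Linear(16, 16).to(device="cuda", dtype=torch.bfloat16)
x = torch.randn(8, 16, device="cuda", dtype=torch.bfloat16)
compiled = torch.compile(model, mode="max-autotune")

with torch.no_grad():
    for _ in range(10):
        # Mark the start of a new inference step so the CUDAGraphs
        # runtime replays its captured graph instead of falling back to
        # the slow path with "pending, uninvoked backwards".
        torch.compiler.cudagraph_mark_step_begin()
        out = compiled(x)
```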

Also update the torchao APIs to the current versions.
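
The PR does not spell out the calls; as an illustration, current-style torchao quantization looks roughly like the sketch below (the quantize_/int8_weight_only pair is an assumption about which entry points were updated):

```python
import torch
from torchao.quantization import quantize_, int8_weight_only  # torchao >= 0.4 assumed

model = torch.nn.Sequential(torch.nn.Linear(16, 16)).to(device="cuda", dtype=torch.bfloat16)

# quantize_ mutates the model in place -- the same property that forces
# the latency experiment to run the baseline before the backend.
quantize_(model, int8_weight_only())
```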

Test Plan:

python run_benchmark.py torchao --only AlbertForMaskedLM --quantization noquant --performance --inference --bfloat16 --inductor-compile-mode max-autotune
python run_benchmark.py torchao --only BartForCausalLM --quantization noquant --performance --inference --bfloat16 --inductor-compile-mode max-autotune
python run_benchmark.py torchao --only timm_efficientnet --quantization noquant --performance --inference --bfloat16 --inductor-compile-mode max-autotune

(all three should be ~1.0x):
0.997x
1.006x
0.994x


@facebook-github-bot
Contributor

@HDCharles has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@HDCharles HDCharles requested a review from xuzhao9 July 26, 2024 19:14
@xuzhao9
Contributor

xuzhao9 commented Jul 26, 2024

I am wondering why we get the message "Unable to hit fast path of CUDAGraphs because of pending, uninvoked backwards" from PyTorch. This is an inference test, so we should not have any backward pass...

HDCharles pushed a commit to HDCharles/pytorch that referenced this pull request Jul 26, 2024

pytorch-bot bot pushed a commit to pytorch/pytorch that referenced this pull request Jul 31, 2024
@HDCharles HDCharles force-pushed the 088_torchbench_torchao_updates branch from 44ab948 to b7844e7 on July 31, 2024 19:13

HDCharles added a commit to HDCharles/pytorch that referenced this pull request Jul 31, 2024
xuzhao9 and others added 3 commits August 2, 2024 13:18
@HDCharles HDCharles force-pushed the 088_torchbench_torchao_updates branch from b7844e7 to 4ee2463 on August 2, 2024 20:18

pytorch-bot bot pushed a commit to pytorch/pytorch that referenced this pull request Aug 2, 2024

pytorch-bot bot pushed a commit to pytorch/pytorch that referenced this pull request Aug 7, 2024
@HDCharles
Contributor Author

@pytorchmergebot merge


pytorch-bot bot commented Aug 8, 2024

Mergebot is not configured for this repository. Please use the merge button provided by GitHub.

pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Aug 8, 2024

Pull Request resolved: #131935
Approved by: https://github.com/xuzhao9
facebook-github-bot pushed a commit that referenced this pull request Aug 8, 2024
X-link: #2394

Originally Reviewed By: xuzhao9

X-link: pytorch/pytorch#131935
Approved by: https://github.com/xuzhao9

Reviewed By: xuzhao9, PaliC

Differential Revision: D60252821

Pulled By: HDCharles

fbshipit-source-id: 08ad452c5fcb34182c9aa7da1fe761db9587de71
@xuzhao9
Contributor

xuzhao9 commented Aug 13, 2024

Merge is done by ShipIt: bf0e5a9

@xuzhao9 xuzhao9 closed this Aug 13, 2024