
add test configurations for quantization with onnxruntime, awq, bnb (#95) #144

Merged

Conversation

@aliabdelkader (Contributor)

  • add test configurations for onnxruntime backend
  • add no_weights sweep to quantization tests
  • add test configuration for pytorch awq quantization
  • add test configuration for bnb quantization
  • add autoawq and bitsandbytes libraries to optional install requirements (see the sketch after this list)
  • update the Makefile and GitHub workflow so that autoawq & bnb get installed and their related tests run successfully both locally and on CI

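A minimal sketch of what the optional-install bullet could look like in setup.py, assuming a standard setuptools layout; the extras names and entries below are illustrative assumptions, not the repository's actual ones:

```python
from setuptools import find_packages, setup

# Hypothetical excerpt: expose the quantization libraries as pip extras so
# that `pip install -e .[autoawq]` or `pip install -e .[bitsandbytes]`
# pulls them in. Names and (absence of) version pins are assumptions.
EXTRAS_REQUIRE = {
    "autoawq": ["autoawq"],
    "bitsandbytes": ["bitsandbytes"],
}

setup(
    name="optimum-benchmark",
    packages=find_packages(),
    extras_require=EXTRAS_REQUIRE,
)
```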
@aliabdelkader (Contributor, Author)

Hi @IlyasMoutawwakil

  1. I intend to continue working on the remaining points of More tests #95, but I thought it would be better to make a pull request for the quantization schemes first.
  2. Would you mind approving the GitHub workflows for this pull request? I want to make sure that the added tests pass on CI.
  3. I added autoawq and bitsandbytes to setup.py so that they can be installed by the Makefile and GitHub workflow files, similar to the other packages. I am not sure that this will work in all cases, as the autoawq and bitsandbytes installations depend on the available CUDA version.
  4. I saw that in QUANTIZATION_CONFIGS = {"bnb": {"llm_int8_threshold": 0.0}, "gptq": {}, "awq": {}}, llm_int8_threshold is set to zero by default. I am not sure why; I set it to 6 in the test configurations.
  5. The pytorch_awq_exllama example gave me an error when I tried to run it (screenshot omitted). I think that in quantization_config={"version": "exllama"}, quantization_config is set incorrectly. In the test configuration cuda_inference_pytorch_awq_exllama, I set it differently by putting the version in exllama_config (see the sketch after this list). I am not sure whether the original example is supposed to run or whether I missed something there.
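To illustrate point 5, a minimal sketch of the difference, assuming the AwqConfig API that transformers exposes once exllama support lands; the exllama_config keys shown are illustrative and may vary across transformers versions:

```python
from transformers import AwqConfig

# What the original example passed (roughly), which errored for me:
# quantization_config = {"version": "exllama"}

# What the cuda_inference_pytorch_awq_exllama test config does instead,
# sketched in Python: select the exllama kernels and carry the kernel
# details inside exllama_config.
quantization_config = AwqConfig(
    version="exllama",
    exllama_config={"version": 2, "max_input_len": 2048, "max_batch_size": 8},
)
```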

Thank you in advance.

@IlyasMoutawwakil (Member)

Thanks a lot for the PR!

  1. Yes, agreed.
  2. Yes, of course.
  3. For autoawq and bitsandbytes, let's not put them in setup.py and only install them as part of the workflow.
  4. That's for benchmark reproducibility: the bnb 8-bit quantization scheme uses this threshold to decompose the matmul into two operations, one in int8 and one in fp16. Using zero guarantees that only the int8 part is used, which guarantees reproducibility (since we only care about reproducibility of latency, throughput, etc.); see the sketch after this list.
  5. That's an example of running AWQ on AMD GPUs using exllama kernels, relying on a branch that's still unmerged (Exllama kernels support for AWQ models, transformers#28634); don't worry about it.
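A minimal sketch of point 4's reproducibility argument, using transformers' BitsAndBytesConfig as a stand-in for how the benchmark wires this value through (an assumption; the harness builds its configs from YAML):

```python
from transformers import BitsAndBytesConfig

# Per the explanation above: llm_int8_threshold decides which activation
# outliers are decomposed into a separate fp16 matmul. Setting it to 0.0
# keeps the whole matmul on the int8 path, making latency/throughput
# measurements reproducible.
repro_config = BitsAndBytesConfig(load_in_8bit=True, llm_int8_threshold=0.0)

# The value 6.0 used in the new test configurations is the usual
# accuracy-oriented default.
test_config = BitsAndBytesConfig(load_in_8bit=True, llm_int8_threshold=6.0)
```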

- remove autoawq and bitsandbytes from setup.py
- add autoawq and bitsandbytes to GitHub workflows and Makefile
@IlyasMoutawwakil (Member) left a comment

Thank you very much! Very nice test configurations, that's exactly what I had in mind 😊
I left some comments.

Resolved review threads on:
  • .github/workflows/test_cli_rocm_pytorch.yaml
  • Makefile
  • tests/configs/cuda_inference_pytorch_awq_exllama.yaml
@IlyasMoutawwakil (Member)

Awesome! All CUDA tests ran successfully 🤗

@aliabdelkader (Contributor, Author) commented Mar 6, 2024

Hi @IlyasMoutawwakil,

Thank you very much for your quick responses and support. I made a new commit addressing your review comments.

I just want to raise again the point about the installation of autoawq and bitsandbytes (point 3 from earlier). I have removed them from setup.py as we discussed.

But, to be honest, I think the setup.py approach is more maintainable in the long run, as it puts all the installation requirements in one place, which makes them easier for users and developers to find in the future. Pinning dependencies to specific versions would also be easier if done in one place.

I will leave the decision up to you. If you would like me to revert and add them back to setup.py, please let me know. Otherwise, if you have no further comments, I think this PR can be merged, assuming it passes CI.

Thank you in advance.

@IlyasMoutawwakil (Member)

Thank you @aliabdelkader! Yes, feel free to add them, but we will have to remove them at some point, because package@git+... dependencies are generally hard to package for PyPI. We also don't want setup.py to include every possibly installable third-party lib; as is the case in transformers, raising an error informing the user of their missing lib makes more sense (see the sketch below).
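A minimal sketch of the "raise on missing lib" pattern mentioned above, in the spirit of transformers' import guards; the function name and wording are illustrative, not the library's actual API:

```python
import importlib.util

def require_package(package: str, feature: str) -> None:
    """Fail fast with an actionable message instead of making every
    third-party quantization lib a hard dependency in setup.py."""
    if importlib.util.find_spec(package) is None:
        raise ImportError(
            f"{package} is required for {feature} but is not installed. "
            f"Install it with `pip install {package}`."
        )

# Usage (hypothetical): guard a bnb benchmark before building its config.
# require_package("bitsandbytes", "bnb quantization")
```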

- update github workflow files to install autoawq and bnb using setup.py
- "requests" is installed independently because "autoawq@git+https..." requires it to proceed with its installation.
@IlyasMoutawwakil (Member)

@aliabdelkader all that is left is styling

@aliabdelkader (Contributor, Author)

Yes, sorry, I forgot to run the linter earlier.

@IlyasMoutawwakil IlyasMoutawwakil merged commit cafe343 into huggingface:main Mar 13, 2024
33 of 44 checks passed