
add test configurations for quantization with onnxruntime, awq, bnb (#95) #144

Merged

Conversation

@aliabdelkader (Contributor)

  • add test configurations for onnxruntime backend
  • add no_weights sweep to quantization tests
  • add test configuration for pytorch awq quantization
  • add test configuration for bnb quantization
  • add autoawq and bitsandbytes libraries to optional install requirements (see the sketch after this list)
  • update the Makefile and GitHub workflow so that autoawq & bnb get installed and their related tests run successfully both locally and on CI

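A minimal sketch of what the optional-install bullet could look like in setup.py, assuming a standard setuptools layout; the extras names and entries below are illustrative assumptions, not the repository's actual ones:

```python
from setuptools import find_packages, setup

# Hypothetical excerpt: expose the quantization libraries as pip extras so
# that `pip install -e .[autoawq]` or `pip install -e .[bitsandbytes]`
# pulls them in. Names and (absence of) version pins are assumptions.
EXTRAS_REQUIRE = {
    "autoawq": ["autoawq"],
    "bitsandbytes": ["bitsandbytes"],
}

setup(
    name="optimum-benchmark",
    packages=find_packages(),
    extras_require=EXTRAS_REQUIRE,
)
```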
@aliabdelkader (Contributor, Author)

Hi @IlyasMoutawwakil

  1. I intend to continue working on the remaining points of More tests #95, but I thought it would be better to make a pull request for the quantization schemes first.
  2. Would you mind approving the GitHub workflows for this pull request? I want to make sure that the added tests pass on CI.
  3. I added autoawq and bitsandbytes to setup.py so that they can be installed by the Makefile and GitHub workflow files, similar to the other packages. I am not sure that this will work in all cases, as the autoawq and bitsandbytes installations depend on the available CUDA version.
  4. I saw that in QUANTIZATION_CONFIGS = {"bnb": {"llm_int8_threshold": 0.0}, "gptq": {}, "awq": {}}, llm_int8_threshold is set to zero by default. I am not sure why; I set it to 6 in the test configurations.
  5. The pytorch_awq_exllama example gave me an error when I tried to run it (screenshot omitted). I think that in quantization_config={"version": "exllama"}, quantization_config is set incorrectly. In the test configuration cuda_inference_pytorch_awq_exllama, I set it differently by putting the version in exllama_config (see the sketch after this list). I am not sure whether the original example is supposed to run or whether I missed something there.
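To illustrate point 5, a minimal sketch of the difference, assuming the AwqConfig API that transformers exposes once exllama support lands; the exllama_config keys shown are illustrative and may vary across transformers versions:

```python
from transformers import AwqConfig

# What the original example passed (roughly), which errored for me:
# quantization_config = {"version": "exllama"}

# What the cuda_inference_pytorch_awq_exllama test config does instead,
# sketched in Python: select the exllama kernels and carry the kernel
# details inside exllama_config.
quantization_config = AwqConfig(
    version="exllama",
    exllama_config={"version": 2, "max_input_len": 2048, "max_batch_size": 8},
)
```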

Thank you in advance.

@IlyasMoutawwakil (Member)

Thanks a lot for the PR!

  1. Yes, agreed.
  2. Yes, of course.
  3. For autoawq and bitsandbytes, let's not put them in setup.py and only install them as part of the workflow.
  4. That's for benchmark reproducibility: the bnb 8-bit quantization scheme uses this threshold to decompose the matmul into two operations, one in int8 and one in fp16. Using zero guarantees that only the int8 part is used, which guarantees reproducibility (since we only care about reproducibility of latency, throughput, etc.); see the sketch after this list.
  5. That's an example of running AWQ on AMD GPUs using exllama kernels, relying on a branch that's still unmerged (Exllama kernels support for AWQ models, transformers#28634); don't worry about it.
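A minimal sketch of point 4's reproducibility argument, using transformers' BitsAndBytesConfig as a stand-in for how the benchmark wires this value through (an assumption; the harness builds its configs from YAML):

```python
from transformers import BitsAndBytesConfig

# Per the explanation above: llm_int8_threshold decides which activation
# outliers are decomposed into a separate fp16 matmul. Setting it to 0.0
# keeps the whole matmul on the int8 path, making latency/throughput
# measurements reproducible.
repro_config = BitsAndBytesConfig(load_in_8bit=True, llm_int8_threshold=0.0)

# The value 6.0 used in the new test configurations is the usual
# accuracy-oriented default.
test_config = BitsAndBytesConfig(load_in_8bit=True, llm_int8_threshold=6.0)
```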

- remove autoawq and bitsandbytes from setup.py
- add autoawq and bitsandbytes to GitHub workflows and Makefile
@IlyasMoutawwakil (Member) left a comment

Thank you very much! Very nice test configurations, that's exactly what I had in mind 😊
I left some comments.

Resolved review threads on:
  • .github/workflows/test_cli_rocm_pytorch.yaml
  • Makefile
  • tests/configs/cuda_inference_pytorch_awq_exllama.yaml
@IlyasMoutawwakil (Member)

Awesome! All CUDA tests ran successfully 🤗

@aliabdelkader (Contributor, Author) commented Mar 6, 2024

Hi @IlyasMoutawwakil,

Thank you very much for your quick responses and support. I made a new commit addressing your review comments.

I just want to raise again the point about the installation of autoawq and bitsandbytes (point 3 from earlier). I have removed them from setup.py as we discussed.

But, to be honest, I think the setup.py approach is more maintainable in the long run, as it puts all the installation requirements in one place, which makes them easier for users and developers to find in the future. Pinning dependencies to specific versions would also be easier if done in one place.

I will leave the decision up to you. If you would like me to revert and add them back to setup.py, please let me know. Otherwise, if you have no further comments, I think this PR can be merged, assuming it passes CI.

Thank you in advance.

@IlyasMoutawwakil (Member)

Thank you @aliabdelkader! Yes, feel free to add them, but we will have to remove them at some point, because package@git+... dependencies are generally hard to package for PyPI. We also don't want setup.py to include every possibly installable third-party lib; as is the case in transformers, raising an error informing the user of their missing lib makes more sense (see the sketch below).
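A minimal sketch of the "raise on missing lib" pattern mentioned above, in the spirit of transformers' import guards; the function name and wording are illustrative, not the library's actual API:

```python
import importlib.util

def require_package(package: str, feature: str) -> None:
    """Fail fast with an actionable message instead of making every
    third-party quantization lib a hard dependency in setup.py."""
    if importlib.util.find_spec(package) is None:
        raise ImportError(
            f"{package} is required for {feature} but is not installed. "
            f"Install it with `pip install {package}`."
        )

# Usage (hypothetical): guard a bnb benchmark before building its config.
# require_package("bitsandbytes", "bnb quantization")
```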

- update github workflow files to install autoawq and bnb using setup.py
- "requests" is installed independently because "autoawq@git+https..." requires it to proceed with its installation.
@IlyasMoutawwakil (Member)

@aliabdelkader all that is left is styling

@aliabdelkader (Contributor, Author)

Yes, sorry, I forgot to run the linter earlier.

@IlyasMoutawwakil IlyasMoutawwakil merged commit cafe343 into huggingface:main Mar 13, 2024
33 of 44 checks passed