forked from vllm-project/vllm
Enable roberta embedding #785
Closed
Conversation
According to https://github.com/mlc-ai/xgrammar/blob/c1b64920cad24f44f235778c1c00bb52d57da01a/python/xgrammar/kernels/apply_token_bitmask_inplace_cpu.py#L22, xgrammar only supports float32 logits on CPU.
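For context, the CPU kernel applies the grammar's token bitmask by forcing disallowed logits to -inf. A rough pure-Python sketch of that idea follows; the function name and bit layout are illustrative, not xgrammar's actual API, and the real CPU kernel additionally requires the logits tensor to be float32:

```python
import math

def apply_token_bitmask(logits, bitmask):
    """Mask disallowed token logits to -inf (illustrative sketch).

    Assumed layout: each 32-bit word in `bitmask` covers 32 token ids,
    least-significant bit first; bit=1 means the token is allowed.
    """
    out = list(logits)
    for tid in range(len(out)):
        word, bit = divmod(tid, 32)
        if not (bitmask[word] >> bit) & 1:
            out[tid] = -math.inf
    return out
```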
…t#12104) Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
…m-project#12121) Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
This PR cleans up the LoRA flow by removing unnecessary functions and variables:
- Removed the special handling of `max_num_batched_tokens` for HPU in `models.py`, since this is handled internally in PunicaWrapperHPU ([PR](https://github.com/vllm-project/vllm/blob/d51d66c3252107d5b986d2eab7af1c210dceb708/vllm/lora/punica_wrapper/punica_hpu.py#L17))
- Removed `convert_mapping` from `models.py`, based on this [PR](https://github.com/vllm-project/vllm/pull/5036/files#:~:text=def%20convert_mapping)

Co-authored-by: Vivek Goel <vgoel@habana.ai>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
…roject#12136) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
…lm-project#12138) Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
… to EngineCore (vllm-project#11960) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
…ject#12102) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Multimodality fix for llava after rebase. Fix for:
```
ERROR 12-16 12:31:11 engine.py:136] NotImplementedError: Unknown multi-modal data type: attention_mask
```
…ject#12119) Signed-off-by: Wallas Santos <wallashss@ibm.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
The vLLM notebooks in the Gaudi-tutorials repo were moved, which resulted in broken links in the README file. This PR fixes those links. Signed-off-by: Dina Suehiro Jones <dina.s.jones@intel.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Signed-off-by: Mengqing Cao <cmq0113@163.com> Signed-off-by: Jannis Schönleber <joennlae@gmail.com> Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com> Signed-off-by: rickyx <rickyx@anyscale.com> Signed-off-by: Andy Lo <andy@mistral.ai> Signed-off-by: Adrian Cole <adrian.cole@elastic.co> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Divakar Verma <divakar.verma@amd.com> Signed-off-by: maleksan85 <maleksan@amd.com> Signed-off-by: Hongxia Yang <hongxyan@amd.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: kevin <kevin@anyscale.com> Signed-off-by: Liangfu Chen <liangfc@amazon.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: xffxff <1247714429@qq.com> Signed-off-by: wangerxiao <863579016@qq.com> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: Konrad Zawora <kzawora@habana.ai> Signed-off-by: Randall Smith <Randall.Smith@amd.com> Signed-off-by: Alexei V. 
Ivanov <alexei.ivanov@amd.com> Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: zhenwei <zhenweiliu@habana.ai> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Signed-off-by: Siyuan Liu <lsiyuan@google.com> Signed-off-by: ElizaWszola <eliza@neuralmagic.com> Signed-off-by: Junichi Sato <junichi.sato@sbintuitions.co.jp> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai> Signed-off-by: Keyun Tong <tongkeyun@gmail.com> Signed-off-by: Matthew Hendrey <matthew.hendrey@gmail.com> Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: Kyle Mistele <kyle@mistele.com> Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Bowen Wang <abmfy@icloud.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: wallashss <wallashss@ibm.com> Signed-off-by: Gabriel Marinho <gmarinho@ibm.com> Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Mengqing Cao <cmq0113@163.com> Co-authored-by: Jannis Schönleber <joennlae@gmail.com> Co-authored-by: Jinzhen Lin <linjinzhen@hotmail.com> Co-authored-by: Ricky Xu <xuchen727@hotmail.com> Co-authored-by: Andy Lo <andylolu24@gmail.com> Co-authored-by: Adrian Cole <64215+codefromthecrypt@users.noreply.github.com> 
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: Jani Monoses <jani.monoses@gmail.com> Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com> Co-authored-by: Aleksandr Malyshev <164964928+maleksan85@users.noreply.github.com> Co-authored-by: maleksan85 <maleksan@amd.com> Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Kevin H. Luu <kevin@anyscale.com> Co-authored-by: Liangfu Chen <liangfc@amazon.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com> Co-authored-by: zhou fan <1247714429@qq.com> Co-authored-by: Robin <863579016@qq.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com> Co-authored-by: rasmith <Randall.Smith@amd.com> Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: liuzhenwei <zhenweiliu@habana.ai> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: imkero <kerorek@outlook.com> Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com> Co-authored-by: Siyuan Liu <lsiyuan@google.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Co-authored-by: ElizaWszola <eliza@neuralmagic.com> Co-authored-by: Junichi Sato <junichi.sato@sbintuitions.co.jp> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com> Co-authored-by: omer-dayan <omer@run.ai> Co-authored-by: Mohit Deopujari <mdeopujari@habana.ai> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Keyun Tong <tongkeyun@gmail.com> Co-authored-by: Matthew Hendrey 
<matthew.hendrey@gmail.com> Co-authored-by: shangmingc <caishangming@linux.alibaba.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Yuan Tang <terrytangyuan@gmail.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Kyle Mistele <kyle@mistele.com> Co-authored-by: Pooya Davoodi <pooya.davoodi@parasail.io> Co-authored-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Bowen Wang <abmfy@icloud.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com> Co-authored-by: wallashss <wallashss@ibm.com> Co-authored-by: Jiangfei Duan <jfduan@outlook.com> Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com> Co-authored-by: Gabriel Marinho <104592062+gmarinho2@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Sebastian Schoennenbeck <sebastian.schoennenbeck@comma-soft.com>
…ion_mask_to_dense (vllm-project#12347) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Signed-off-by: Wallas Santos <wallashss@ibm.com> Co-authored-by: Wallas Santos <wallashss@ibm.com>
…LM (vllm-project#12069) Signed-off-by: hzh <hezhihui_thu@163.com> Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com> Signed-off-by: shaochangxu.scx <shaochangxu.scx@antgroup.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com> Signed-off-by: Akshat Tripathi <akshat@krai.ai> Signed-off-by: Oleg Mosalov <oleg@krai.ai> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu> Signed-off-by: Chenguang Li <757486878@qq.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Shanshan Shen <467638484@qq.com> Signed-off-by: elijah <f1renze.142857@gmail.com> Signed-off-by: Yikun <yikunkero@gmail.com> Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Konrad Zawora <kzawora@habana.ai> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Co-authored-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com> Co-authored-by: shaochangxu <85155497+shaochangxu@users.noreply.github.com> Co-authored-by: shaochangxu.scx <shaochangxu.scx@antgroup.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: sixgod <evethwillbeok@outlook.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> 
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Akshat Tripathi <Akshat.tripathi6568@gmail.com> Co-authored-by: Oleg Mosalov <oleg@krai.ai> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Avshalom Manevich <12231371+avshalomman@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com> Co-authored-by: Yangcheng Li <liyangcheng.lyc@alibaba-inc.com> Co-authored-by: Siyuan Li <94890248+liaoyanqing666@users.noreply.github.com> Co-authored-by: Concurrensee <yida.wu@amd.com> Co-authored-by: Chenguang Li <757486878@qq.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Alex Brooks <alex.brooks@ibm.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Shanshan Shen <467638484@qq.com> Co-authored-by: elijah <30852919+e1ijah1@users.noreply.github.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: Steve Luo <36296769+SunflowerAries@users.noreply.github.com> Co-authored-by: mgoin <michael@neuralmagic.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Konrad Zawora <kzawora@habana.ai> Co-authored-by: TJian <tunjian1996@gmail.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: maang-h <55082429+maang-h@users.noreply.github.com> Co-authored-by: Elfie Guo <164945471+elfiegg@users.noreply.github.com> Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com>
This PR updates `test_lora_manager_hpu.py` based on latest rebase.
…2409) Signed-off-by: liuyanyi <wolfsonliu@163.com>
…dels. (vllm-project#11787) Signed-off-by: Pavani Majety <pmajety@nvidia.com> Co-authored-by: mgoin <michael@neuralmagic.com>
As a hot fix, copy the not-yet-merged vllm-project#12536.
There is no reason for the current restrictions on attention head sizes; the current implementations can theoretically support any size. This patch removes those restrictions.
This PR enables loading GPTQ quantized models and running weight-only quantized inference on HPU. For a previous discussion, see #421.
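Weight-only quantization of this kind stores low-bit integer weights plus per-group scales and zero-points, and dequantization is essentially `scale * (q - zero_point)`. A minimal illustrative sketch of that arithmetic (not the actual GPTQ kernel, and the helper name is hypothetical):

```python
def dequantize_group(q_weights, scale, zero_point):
    # Weight-only quantized inference reconstructs each weight as
    # scale * (q - zero_point) before (or fused into) the matmul.
    return [scale * (q - zero_point) for q in q_weights]
```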
This PR enables loading AWQ quantized models and running weight-only quantized inference on HPU. Currently it works only for BF16 inference, because the `torch.ops.hpu.convert_from_uint4` kernel does not support FP16. Tested with TheBloke/Llama-2-70B-Chat-AWQ and it worked. --------- Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>
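The BF16-only limitation comes from the uint4 conversion kernel. To illustrate what such a conversion does, here is a pure-Python sketch of unpacking eight 4-bit values from one 32-bit word; it assumes a simplified least-significant-nibble-first layout, whereas the real AWQ packing order and `torch.ops.hpu.convert_from_uint4` semantics differ:

```python
def unpack_uint4(packed_words):
    # Each 32-bit word holds eight 4-bit values. Assumed layout for
    # illustration only: least-significant nibble comes first.
    values = []
    for word in packed_words:
        for shift in range(0, 32, 4):
            values.append((word >> shift) & 0xF)
    return values
```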
Dummy parameter initialization (`load_format='dummy'`) is not working on HPU because `torch.Generator` is not supported. This PR fixes the issue by bypassing the generator. --------- Co-authored-by: Michał Kuligowski <michal.kuligowski@intel.com>
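A sketch of the workaround idea: draw placeholder weights from a seeded Python RNG instead of a `torch.Generator`. The function name and value range are illustrative, not the actual vLLM implementation:

```python
import random

def dummy_weights(shape, low=-1e-3, high=1e-3, seed=0):
    # Bypass torch.Generator (unsupported on HPU) by drawing the
    # placeholder values from a seeded Python RNG instead.
    rng = random.Random(seed)
    count = 1
    for dim in shape:
        count *= dim
    return [rng.uniform(low, high) for _ in range(count)]
```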
#767) The `@cache` decorator and the `and` operator used by `enabled_flags` cause recompilations in torch.compile.
Add a pip upgrade to the installation steps so that the vllm-hpu-extension package is upgraded properly.
Fix for selecting the correct backend for MultiHeadAttention; the previous code always defaulted to `_Backend.TORCH_SDPA`. --------- Co-authored-by: root <root@adobrzyniewicz-mcz3-g2-mpijob-worker-0.adobrzyniewicz-mcz3-g2-mpijob-worker.framework.svc.cluster.local>
On HPU we set `position_ids` and `input_ids` to shape `[batch_size, bucket_size]`, so the current RoBERTa embedding forward function needs to be modified.
For example, `position_ids` on GPU:
[0,1,2,3,4,5,6,7]
but `position_ids` on HPU:
[0,1,2,3,4,5,6,7,0,0,0,.....0,0] which is padded to size 128 in this case.
One thing I noticed is that `torch.equal()` does not work properly on HPU, so I had to run it on CPU. That code does nothing but check a precondition, but we will need to investigate it.
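The padding described above can be sketched as follows, using the bucket size and pad value from the example (the helper name is illustrative, not the actual vLLM code):

```python
def pad_position_ids(seq_len, bucket_size, pad_value=0):
    # HPU buckets pad position_ids to a fixed bucket_size: the real
    # positions 0..seq_len-1 are followed by pad_value entries.
    assert seq_len <= bucket_size
    return list(range(seq_len)) + [pad_value] * (bucket_size - seq_len)
```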
This PR has a dependency on #758