Enable roberta embedding #785

Closed
wants to merge 1,717 commits into from

@yeonsily yeonsily commented Feb 5, 2025

On HPU, position_ids and input_ids are shaped [batch_size, bucket_size], so the current RoBERTa embedding forward function needs to be modified to handle that layout.

For example, position_ids on GPU:
[0,1,2,3,4,5,6,7]
but position_ids on HPU:
[0,1,2,3,4,5,6,7,0,0,0,.....0,0], which is padded to a length of 128 in this case.
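
As a rough illustration of the layout difference described above, the sketch below pads a short list of position ids to a fixed bucket size. The helper name `pad_to_bucket` and the use of plain Python lists are assumptions made for the example; they are not taken from this PR's code.

```python
def pad_to_bucket(values, bucket_size, pad_value=0):
    """Right-pad `values` with `pad_value` up to `bucket_size` entries.

    Hypothetical helper for illustration only.
    """
    if len(values) > bucket_size:
        raise ValueError("sequence is longer than the bucket")
    return list(values) + [pad_value] * (bucket_size - len(values))


# GPU-style position ids: one entry per real token.
gpu_position_ids = list(range(8))

# HPU-style position ids: the same values, padded to the bucket size.
hpu_position_ids = pad_to_bucket(gpu_position_ids, bucket_size=128)
```

Only the first 8 entries of `hpu_position_ids` carry real positions; the rest is padding that the embedding code has to ignore or tolerate.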

One thing I noticed is that torch.equal() does not work properly on HPU, so I have to run it on CPU.
That code only checks a precondition, but we will still need to investigate the issue.
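
A minimal sketch of the CPU workaround mentioned above, assuming the precondition is that the valid positions of each row form the sequence 0..N-1 and that the per-sequence lengths are known. The function name `positions_start_from_zero` and the `seq_lens` argument are hypothetical; only `torch.equal`, `torch.arange`, and `Tensor.cpu()` are real PyTorch calls.

```python
import torch


def positions_start_from_zero(position_ids, seq_lens):
    """Check that each row's first `seq_len` positions are 0..seq_len-1.

    position_ids: tensor of shape [batch_size, bucket_size] (padded).
    seq_lens: list of valid lengths, one per row (assumed input).
    """
    for row, seq_len in zip(position_ids, seq_lens):
        # Move the valid slice to CPU before comparing, so the check
        # does not depend on torch.equal() behaving correctly on device.
        valid = row[:seq_len].cpu()
        expected = torch.arange(seq_len, dtype=valid.dtype)
        if not torch.equal(valid, expected):
            return False
    return True
```

Copying to CPU forces a device-to-host transfer, so a check like this is better suited to debugging than to a hot path.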

This PR has a dependency on #758

madamczykhabana and others added 30 commits January 16, 2025 15:06
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
This PR cleans up LoRA flow by removing unnecessary functions and
variables.

Removed special handling of `max_num_batched_tokens` for HPU in
`models.py` since we internally handle this in PunicaWrapperHPU
[PR](https://github.com/vllm-project/vllm/blob/d51d66c3252107d5b986d2eab7af1c210dceb708/vllm/lora/punica_wrapper/punica_hpu.py#L17)

Removed `convert_mapping` from `models.py` based on this
[PR](https://github.com/vllm-project/vllm/pull/5036/files#:~:text=def%20convert_mapping)

Co-authored-by: Vivek Goel <vgoel@habana.ai>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
… to EngineCore (vllm-project#11960)

Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Multimodality fix for llava after rebase

Fix for:
```
ERROR 12-16 12:31:11 engine.py:136] NotImplementedError: Unknown multi-modal data type: attention_mask
```
)

Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
robertgshaw2-redhat and others added 26 commits January 29, 2025 04:56
Co-authored-by: Michael Goin <michael@neuralmagic.com>
The vLLM notebooks in the Gaudi-tutorials repo were moved, which
resulted in broken links in the README file. This PR fixes those links.

Signed-off-by: Dina Suehiro Jones <dina.s.jones@intel.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Signed-off-by: Jannis Schönleber <joennlae@gmail.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Signed-off-by: rickyx <rickyx@anyscale.com>
Signed-off-by: Andy Lo <andy@mistral.ai>
Signed-off-by: Adrian Cole <adrian.cole@elastic.co>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
Signed-off-by: maleksan85 <maleksan@amd.com>
Signed-off-by: Hongxia Yang <hongxyan@amd.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: kevin <kevin@anyscale.com>
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: xffxff <1247714429@qq.com>
Signed-off-by: wangerxiao <863579016@qq.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: zhenwei <zhenweiliu@habana.ai>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
Signed-off-by: Junichi Sato <junichi.sato@sbintuitions.co.jp>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai>
Signed-off-by: Keyun Tong <tongkeyun@gmail.com>
Signed-off-by: Matthew Hendrey <matthew.hendrey@gmail.com>
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Kyle Mistele <kyle@mistele.com>
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Bowen Wang <abmfy@icloud.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: wallashss <wallashss@ibm.com>
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Jannis Schönleber <joennlae@gmail.com>
Co-authored-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Ricky Xu <xuchen727@hotmail.com>
Co-authored-by: Andy Lo <andylolu24@gmail.com>
Co-authored-by: Adrian Cole <64215+codefromthecrypt@users.noreply.github.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: Jani Monoses <jani.monoses@gmail.com>
Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com>
Co-authored-by: Aleksandr Malyshev <164964928+maleksan85@users.noreply.github.com>
Co-authored-by: maleksan85 <maleksan@amd.com>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Kevin H. Luu <kevin@anyscale.com>
Co-authored-by: Liangfu Chen <liangfc@amazon.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: zhou fan <1247714429@qq.com>
Co-authored-by: Robin <863579016@qq.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: rasmith <Randall.Smith@amd.com>
Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: liuzhenwei <zhenweiliu@habana.ai>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: imkero <kerorek@outlook.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: Siyuan Liu <lsiyuan@google.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Junichi Sato <junichi.sato@sbintuitions.co.jp>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
Co-authored-by: omer-dayan <omer@run.ai>
Co-authored-by: Mohit Deopujari <mdeopujari@habana.ai>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Keyun Tong <tongkeyun@gmail.com>
Co-authored-by: Matthew Hendrey <matthew.hendrey@gmail.com>
Co-authored-by: shangmingc <caishangming@linux.alibaba.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Kyle Mistele <kyle@mistele.com>
Co-authored-by: Pooya Davoodi <pooya.davoodi@parasail.io>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Bowen Wang <abmfy@icloud.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Co-authored-by: wallashss <wallashss@ibm.com>
Co-authored-by: Jiangfei Duan <jfduan@outlook.com>
Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com>
Co-authored-by: Gabriel Marinho <104592062+gmarinho2@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Sebastian Schoennenbeck <sebastian.schoennenbeck@comma-soft.com>
…ion_mask_to_dense (vllm-project#12347)

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Co-authored-by: Wallas Santos <wallashss@ibm.com>
…LM (vllm-project#12069)

Signed-off-by: hzh <hezhihui_thu@163.com>
Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
Signed-off-by: shaochangxu.scx <shaochangxu.scx@antgroup.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Oleg Mosalov <oleg@krai.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
Signed-off-by: Chenguang Li <757486878@qq.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Shanshan Shen <467638484@qq.com>
Signed-off-by: elijah <f1renze.142857@gmail.com>
Signed-off-by: Yikun <yikunkero@gmail.com>
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
Co-authored-by: shaochangxu <85155497+shaochangxu@users.noreply.github.com>
Co-authored-by: shaochangxu.scx <shaochangxu.scx@antgroup.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: sixgod <evethwillbeok@outlook.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Akshat Tripathi <Akshat.tripathi6568@gmail.com>
Co-authored-by: Oleg Mosalov <oleg@krai.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Avshalom Manevich <12231371+avshalomman@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Yangcheng Li <liyangcheng.lyc@alibaba-inc.com>
Co-authored-by: Siyuan Li <94890248+liaoyanqing666@users.noreply.github.com>
Co-authored-by: Concurrensee <yida.wu@amd.com>
Co-authored-by: Chenguang Li <757486878@qq.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Alex Brooks <alex.brooks@ibm.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
Co-authored-by: elijah <30852919+e1ijah1@users.noreply.github.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Steve Luo <36296769+SunflowerAries@users.noreply.github.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: TJian <tunjian1996@gmail.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: maang-h <55082429+maang-h@users.noreply.github.com>
Co-authored-by: Elfie Guo <164945471+elfiegg@users.noreply.github.com>
Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
This PR updates `test_lora_manager_hpu.py` based on the latest rebase.
…dels. (vllm-project#11787)

Signed-off-by: Pavani Majety <pmajety@nvidia.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
There's no reason for the current restrictions on attention head size - we
can theoretically support any size with the current implementation. This
patch removes those restrictions.
This PR enables loading GPTQ quantized models and running weight-only
quantized inference on HPU. For a previous discussion, see #421.
This PR enables loading AWQ quantized models and running weight-only
quantized inference on HPU.

Currently, it works only for BF16 inference because the
torch.ops.hpu.convert_from_uint4 kernel does not support FP16.

Tested successfully on TheBloke/Llama-2-70B-Chat-AWQ.

---------

Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>
Dummy parameter initialization (load_format='dummy') does not work on
HPU because torch.generator is not supported. This PR fixes the issue
by bypassing the generator.

---------

Co-authored-by: Michał Kuligowski <michal.kuligowski@intel.com>
#767)

The @cache decorator and the "and" operator used by enabled_flags cause
recompilations in torch compile.
Add a pip upgrade to the installation steps so that the
vllm-hpu-extension package is upgraded properly
Fix for selecting correct backend for MultiHeadAttention - previous code
always defaulted to _Backend.TORCH_SDPA

---------

Co-authored-by: root <root@adobrzyniewicz-mcz3-g2-mpijob-worker-0.adobrzyniewicz-mcz3-g2-mpijob-worker.framework.svc.cluster.local>
@yeonsily yeonsily closed this Feb 5, 2025
@yeonsily yeonsily deleted the dev/enable_roberta_embedding branch February 5, 2025 20:25