forked from vllm-project/vllm
Enable roberta embedding #785
Closed
Conversation
According to https://github.com/mlc-ai/xgrammar/blob/c1b64920cad24f44f235778c1c00bb52d57da01a/python/xgrammar/kernels/apply_token_bitmask_inplace_cpu.py#L22, xgrammar only supports float32 logits on CPU.
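For context, the CPU kernel applies the grammar's token bitmask by forcing disallowed logits to -inf. A rough pure-Python sketch of that idea follows; the function name and bit layout are illustrative, not xgrammar's actual API, and the real CPU kernel additionally requires the logits tensor to be float32:

```python
import math

def apply_token_bitmask(logits, bitmask):
    """Mask disallowed token logits to -inf (illustrative sketch).

    Assumed layout: each 32-bit word in `bitmask` covers 32 token ids,
    least-significant bit first; bit=1 means the token is allowed.
    """
    out = list(logits)
    for tid in range(len(out)):
        word, bit = divmod(tid, 32)
        if not (bitmask[word] >> bit) & 1:
            out[tid] = -math.inf
    return out
```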
…t#12104) Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
…m-project#12121) Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
This PR cleans up the LoRA flow by removing unnecessary functions and variables:
- Removed the special handling of `max_num_batched_tokens` for HPU in `models.py`, since this is handled internally in PunicaWrapperHPU ([PR](https://github.com/vllm-project/vllm/blob/d51d66c3252107d5b986d2eab7af1c210dceb708/vllm/lora/punica_wrapper/punica_hpu.py#L17))
- Removed `convert_mapping` from `models.py`, based on this [PR](https://github.com/vllm-project/vllm/pull/5036/files#:~:text=def%20convert_mapping)

Co-authored-by: Vivek Goel <vgoel@habana.ai>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
…roject#12136) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
…lm-project#12138) Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
… to EngineCore (vllm-project#11960) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
…ject#12102) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Multimodality fix for llava after rebase. Fix for:
```
ERROR 12-16 12:31:11 engine.py:136] NotImplementedError: Unknown multi-modal data type: attention_mask
```
…ject#12119) Signed-off-by: Wallas Santos <wallashss@ibm.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
The vLLM notebooks in the Gaudi-tutorials repo were moved, which resulted in broken links in the README file. This PR fixes those links. Signed-off-by: Dina Suehiro Jones <dina.s.jones@intel.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Signed-off-by: Mengqing Cao <cmq0113@163.com> Signed-off-by: Jannis Schönleber <joennlae@gmail.com> Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com> Signed-off-by: rickyx <rickyx@anyscale.com> Signed-off-by: Andy Lo <andy@mistral.ai> Signed-off-by: Adrian Cole <adrian.cole@elastic.co> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Divakar Verma <divakar.verma@amd.com> Signed-off-by: maleksan85 <maleksan@amd.com> Signed-off-by: Hongxia Yang <hongxyan@amd.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: kevin <kevin@anyscale.com> Signed-off-by: Liangfu Chen <liangfc@amazon.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: xffxff <1247714429@qq.com> Signed-off-by: wangerxiao <863579016@qq.com> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: Konrad Zawora <kzawora@habana.ai> Signed-off-by: Randall Smith <Randall.Smith@amd.com> Signed-off-by: Alexei V. 
Ivanov <alexei.ivanov@amd.com> Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: zhenwei <zhenweiliu@habana.ai> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Signed-off-by: Siyuan Liu <lsiyuan@google.com> Signed-off-by: ElizaWszola <eliza@neuralmagic.com> Signed-off-by: Junichi Sato <junichi.sato@sbintuitions.co.jp> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai> Signed-off-by: Keyun Tong <tongkeyun@gmail.com> Signed-off-by: Matthew Hendrey <matthew.hendrey@gmail.com> Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: Kyle Mistele <kyle@mistele.com> Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Bowen Wang <abmfy@icloud.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: wallashss <wallashss@ibm.com> Signed-off-by: Gabriel Marinho <gmarinho@ibm.com> Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Mengqing Cao <cmq0113@163.com> Co-authored-by: Jannis Schönleber <joennlae@gmail.com> Co-authored-by: Jinzhen Lin <linjinzhen@hotmail.com> Co-authored-by: Ricky Xu <xuchen727@hotmail.com> Co-authored-by: Andy Lo <andylolu24@gmail.com> Co-authored-by: Adrian Cole <64215+codefromthecrypt@users.noreply.github.com> 
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: Jani Monoses <jani.monoses@gmail.com> Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com> Co-authored-by: Aleksandr Malyshev <164964928+maleksan85@users.noreply.github.com> Co-authored-by: maleksan85 <maleksan@amd.com> Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Kevin H. Luu <kevin@anyscale.com> Co-authored-by: Liangfu Chen <liangfc@amazon.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com> Co-authored-by: zhou fan <1247714429@qq.com> Co-authored-by: Robin <863579016@qq.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com> Co-authored-by: rasmith <Randall.Smith@amd.com> Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: liuzhenwei <zhenweiliu@habana.ai> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: imkero <kerorek@outlook.com> Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com> Co-authored-by: Siyuan Liu <lsiyuan@google.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Co-authored-by: ElizaWszola <eliza@neuralmagic.com> Co-authored-by: Junichi Sato <junichi.sato@sbintuitions.co.jp> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com> Co-authored-by: omer-dayan <omer@run.ai> Co-authored-by: Mohit Deopujari <mdeopujari@habana.ai> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Keyun Tong <tongkeyun@gmail.com> Co-authored-by: Matthew Hendrey 
<matthew.hendrey@gmail.com> Co-authored-by: shangmingc <caishangming@linux.alibaba.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Yuan Tang <terrytangyuan@gmail.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Kyle Mistele <kyle@mistele.com> Co-authored-by: Pooya Davoodi <pooya.davoodi@parasail.io> Co-authored-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Bowen Wang <abmfy@icloud.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com> Co-authored-by: wallashss <wallashss@ibm.com> Co-authored-by: Jiangfei Duan <jfduan@outlook.com> Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com> Co-authored-by: Gabriel Marinho <104592062+gmarinho2@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Sebastian Schoennenbeck <sebastian.schoennenbeck@comma-soft.com>
…ion_mask_to_dense (vllm-project#12347) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Signed-off-by: Wallas Santos <wallashss@ibm.com> Co-authored-by: Wallas Santos <wallashss@ibm.com>
…LM (vllm-project#12069) Signed-off-by: hzh <hezhihui_thu@163.com> Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com> Signed-off-by: shaochangxu.scx <shaochangxu.scx@antgroup.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com> Signed-off-by: Akshat Tripathi <akshat@krai.ai> Signed-off-by: Oleg Mosalov <oleg@krai.ai> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu> Signed-off-by: Chenguang Li <757486878@qq.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Shanshan Shen <467638484@qq.com> Signed-off-by: elijah <f1renze.142857@gmail.com> Signed-off-by: Yikun <yikunkero@gmail.com> Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Konrad Zawora <kzawora@habana.ai> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Co-authored-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com> Co-authored-by: shaochangxu <85155497+shaochangxu@users.noreply.github.com> Co-authored-by: shaochangxu.scx <shaochangxu.scx@antgroup.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: sixgod <evethwillbeok@outlook.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> 
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Akshat Tripathi <Akshat.tripathi6568@gmail.com> Co-authored-by: Oleg Mosalov <oleg@krai.ai> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Avshalom Manevich <12231371+avshalomman@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com> Co-authored-by: Yangcheng Li <liyangcheng.lyc@alibaba-inc.com> Co-authored-by: Siyuan Li <94890248+liaoyanqing666@users.noreply.github.com> Co-authored-by: Concurrensee <yida.wu@amd.com> Co-authored-by: Chenguang Li <757486878@qq.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Alex Brooks <alex.brooks@ibm.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Shanshan Shen <467638484@qq.com> Co-authored-by: elijah <30852919+e1ijah1@users.noreply.github.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: Steve Luo <36296769+SunflowerAries@users.noreply.github.com> Co-authored-by: mgoin <michael@neuralmagic.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Konrad Zawora <kzawora@habana.ai> Co-authored-by: TJian <tunjian1996@gmail.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: maang-h <55082429+maang-h@users.noreply.github.com> Co-authored-by: Elfie Guo <164945471+elfiegg@users.noreply.github.com> Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com>
This PR updates `test_lora_manager_hpu.py` based on latest rebase.
…2409) Signed-off-by: liuyanyi <wolfsonliu@163.com>
…dels. (vllm-project#11787) Signed-off-by: Pavani Majety <pmajety@nvidia.com> Co-authored-by: mgoin <michael@neuralmagic.com>
As a hot fix, copy the not-yet-merged vllm-project#12536.
There is no reason for the current restrictions on attention head sizes; the current implementations can theoretically support any size. This patch removes those restrictions.
This PR enables loading GPTQ quantized models and running weight-only quantized inference on HPU. For a previous discussion, see #421.
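Weight-only quantization of this kind stores low-bit integer weights plus per-group scales and zero-points, and dequantization is essentially `scale * (q - zero_point)`. A minimal illustrative sketch of that arithmetic (not the actual GPTQ kernel, and the helper name is hypothetical):

```python
def dequantize_group(q_weights, scale, zero_point):
    # Weight-only quantized inference reconstructs each weight as
    # scale * (q - zero_point) before (or fused into) the matmul.
    return [scale * (q - zero_point) for q in q_weights]
```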
This PR enables loading AWQ quantized models and running weight-only quantized inference on HPU. Currently it works only for BF16 inference, because the `torch.ops.hpu.convert_from_uint4` kernel does not support FP16. Tested with TheBloke/Llama-2-70B-Chat-AWQ and it worked. --------- Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>
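The BF16-only limitation comes from the uint4 conversion kernel. To illustrate what such a conversion does, here is a pure-Python sketch of unpacking eight 4-bit values from one 32-bit word; it assumes a simplified least-significant-nibble-first layout, whereas the real AWQ packing order and `torch.ops.hpu.convert_from_uint4` semantics differ:

```python
def unpack_uint4(packed_words):
    # Each 32-bit word holds eight 4-bit values. Assumed layout for
    # illustration only: least-significant nibble comes first.
    values = []
    for word in packed_words:
        for shift in range(0, 32, 4):
            values.append((word >> shift) & 0xF)
    return values
```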
Dummy parameter initialization (`load_format='dummy'`) is not working on HPU because `torch.Generator` is not supported. This PR fixes the issue by bypassing the generator. --------- Co-authored-by: Michał Kuligowski <michal.kuligowski@intel.com>
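A sketch of the workaround idea: draw placeholder weights from a seeded Python RNG instead of a `torch.Generator`. The function name and value range are illustrative, not the actual vLLM implementation:

```python
import random

def dummy_weights(shape, low=-1e-3, high=1e-3, seed=0):
    # Bypass torch.Generator (unsupported on HPU) by drawing the
    # placeholder values from a seeded Python RNG instead.
    rng = random.Random(seed)
    count = 1
    for dim in shape:
        count *= dim
    return [rng.uniform(low, high) for _ in range(count)]
```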
#767) The `@cache` decorator and the `and` operator used by `enabled_flags` cause recompilations in torch.compile.
Add a pip upgrade to the installation steps so that the vllm-hpu-extension package is upgraded properly.
Fix for selecting the correct backend for MultiHeadAttention; the previous code always defaulted to `_Backend.TORCH_SDPA`. --------- Co-authored-by: root <root@adobrzyniewicz-mcz3-g2-mpijob-worker-0.adobrzyniewicz-mcz3-g2-mpijob-worker.framework.svc.cluster.local>
On HPU we set `position_ids` and `input_ids` to shape `[batch_size, bucket_size]`, so the current RoBERTa embedding forward function needs to be modified.
For example, `position_ids` on GPU:
[0,1,2,3,4,5,6,7]
but `position_ids` on HPU:
[0,1,2,3,4,5,6,7,0,0,0,.....0,0] which is padded to size 128 in this case.
One thing I noticed is that `torch.equal()` does not work properly on HPU, so I had to run it on CPU. That code does nothing but check a precondition, but we will need to investigate it.
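The padding described above can be sketched as follows, using the bucket size and pad value from the example (the helper name is illustrative, not the actual vLLM code):

```python
def pad_position_ids(seq_len, bucket_size, pad_value=0):
    # HPU buckets pad position_ids to a fixed bucket_size: the real
    # positions 0..seq_len-1 are followed by pad_value entries.
    assert seq_len <= bucket_size
    return list(range(seq_len)) + [pad_value] * (bucket_size - seq_len)
```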
This PR has a dependency on #758