Adding basic kv-cache transfer to vllm v1 #1
if role == "prefill" and prefill_step:
    dist.send(hidden_states, dst=1)
    for i in range(len(self.kv_caches)):
        dist.send(self.kv_caches[i], dst=1)
Nice work on the rank hack!
Quick comment from me: currently you send the entire KV cache. As a next step, we want to send only the KV cache for specific requests' block IDs, which you can find in scheduler_output.scheduled_new_reqs or scheduler_output.scheduled_resumed_reqs.
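To make the suggestion above concrete, here is a minimal sketch of collecting the block IDs that would need to be transferred. The `ScheduledReq` and `SchedulerOutput` classes below are hypothetical stand-ins, not the real vLLM types. Only the attribute names `scheduled_new_reqs` and `scheduled_resumed_reqs` come from the comment, and the `block_ids` field is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Set


# Hypothetical stand-ins for the scheduler structures named in the review
# comment; the real vLLM classes carry more fields and may differ.
@dataclass
class ScheduledReq:
    req_id: str
    block_ids: List[int]  # assumed field: KV-cache blocks owned by the request


@dataclass
class SchedulerOutput:
    scheduled_new_reqs: List[ScheduledReq] = field(default_factory=list)
    scheduled_resumed_reqs: List[ScheduledReq] = field(default_factory=list)


def blocks_to_transfer(scheduler_output: SchedulerOutput) -> List[int]:
    """Return the sorted, de-duplicated block IDs of new and resumed requests."""
    block_ids: Set[int] = set()
    reqs = (scheduler_output.scheduled_new_reqs
            + scheduler_output.scheduled_resumed_reqs)
    for req in reqs:
        block_ids.update(req.block_ids)
    return sorted(block_ids)


out = SchedulerOutput(
    scheduled_new_reqs=[ScheduledReq("a", [0, 1, 2])],
    scheduled_resumed_reqs=[ScheduledReq("b", [2, 7])],
)
selected = blocks_to_transfer(out)
```

With a list like `selected`, the prefill side could slice each layer's cache down to those blocks before sending, instead of sending the whole tensor. How the cache is indexed by block depends on the cache layout, which this excerpt does not show.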
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!
A working implementation of KV-cache transfer between the prefill and decode engines in vLLM.
Tested on examples/offline_inference.py. To run it, use the following commands in two terminals:
Prefill Engine
Decode Engine
The first process executes the model for one step (the prefill step), then sends the hidden_state and kv_cache to the second process, which completes the subsequent decode steps.
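The handoff described above relies on the receiver consuming messages in the same order the sender emits them: the hidden state first, then one KV cache per layer. The sketch below shows only that ordering. It swaps the PR's `dist.send` calls for a hypothetical in-process FIFO (`InMemoryChannel`), and the decode-side receive logic is an assumption, since it does not appear in this excerpt.

```python
import queue


class InMemoryChannel:
    """Hypothetical FIFO stand-in for a point-to-point link between two ranks."""

    def __init__(self):
        self._q = queue.Queue()

    def send(self, payload):
        self._q.put(payload)

    def recv(self):
        # The timeout surfaces a mismatched send/recv count as an error
        # instead of blocking forever.
        return self._q.get(timeout=1)


def prefill_send(channel, hidden_states, kv_caches):
    """Prefill side: hidden state first, then every layer's cache in order."""
    channel.send(hidden_states)
    for layer_cache in kv_caches:
        channel.send(layer_cache)


def decode_recv(channel, num_layers):
    """Decode side (assumed): receive in exactly the order used by the sender."""
    hidden_states = channel.recv()
    kv_caches = [channel.recv() for _ in range(num_layers)]
    return hidden_states, kv_caches


channel = InMemoryChannel()
prefill_send(channel, "h", ["kv0", "kv1"])
received_hidden, received_kv = decode_recv(channel, num_layers=2)
```

Both sides must agree on the number of layers. If the decode side posts fewer or more receives than the prefill side sends, the two processes fall out of step.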
For now, at least it's working.
Decode Engine output (seems correct):
Things to pay attention to: