Reuse KV cache of prefixes #5572

tohtana · 2024-05-27T20:57:42Z

This PR implements reusing KV cache prefixes.
See deepspeedai/DeepSpeed-MII#484 for details.

umchand · 2024-06-03T22:56:12Z

deepspeed/inference/v2/ragged/prefix_block_map.py

+        for i in range(num_already_cached_blocks, n_blocks):
+            chunk = tokens[:(i + 1) * self.block_size]
+            hash = token_ids_to_hash(chunk)
+            if hash not in self.tokens_to_blocks:


Do we still need this if statement, given that the for loop already starts from num_already_cached_blocks?

tohtana · 2024-06-04

I'm not sure it is strictly necessary, but there is a tricky case. Assume we are running two requests.
Request 1 is using cached blocks 0, 1, and 2, and has just generated the last token of its current block. It then saves the generated sequence along with its hash.
Request 2 is also using cached blocks 0, 1, and 2, but its generation is a few steps behind Request 1. The two requests do not share the last block, yet Request 2 may generate exactly the same tokens for it. It will then try to update the cache with the same hash.
In this case, I didn't want to overwrite the existing cache entry.

umchand · 2024-06-04

Sounds good, so this is to ensure that we share the recently generated blocks.
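
For readers following the thread, the guard under discussion can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the class name `PrefixBlockMapSketch`, the `extend` method, its `block_ids` argument, and the SHA-256 based `token_ids_to_hash` are all hypothetical stand-ins. Only the loop shape and the names `tokens_to_blocks`, `block_size`, and `num_already_cached_blocks` come from the diff above.

```python
import hashlib
from typing import Dict, List, Sequence


def token_ids_to_hash(token_ids: Sequence[int]) -> str:
    # Hypothetical stand-in for the helper used in the diff; the real
    # implementation may hash the tokens differently.
    data = ",".join(str(t) for t in token_ids).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


class PrefixBlockMapSketch:
    """Maps the hash of a token prefix to the KV block holding its last chunk."""

    def __init__(self, block_size: int) -> None:
        self.block_size = block_size
        self.tokens_to_blocks: Dict[str, int] = {}

    def extend(self, tokens: List[int], block_ids: List[int],
               num_already_cached_blocks: int) -> None:
        # Only full blocks are registered; a partially filled block is skipped.
        n_blocks = len(tokens) // self.block_size
        for i in range(num_already_cached_blocks, n_blocks):
            chunk = tokens[:(i + 1) * self.block_size]
            key = token_ids_to_hash(chunk)
            # Another request may already have registered the same prefix.
            # Keep the first entry instead of overwriting it.
            if key not in self.tokens_to_blocks:
                self.tokens_to_blocks[key] = block_ids[i]


# Two requests share cached blocks 0, 1, 2 and later produce identical tokens
# in separate, private last blocks (3 for Request 1, 4 for Request 2).
prefix_map = PrefixBlockMapSketch(block_size=2)
shared = [1, 2, 3, 4, 5, 6]
prefix_map.extend(shared, [0, 1, 2], num_already_cached_blocks=0)
prefix_map.extend(shared + [7, 8], [0, 1, 2, 3], num_already_cached_blocks=3)
prefix_map.extend(shared + [7, 8], [0, 1, 2, 4], num_already_cached_blocks=3)
registered_block = prefix_map.tokens_to_blocks[token_ids_to_hash(shared + [7, 8])]
```

In this sketch `registered_block` stays at Request 1's block, because the second registration of the same prefix hash is ignored. Without the membership check, the entry would be replaced by Request 2's block.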

tohtana and others added 18 commits April 23, 2024 01:43

696e3b8  reuse prefix of kv cache
2e8ac1c  free block with no ref
619a363  reversed block id list when freeing
7307507  add option to enable prefix cache
a28b706  fix use of prefix cache
d8e9d28  fix allocation bug
9cffae1  use normal allocator if prefix sharing is disabled
f10f6f4  match type hint
0bedc39  refactor allocator
bc92d2b  simplify loop
c2028ec  refactor
959204c  remove unnecessary reverse
9289ff7  fix attribute name
30b9d0b  Merge branch 'master' into tohtana/cache_prefix
0c8e0e6  update prefix cache at every iteration
ea50fb5  skip looking up cache
1493ab7  fix prefix tokens to cache
07a4c44  add assertion

umchand reviewed Jun 3, 2024

tohtana closed this Sep 27, 2024

tohtana commented May 27, 2024

umchand Jun 3, 2024

tohtana Jun 4, 2024

umchand Jun 4, 2024
