diff --git a/content/blogs/cllm/index.md b/content/blogs/cllm/index.md
index df34d05..127c713 100644
--- a/content/blogs/cllm/index.md
+++ b/content/blogs/cllm/index.md
@@ -160,6 +160,23 @@ Our experiments contain three domain-specific tasks, including Spider (text-to-S
 **Open-domain conversational Challenge (MT-bench):** CLLM trained from LLaMA2-7B using the ShareGPT dataset can achieve roughly the same speedup as Medusa2 when combined with lookahead decoding, with comparable scores on MT-bench. However, CLLM offers higher adaptability and memory efficiency, as it requires no modifications to the target model's original architecture and no auxiliary components.
 {{< /justify >}}
 
+**Training Cost:**
+{{< justify >}}
+The fine-tuning cost of CLLMs is moderate: for example, LLaMA-7B needs to see only around 1M tokens to achieve a $3.4\times$ speedup on the Spider dataset. When the dataset is large, e.g., CodeSearchNet-Python, only 10% of it is needed to generate the Jacobi trajectories for training CLLMs, which yields around a $2.5\times$ speedup. The total number of training tokens can be estimated as:
+
+$N = \text{avg \# of trajectories per prompt} \times \text{avg seq length} \times \text{\# of prompts}$.
+{{< /justify >}}
+
+{{< center >}}
+| Dataset | Estimated training cost (tokens) |
+|:---:|:---:|
+| Spider | $2 \times 10^6$ |
+| CodeSearchNet-Python | $1 \times 10^8$ |
+| GSM8K | $1 \times 10^7$ |
+| ShareGPT | $2 \times 10^8$ |
+
+{{< /center >}}
+
 ### Fast Forwarding and Stationary Tokens
 
 {{< image src="img/trajectory_compare_aligned.png" alt="trajectory_compare" width="120%" title="Figure 7: Comparison of Jacobi trajectory between a target LLM and CLLMs on Spider. Each point along the Jacobi trajectory is a color-coded sequence: blue for correct tokens matching with AR results, and red for inaccurate ones. CLLM demonstrates enhanced efficiency, converging to the fixed point $2\times$ faster than the target LLM. This increased efficiency in the CLLM can be attributed to the consistency loss, which facilitates the learning of the structure of each $n$-token sequence given a prefix.">}}
diff --git a/layouts/shortcodes/center.html b/layouts/shortcodes/center.html
new file mode 100644
index 0000000..ec9efdb
--- /dev/null
+++ b/layouts/shortcodes/center.html
@@ -0,0 +1,3 @@
+<div style="text-align: center;">
+ {{ .Inner | markdownify }}
+</div>
diff --git a/public/.DS_Store b/public/.DS_Store
index 01d6442..a370d29 100644
Binary files a/public/.DS_Store and b/public/.DS_Store differ
diff --git a/public/blogs/cllm/img/clm_objective_legacy.png b/public/blogs/cllm/img/clm_objective_legacy.png
new file mode 100644
index 0000000..8686768
Binary files /dev/null and b/public/blogs/cllm/img/clm_objective_legacy.png differ
diff --git a/public/blogs/cllm/index.html b/public/blogs/cllm/index.html
index 1899db4..3098c37 100644
--- a/public/blogs/cllm/index.html
+++ b/public/blogs/cllm/index.html
@@ -386,6 +386,45 @@
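A minimal sketch of the token-count estimate from the added "Training Cost" paragraph, assuming nothing beyond the formula itself: the helper name `estimate_training_tokens` and the numeric inputs below are hypothetical placeholders, not measured statistics of Spider or any other dataset in the table.

```python
def estimate_training_tokens(avg_trajectories_per_prompt: float,
                             avg_seq_length: float,
                             num_prompts: int) -> float:
    """N = avg # of trajectories per prompt x avg seq length x # of prompts."""
    return avg_trajectories_per_prompt * avg_seq_length * num_prompts

# Hypothetical placeholder values, for illustration only; they are not the
# actual dataset statistics behind the estimates in the table above.
n = estimate_training_tokens(avg_trajectories_per_prompt=30,
                             avg_seq_length=10,
                             num_prompts=7000)
print(f"estimated training cost: {n:.1e} tokens")  # -> estimated training cost: 2.1e+06 tokens
```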