
Commit

polish cllm blog
snyhlxde1 committed Mar 4, 2024
1 parent ebbf74a commit bb0ea39
Showing 5 changed files with 8 additions and 8 deletions.
4 changes: 2 additions & 2 deletions content/blogs/cllm/index.md
@@ -16,10 +16,10 @@ draft = false
url = "https://github.com"
+++
{{< justify >}}
- **TL;DR:** In this blog, we introduce consistency large language models (CLLMs), a new family of models developed with our proposed techniques to reduce inference latency by efficiently decoding $n$ tokens in parallel. This decoding method is called [Jacobi decoding](https://arxiv.org/abs/2305.10427), which improves inference efficiency by breaking the sequential nature of conventional auto-regressive (AR) decoding. CLLMs are trained with the objective of performing efficient Jacobi decoding by mapping any randomly initialized $n$-token sequence to a correctly predicted sequence in as few steps as possible. Experiment results show CLLMs obtained using our proposed method are highly effective, showing $2.4\times$ to $3.4\times$ improvements in generation speed while preserving generation quality in comparison with the baselines and other SOTA techniques. CLLMs also show high adaptability and memory efficiency as they require no modifications to the existing model architecture and auxiliary model components.
+ **TL;DR:** In this blog, we introduce consistency large language models (CLLMs), a new family of models capable of reducing inference latency by efficiently decoding $n$ tokens in parallel. This decoding method is called [Jacobi decoding](https://arxiv.org/abs/2305.10427), which improves inference efficiency by breaking the sequential nature of conventional auto-regressive (AR) decoding. CLLMs are trained with the objective of performing efficient Jacobi decoding by mapping any randomly initialized $n$-token sequence to the same result as AR decoding in as few steps as possible. Experiment results show CLLMs obtained using our proposed method are highly effective, showing $2.4\times$ to $3.4\times$ improvements in generation speed while preserving generation quality in comparison with the baseline pre-trained models and other SOTA techniques like Medusa and speculative decoding. CLLMs also show high adaptability and memory efficiency as they require no modifications to the existing model architecture and auxiliary model components.
{{< /justify >}}

- {{< image src="img/baseline_vs_cllm_gsm8k_acc_demo.gif" alt="cllm-gsm8k-acc-demo" width="120%" title="Figure 1: Demo of speedup by CLLM-ABEL-7B-002 in comparison with baseline ABEL-7B-002 using Jacobi decoding on GSM8K.">}}
+ {{< image src="img/baseline_vs_cllm_gsm8k_acc_demo.gif" alt="cllm-gsm8k-acc-demo" width="120%" title="Figure 1: Demo of speedup by CLLM-ABEL-7B-001 in comparison with baseline [ABEL-7B-001](https://huggingface.co/GAIR/Abel-7B-001) using Jacobi decoding on GSM8K.">}}

## Background: Jacobi Decoding

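For readers skimming this diff, the TL;DR above refers to Jacobi decoding: an $n$-token guess is refined with parallel forward passes until it reaches the fixed point that greedy AR decoding would produce. Below is a minimal sketch of that loop, assuming Hugging Face `transformers`; the placeholder model `gpt2`, the helper name `jacobi_decode`, and the `n_tokens`/`max_iters` values are illustrative assumptions, not the CLLM implementation from this repository.

```python
# A minimal sketch of the Jacobi decoding loop described in the TL;DR above.
# Assumptions (not from this repo): Hugging Face transformers, placeholder
# model "gpt2", greedy decoding, and arbitrary n_tokens / max_iters values.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works for this sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def jacobi_decode(prompt: str, n_tokens: int = 16, max_iters: int = 32) -> str:
    prefix = tokenizer(prompt, return_tensors="pt").input_ids
    # Randomly initialize the n-token block that will be refined in parallel.
    guess = torch.randint(0, model.config.vocab_size, (1, n_tokens))
    for _ in range(max_iters):
        logits = model(torch.cat([prefix, guess], dim=1)).logits
        # One forward pass updates every position of the block at once,
        # instead of n sequential auto-regressive steps.
        new_guess = logits[:, prefix.shape[1] - 1 : -1, :].argmax(dim=-1)
        if torch.equal(new_guess, guess):
            break  # fixed point reached: identical to the greedy AR output
        guess = new_guess
    return tokenizer.decode(guess[0])

print(jacobi_decode("The quick brown fox"))
```

CLLMs are fine-tuned so that this loop converges in far fewer iterations than an AR-trained baseline, which is where the $2.4\times$ to $3.4\times$ speedups cited in the TL;DR come from.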
2 changes: 1 addition & 1 deletion layouts/shortcodes/image.html
@@ -2,7 +2,7 @@
<figure>
<div style="display: grid; place-items: center;">
<img src="{{ .Get "src" }}" alt="{{ .Get "alt" }}" style="width: {{ .Get "width" }}; height: auto;">
- <figcaption style="font-size: 16px;"><strong>{{ .Get "title" }}</strong></figcaption>
+ <figcaption style="font-size: 16px;"><strong>{{ .Get "title" | markdownify }}</strong></figcaption>
</div>
</figure>
{{ else }}
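Design note: the `| markdownify` added above pipes the caption through Hugo's markdown renderer, presumably so that the markdown link introduced in the Figure 1 caption (`[ABEL-7B-001](https://huggingface.co/GAIR/Abel-7B-001)`) renders as an HTML anchor rather than as literal bracket text.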
6 changes: 3 additions & 3 deletions public/blogs/cllm/index.html
@@ -9,7 +9,7 @@
<meta name="robots" content="noindex, nofollow">
<title>Consistency Large Language Models: A Family of Efficient Parallel Decoders | Hao Lab @ UCSD</title>
<meta name="keywords" content="">
<meta name="description" content="TL;DR: In this blog, we introduce consistency large language models (CLLMs), a new family of models developed with our proposed techniques to reduce inference latency by efficiently decoding $n$ tokens in parallel. This decoding method is called Jacobi decoding, which improves inference efficiency by breaking the sequential nature of conventional auto-regressive (AR) decoding. CLLMs are trained with the objective of performing efficient Jacobi decoding by mapping any randomly initialized $n$-token sequence to a correctly predicted sequence in as few steps as possible.">
<meta name="description" content="TL;DR: In this blog, we introduce consistency large language models (CLLMs), a new family of models capable of reducing inference latency by efficiently decoding $n$ tokens in parallel. This decoding method is called Jacobi decoding, which improves inference efficiency by breaking the sequential nature of conventional auto-regressive (AR) decoding. CLLMs are trained with the objective of performing efficient Jacobi decoding by mapping any randomly initialized $n$-token sequence to the same result as AR decoding in as few steps as possible.">
<meta name="author" content="">
<link rel="canonical" href="//localhost:1313/blogs/cllm/">
<link crossorigin="anonymous" href="/assets/css/stylesheet.b609c58d5c11bb90b1a54e04005d74ad1ddf22165eb79f5533967e57df9c3b50.css" integrity="sha256-tgnFjVwRu5CxpU4EAF10rR3fIhZet59VM5Z+V9+cO1A=" rel="preload stylesheet" as="style">
@@ -169,7 +169,7 @@ <h1 class="post-title entry-hint-parent">
<p>An instance of Jacobi trajectory and an illustration of the global consistency loss learning objective.</p>
</figure>
<div class="post-content"><div style="text-align: justify;">
- <strong>TL;DR:</strong> In this blog, we introduce consistency large language models (CLLMs), a new family of models developed with our proposed techniques to reduce inference latency by efficiently decoding $n$ tokens in parallel. This decoding method is called <a href="https://arxiv.org/abs/2305.10427">Jacobi decoding</a>, which improves inference efficiency by breaking the sequential nature of conventional auto-regressive (AR) decoding. CLLMs are trained with the objective of performing efficient Jacobi decoding by mapping any randomly initialized $n$-token sequence to a correctly predicted sequence in as few steps as possible. Experiment results show CLLMs obtained using our proposed method are highly effective, showing $2.4\times$ to $3.4\times$ improvements in generation speed while preserving generation quality in comparison with the baselines and other SOTA techniques. CLLMs also show high adaptability and memory efficiency as they require no modifications to the existing model architecture and auxiliary model components.
+ <strong>TL;DR:</strong> In this blog, we introduce consistency large language models (CLLMs), a new family of models capable of reducing inference latency by efficiently decoding $n$ tokens in parallel. This decoding method is called <a href="https://arxiv.org/abs/2305.10427">Jacobi decoding</a>, which improves inference efficiency by breaking the sequential nature of conventional auto-regressive (AR) decoding. CLLMs are trained with the objective of performing efficient Jacobi decoding by mapping any randomly initialized $n$-token sequence to the same result as AR decoding in as few steps as possible. Experiment results show CLLMs obtained using our proposed method are highly effective, showing $2.4\times$ to $3.4\times$ improvements in generation speed while preserving generation quality in comparison with the baseline pre-trained models and other SOTA techniques like Medusa and speculative decoding. CLLMs also show high adaptability and memory efficiency as they require no modifications to the existing model architecture and auxiliary model components.
</div>


@@ -178,7 +178,7 @@
<figure>
<div style="display: grid; place-items: center;">
<img src="img/baseline_vs_cllm_gsm8k_acc_demo.gif" alt="cllm-gsm8k-acc-demo" style="width: 120%; height: auto;">
- <figcaption style="font-size: 16px;"><strong>Figure 1: Demo of speedup by CLLM-ABEL-7B-002 in comparison with baseline ABEL-7B-002 using Jacobi decoding on GSM8K.</strong></figcaption>
+ <figcaption style="font-size: 16px;"><strong>Figure 1: Demo of speedup by CLLM-ABEL-7B-001 in comparison with baseline <a href="https://huggingface.co/GAIR/Abel-7B-001">ABEL-7B-001</a> using Jacobi decoding on GSM8K.</strong></figcaption>
</div>
</figure>

2 changes: 1 addition & 1 deletion public/blogs/index.html
@@ -161,7 +161,7 @@
<h2><a href="//localhost:1313/blogs/cllm/">Consistency Large Language Models: A Family of Efficient Parallel Decoders</a></h2>
<time datetime="2024-02-21 12:00:00 -0800 PST">February 21, 2024</time>
<p class="post-author"> Siqi Kou*, Lanxiang Hu*, Zhezhi He, Zhijie Deng, Hao Zhang</p>
<p style="text-align: justify;">TL;DR: In this blog, we introduce consistency large language models (CLLMs), a new family of models developed with our proposed techniques to reduce inference latency by efficiently decoding $n$ tokens in parallel. This decoding method is called Jacobi decoding, which improves inference efficiency by breaking the sequential nature of conventional auto-regressive (AR) decoding. CLLMs are trained with the objective of performing efficient Jacobi decoding by mapping any randomly initialized $n$-token sequence to a correctly predicted sequence in as few steps as possible.</p></div>
<p style="text-align: justify;">TL;DR: In this blog, we introduce consistency large language models (CLLMs), a new family of models capable of reducing inference latency by efficiently decoding $n$ tokens in parallel. This decoding method is called Jacobi decoding, which improves inference efficiency by breaking the sequential nature of conventional auto-regressive (AR) decoding. CLLMs are trained with the objective of performing efficient Jacobi decoding by mapping any randomly initialized $n$-token sequence to the same result as AR decoding in as few steps as possible.</p></div>
</article><article class="post-preview">
<div class="post-image">
<img src="//localhost:1313/img/slider/lookahead_decoding.jpg" alt="">
2 changes: 1 addition & 1 deletion public/blogs/index.xml
@@ -13,7 +13,7 @@
<link>//localhost:1313/blogs/cllm/</link>
<pubDate>Wed, 21 Feb 2024 12:00:00 -0800</pubDate>
<guid>//localhost:1313/blogs/cllm/</guid>
- <description>TL;DR: In this blog, we introduce consistency large language models (CLLMs), a new family of models developed with our proposed techniques to reduce inference latency by efficiently decoding $n$ tokens in parallel. This decoding method is called Jacobi decoding, which improves inference efficiency by breaking the sequential nature of conventional auto-regressive (AR) decoding. CLLMs are trained with the objective of performing efficient Jacobi decoding by mapping any randomly initialized $n$-token sequence to a correctly predicted sequence in as few steps as possible.</description>
+ <description>TL;DR: In this blog, we introduce consistency large language models (CLLMs), a new family of models capable of reducing inference latency by efficiently decoding $n$ tokens in parallel. This decoding method is called Jacobi decoding, which improves inference efficiency by breaking the sequential nature of conventional auto-regressive (AR) decoding. CLLMs are trained with the objective of performing efficient Jacobi decoding by mapping any randomly initialized $n$-token sequence to the same result as AR decoding in as few steps as possible.</description>
</item>
<item>
<title>Break the Sequential Dependency of LLM Inference Using Lookahead Decoding</title>
