From 3942014a10b7685a1aa7ca7cb5c3556bf8d2b2a9 Mon Sep 17 00:00:00 2001
From: rthapa84 <114017599+rthapa84@users.noreply.github.com>
Date: Wed, 5 Jun 2024 18:02:07 -0700
Subject: [PATCH] Update README.md

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 1949b69..1d90780 100644
--- a/README.md
+++ b/README.md
@@ -136,7 +136,7 @@ The image captures not just a moment, but a story of skill, courage, and the joy
 
 ## 📊 Dataset
 
-I will release it soon on HF hub.
+We will release it soon on HF hub.
 
 
@@ -147,14 +147,14 @@ I will release it soon on HF hub.
 We adopt a two-stage training process.
 
 ### Stage 1
-In this stage, we only train our projection layer, so that the model learns to map the embeddings from the vision encoder into the LLM space. The dataset mixture used in this stage is `stage1_dataset`, which contains short image and caption pairs.
+In this stage, we only train our projection layer, so that the model learns to map the embeddings from the vision encoder into the LLM space.
 
 ```bash
 sh train_dragonfly_stage1.sh
 ```
 
 ### Stage 2
-In this stage, we train our vision encoder, projection layer, and LLM jointly on image and text data. Our training dataset mixture for this stage is provided in `stage2_dataset`. This dataset contains xx.xx% of text-only dataset as well. We also include a math dataset, given in `math_instruct`.
+In this stage, we train our vision encoder, projection layer, and LLM jointly on image and text data.
 
 ```bash
 sh train_dragonfly_stage2.sh