diff --git a/README.md b/README.md
index e561c58..411b171 100644
--- a/README.md
+++ b/README.md
@@ -129,7 +129,7 @@ The image captures not just a moment, but a story of skill, courage, and the joy
 ## 📊 Dataset
 
-I will release it soon on HF hub.
+We will release it soon on HF hub.
@@ -140,14 +140,14 @@ I will release it soon on HF hub.
 We adopt a two-stage training process.
 
 ### Stage 1
-In this stage, we only train our projection layer, so that the model learns to map the embeddings from the vision encoder into the LLM space. The dataset mixture used in this stage is `stage1_dataset`, which contains short image and caption pairs.
+In this stage, we only train our projection layer, so that the model learns to map the embeddings from the vision encoder into the LLM space.
 
 ```bash
 sh train_dragonfly_stage1.sh
 ```
 
 ### Stage 2
-In this stage, we train our vision encoder, projection layer, and LLM jointly on image and text data. Our training dataset mixture for this stage is provided in `stage2_dataset`. This dataset contains xx.xx% of text-only dataset as well. We also include a math dataset, given in `math_instruct`.
+In this stage, we train our vision encoder, projection layer, and LLM jointly on image and text data.
 
 ```bash
 sh train_dragonfly_stage2.sh