Merge pull request #19 from songtianhui/main
Support Remove Stage
songtianhui authored Jan 26, 2024
2 parents a45cc9e + 91a0916 commit 94ca2e6
Showing 30 changed files with 478 additions and 407 deletions.
232 changes: 117 additions & 115 deletions README.md
@@ -1,115 +1,117 @@
# MixFormerV2
The official implementation of the NeurIPS 2023 paper: [**MixFormerV2: Efficient Fully Transformer Tracking**](https://arxiv.org/abs/2305.15896).

## Model Framework
![model](tracking/model.png)

## Distillation Training Pipeline
![training](tracking/training.png)


## News

- **[Sep 22, 2023]** MixFormerV2 is accepted by **NeurIPS 2023**! :tada:

- **[May 31, 2023]** We released two versions of the pretrained model, which can be accessed on [Google Drive](https://drive.google.com/drive/folders/1soQMZyvIcY7YrYrGdk6MCstTPlMXNd30?usp=sharing).

- **[May 26, 2023]** Code is available now!


## Highlights

### :sparkles: Efficient Fully Transformer Tracking Framework

MixFormerV2 is a unified, fully transformer-based tracking model without any dense convolutional operations or complex score prediction modules. We propose four special prediction tokens to capture the correlation between the target template and the search area.
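
As a rough, hedged illustration of this idea (a minimal sketch under assumed names and shapes, not the repository's actual module), the prediction tokens can be viewed as learnable embeddings concatenated with the template and search tokens, with a lightweight MLP head reading box coordinates from their outputs:

``` python
import torch
import torch.nn as nn

class PredictionTokenSketch(nn.Module):
    """Toy illustration of token-based box prediction (not the official code)."""

    def __init__(self, dim=768, num_pred_tokens=4, depth=4, num_heads=12):
        super().__init__()
        # Four learnable prediction tokens, one per box coordinate in this sketch.
        self.pred_tokens = nn.Parameter(torch.zeros(1, num_pred_tokens, dim))
        layer = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, depth)
        self.box_head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, template_tokens, search_tokens):
        # template_tokens: (B, N_t, C), search_tokens: (B, N_s, C)
        b = search_tokens.size(0)
        pred = self.pred_tokens.expand(b, -1, -1)
        # Mixed attention over template, search, and prediction tokens in one sequence.
        x = self.backbone(torch.cat([template_tokens, search_tokens, pred], dim=1))
        pred_out = x[:, -pred.size(1):]             # outputs of the prediction tokens
        box = self.box_head(pred_out).squeeze(-1)   # (B, 4) normalized coordinates
        return box.sigmoid()
```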

### :sparkles: A New Distillation-based Model Reduction Paradigm

To further improve efficiency, we present a new distillation-based model reduction paradigm for tracking, consisting of a dense-to-sparse stage and a deep-to-shallow stage.
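
A hedged sketch of what such a two-part distillation objective could look like in PyTorch is given below; the KL/MSE choices, the temperature, and the layer pairing are assumptions made for the example, not the paper's exact formulation:

``` python
import torch
import torch.nn.functional as F

def distillation_losses(student_box_logits, teacher_box_probs,
                        student_feats, teacher_feats,
                        temperature=2.0, feat_weight=1.0):
    """Toy two-part distillation objective (illustrative only).

    Dense-to-sparse: the student's sparse, token-based coordinate logits are
    pushed towards the teacher's dense-head coordinate distributions.
    Deep-to-shallow: selected student layers mimic selected teacher layers.
    """
    # Dense-to-sparse: KL divergence between coordinate distributions.
    # teacher_box_probs is assumed to already be a probability distribution.
    log_p_student = F.log_softmax(student_box_logits / temperature, dim=-1)
    dense_to_sparse = F.kl_div(log_p_student, teacher_box_probs, reduction="batchmean")

    # Deep-to-shallow: feature mimicking between matched layers
    # (the layer pairing is an arbitrary example).
    deep_to_shallow = sum(
        F.mse_loss(s, t) for s, t in zip(student_feats, teacher_feats)
    ) / max(len(student_feats), 1)

    return dense_to_sparse + feat_weight * deep_to_shallow
```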

### :sparkles: Strong Performance and Fast Inference Speed

MixFormerV2 performs well on multiple benchmarks, achieving **70.6%** AUC on LaSOT and **57.4%** AUC on TNL2k while running at 165 FPS on a GPU. To the best of our knowledge, MixFormerV2-S is the **first** transformer-based one-stream tracker that achieves real-time speed on a CPU.


## Install the environment
Use Anaconda:
``` bash
conda create -n mixformer2 python=3.6
conda activate mixformer2
bash install_requirements.sh
```

## Data Preparation
Put the tracking datasets in `./data`. The directory should look like this:
```
${MixFormerV2_ROOT}
-- data
-- lasot
|-- airplane
|-- basketball
|-- bear
...
-- got10k
|-- test
|-- train
|-- val
-- coco
|-- annotations
|-- train2017
-- trackingnet
|-- TRAIN_0
|-- TRAIN_1
...
|-- TRAIN_11
|-- TEST
```

## Set project paths
Run the following command to set the paths for this project:
```
python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir .
```
After running this command, you can also modify the paths by editing these two files:
```
lib/train/admin/local.py # paths about training
lib/test/evaluation/local.py # paths about testing
```
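
The snippet below is only a hypothetical illustration of the kind of path settings these files hold; the actual variable and attribute names are whatever `create_default_local_file.py` generates, so edit the real files rather than copying this:

``` python
# Hypothetical sketch of the path settings kept in the generated local.py files.
# These names are illustrative only.
class LocalPathsSketch:
    def __init__(self):
        self.workspace_dir = '.'        # --workspace_dir
        self.data_dir = './data'        # --data_dir (holds lasot/, got10k/, coco/, trackingnet/)
        self.save_dir = '.'             # --save_dir (checkpoints, logs, results)
        self.lasot_path = './data/lasot'
        self.got10k_path = './data/got10k'
        self.coco_path = './data/coco'
        self.trackingnet_path = './data/trackingnet'
```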

## Train MixFormerV2

Training runs on multiple GPUs with DDP.
You can follow the instructions (currently in Chinese) in [training.md](tutorials/training_zh.md).
Example scripts can be found in `tracking/train_mixformer.sh`.

``` bash
bash tracking/train_mixformer.sh
```

## Test and evaluate MixFormerV2 on benchmarks
- LaSOT/GOT10k-test/TrackingNet/OTB100/UAV123/TNL2k. More details of test settings can be found in `tracking/test_mixformer.sh`.

``` bash
bash tracking/test_mixformer.sh
```


## TODO
- [x] Progressive eliminating version of training.
- [ ] Fast version of test forwarding.

## Contact
Tianhui Song: 191098194@smail.nju.edu.cn

Yutao Cui: cuiyutao@smail.nju.edu.cn


## Citation
``` bibtex
@misc{mixformerv2,
      title={MixFormerV2: Efficient Fully Transformer Tracking},
      author={Yutao Cui and Tianhui Song and Gangshan Wu and Limin Wang},
      year={2023},
      eprint={2305.15896},
      archivePrefix={arXiv}
}
```
3 changes: 2 additions & 1 deletion experiments/mixformer2_vit/student_288_depth12.yaml
@@ -44,7 +44,7 @@ MODEL:
DEPTH: 12
MLP_RATIO: 4
PRETRAINED: True
PRETRAINED_PATH: './models/mae_pretrain_vit_base.pth' #'/data0/cyt/experiments/trackmae/models/mae_pretrain_vit_base.pth'
PRETRAINED_PATH: './models/mae_pretrain_vit_base.pth'
HEAD_TYPE: MLP
HIDDEN_DIM: 768
PREDICT_MASK: false
@@ -68,6 +68,7 @@ TRAIN:
DECAY_RATE: 400
VAL_EPOCH_INTERVAL: 5
WEIGHT_DECAY: 0.0001
FIND_UNUSED_PARAMETERS: false
TEST:
EPOCH: 500
SEARCH_FACTOR: 4.5
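
The newly added `FIND_UNUSED_PARAMETERS` option presumably toggles PyTorch DDP's `find_unused_parameters` argument, which matters once some parameters (for example, blocks removed during the eliminating stage) take no part in a forward pass. A minimal sketch of how such a flag is typically forwarded to `DistributedDataParallel`, assuming a yacs-style `cfg` object (not the repository's actual training code):

``` python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_model_for_ddp(model, cfg, local_rank):
    """Illustrative only: how a FIND_UNUSED_PARAMETERS-style flag is usually
    forwarded to DistributedDataParallel. The cfg access pattern is assumed."""
    find_unused = bool(getattr(cfg.TRAIN, "FIND_UNUSED_PARAMETERS", False))
    model = model.cuda(local_rank)
    # find_unused_parameters=True lets DDP skip gradient reduction for parameters
    # that did not participate in the forward pass (at some extra overhead).
    return DDP(model, device_ids=[local_rank], find_unused_parameters=find_unused)
```

Leaving the flag at `false` avoids the extra graph traversal overhead when every parameter receives gradients.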
73 changes: 1 addition & 72 deletions experiments/mixformer2_vit/teacher_288_depth12.yaml
@@ -1,42 +1,3 @@
DATA:
MAX_SAMPLE_INTERVAL: 200
MEAN:
- 0.485
- 0.456
- 0.406
SEARCH:
CENTER_JITTER: 4.5
FACTOR: 5.0 #4.5
SCALE_JITTER: 0.5
SIZE: 288
STD:
- 0.229
- 0.224
- 0.225
TEMPLATE:
CENTER_JITTER: 0
FACTOR: 2.0
SCALE_JITTER: 0
SIZE: 128
NUMBER: 2
TRAIN:
DATASETS_NAME:
- GOT10K_vottrain
- LASOT
- COCO17
- TRACKINGNET
DATASETS_RATIO:
- 1
- 1
- 1
- 1
SAMPLE_PER_EPOCH: 60000
VAL:
DATASETS_NAME:
- GOT10K_votval
DATASETS_RATIO:
- 1
SAMPLE_PER_EPOCH: 10000
MODEL:
VIT_TYPE: base_patch16
FEAT_SZ: 72
@@ -47,36 +8,4 @@ MODEL:
PRETRAINED_PATH: './models/mae_pretrain_vit_base.pth' #'/data0/cyt/experiments/trackmae/models/mae_pretrain_vit_base.pth'
HEAD_TYPE: MLP
HIDDEN_DIM: 768
PREDICT_MASK: false
TRAIN:
BACKBONE_MULTIPLIER: 0.1
BATCH_SIZE: 2 # 8 for 2080ti (maybe 10), 32 for tesla V100(32 G)
DEEP_SUPERVISION: false
EPOCH: 500
IOU_WEIGHT: 2.0
GRAD_CLIP_NORM: 0.1
L1_WEIGHT: 5.0
CORNER_WEIGHT: 5.0
FEAT_WEIGHT: 0.0
LR: 0.0004
LR_DROP_EPOCH: 400
NUM_WORKER: 8
OPTIMIZER: ADAMW
PRINT_INTERVAL: 50
SCHEDULER:
TYPE: step
DECAY_RATE: 400
VAL_EPOCH_INTERVAL: 5
WEIGHT_DECAY: 0.0001
TEST:
EPOCH: 500
SEARCH_FACTOR: 4.5
SEARCH_SIZE: 288
TEMPLATE_FACTOR: 2.0
TEMPLATE_SIZE: 128
UPDATE_INTERVALS:
LASOT: [200]
GOT10K_TEST: [200]
TRACKINGNET: [25]
VOT20: [10]
VOT20LT: [200]
PREDICT_MASK: False
@@ -51,7 +51,7 @@ MODEL:
HIDDEN_DIM: 768
FEAT_SZ: 96
PREDICT_MASK: false
PRETRAINED_STAGE1: True
PRETRAINED_STATIC: True
TRAIN:
BACKBONE_MULTIPLIER: 0.1
BATCH_SIZE: 32 # 8 for 2080ti (maybe 10), 32 for tesla V100(32 G)
2 changes: 1 addition & 1 deletion experiments/mixformer2_vit_online/288_depth8_score.yaml
@@ -51,7 +51,7 @@ MODEL:
HIDDEN_DIM: 768
FEAT_SZ: 96
PREDICT_MASK: false
PRETRAINED_STAGE1: True
PRETRAINED_STATIC: True
TRAIN:
BACKBONE_MULTIPLIER: 0.1
BATCH_SIZE: 32 # 8 for 2080ti (maybe 10), 32 for tesla V100(32 G)