Down-Sampling Inter-Layer Adapter for Parameter and Computation Efficient Ultra-Fine-Grained Image Recognition

Official PyTorch code for the paper Down-Sampling Inter-Layer Adapter for Parameter and Computation Efficient Ultra-Fine-Grained Image Recognition, published in the ECCV 2024 Workshop on Efficient Deep Learning for Foundation Models (EFM).

We propose a novel down-sampling adapter module, inserted between transformer encoder layers, that adapts and down-samples features in the parameter-efficient transfer learning (PETL) setting.
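
For intuition, below is a minimal sketch of one way such a module could be structured, assuming patch tokens are reshaped to a square grid and down-sampled with a strided depthwise convolution. The class and its internals are illustrative only; see the adapters.py file referenced below for the actual implementation.

```python
import torch
import torch.nn as nn

class DownSamplingAdapterSketch(nn.Module):
    """Illustrative inter-layer adapter: a residual bottleneck that adapts
    features plus a strided depthwise convolution that halves the spatial
    token resolution. A sketch, not the repository's implementation
    (see fgir_vit/model_utils/modules_others/adapters.py for the real code)."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)   # down-projection
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)     # up-projection
        # Strided depthwise conv: cheap spatial down-sampling of patch tokens.
        self.pool = nn.Conv2d(dim, dim, kernel_size=3, stride=2,
                              padding=1, groups=dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, 1 + H*W, D), with a leading [CLS] token.
        cls_tok, patches = x[:, :1], x[:, 1:]
        b, n, d = patches.shape
        h = w = int(n ** 0.5)                    # assumes a square token grid
        grid = patches.transpose(1, 2).reshape(b, d, h, w)
        grid = self.pool(grid)                   # (B, D, ~H/2, ~W/2)
        patches = grid.flatten(2).transpose(1, 2)
        # Residual bottleneck adaptation on the down-sampled sequence.
        patches = patches + self.up(self.act(self.down(patches)))
        return torch.cat([cls_tok, patches], dim=1)
```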

This alleviates the attention collapse issue observed for ViTs on ultra-fine-grained image recognition (UFGIR) datasets under the PETL setting.

Our method obtains favorable results across ten UFGIR datasets.

Our method also achieves a superior trade-off between accuracy and cost (measured in total trainable parameters and FLOPs) compared to alternatives.

Pre-trained checkpoints are available on HuggingFace!

The code for our model (and the ViT backbone) is in fgir_vit/model_utils/modules_others/adapters.py.

Setup

pip install -e . 

Preparation

Datasets can be downloaded from:

Xiaohan Yu, Yang Zhao, Yongsheng Gao, Xiaohui Yuan, Shengwu Xiong (2021). Benchmark Platform for Ultra-Fine-Grained Visual Categorization Beyond Human Performance. In ICCV 2021.
https://github.com/XiaohanYu-GU/Ultra-FGVC?tab=readme-ov-file

Train

To train a ViT B-16 with ILA++ on SoyLocal using image size 224:

python tools/train.py --serial 1 --cfg configs/soylocal_ft_weakaugs.yaml --model_name vit_b16 --freeze_backbone --classifier cls --adapter adapter --ila --ila_locs --cpu_workers 8 --seed 1 --lr 0.1

Similarly, for image size 448:

python tools/train.py --serial 3 --cfg configs/soylocal_ft_weakaugs.yaml --model_name vit_b16 --freeze_backbone --image_size 448 --classifier cls --adapter adapter --ila --ila_locs --cpu_workers 8 --seed 100 --lr 0.1

To train with a separate method configuration file (here, GLSim) on CUB at image size 448:

python tools/train.py --cfg configs/cub_ft_is224_medaugs.yaml --lr 0.01 --model_name vit_b16 --cfg_method configs/methods/glsim.yaml --image_size 448
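
The --freeze_backbone flag corresponds to the PETL setting: backbone weights stay frozen and only the inserted modules and the classifier head are updated. A minimal sketch of this pattern follows; the keyword matching is illustrative, and the repo's actual parameter names may differ.

```python
import torch.nn as nn

def freeze_backbone(model: nn.Module,
                    trainable_keywords=("adapter", "ila", "head")) -> None:
    """Freeze every parameter whose name does not contain one of the
    trainable keywords (hypothetical names, for illustration)."""
    for name, param in model.named_parameters():
        param.requires_grad = any(k in name for k in trainable_keywords)
    n_train = sum(p.numel() for p in model.parameters() if p.requires_grad)
    n_total = sum(p.numel() for p in model.parameters())
    print(f"Trainable: {n_train:,} / {n_total:,} parameters")
```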

Compute CKA Similarity

For a frozen vanilla ViT B-16:

python -u tools/compute_feature_metrics.py --debugging --serial 32 --cfg configs/soyageing_ft_weakaugs.yaml --model_name vit_b16 --fp16 --compute_attention_cka --ckpt_path ckpts/soyageing_vit_b16_cls_fz_1.pth

For a frozen ViT B-16 with our proposed ILA:

python -u tools/compute_feature_metrics.py --debugging --serial 32 --cfg configs/soyageing_ft_weakaugs.yaml --model_name vit_b16 --fp16 --compute_attention_cka --ckpt_path ckpts/soyageing_vit_b16_ila_dso_cls_adapter_fz_1.pth --adapter adapter --ila --ila_locs

Results will be saved in the results_inference directory.
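
For reference, CKA here refers to centered kernel alignment, a similarity index between two sets of representations (Kornblith et al., 2019). A minimal sketch of the linear variant, illustrative rather than the script's exact code:

```python
import torch

def linear_cka(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Linear CKA between feature matrices of shape (n_samples, dim_x)
    and (n_samples, dim_y), following Kornblith et al. (2019)."""
    x = x - x.mean(dim=0, keepdim=True)   # center each feature column
    y = y - y.mean(dim=0, keepdim=True)
    num = (y.T @ x).norm(p="fro") ** 2
    den = (x.T @ x).norm(p="fro") * (y.T @ y).norm(p="fro")
    return num / den
```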

Visualize attention

To visualize attention rollout for a ViT with ILA over the first encoder group (the first four encoder blocks, denoted 0_4):

python -u tools/vis_dfsm.py --batch_size 8 --vis_cols 8 --vis_mask rollout_0_4 --serial 30 --cfg configs/soyageing_ft_weakaugs.yaml --model_name vit_b16 --fp16 --ckpt_path ../results_ila/serial1_ckpts/soyageing_vit_b16_ila_dso_cls_adapter_fz_1.pth --adapter adapter --ila --ila_locs
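
Attention rollout (Abnar & Zuidema, 2020) aggregates attention across layers by averaging heads, adding the identity for the residual connection, renormalizing, and chaining the matrices. Computing it per encoder group (as in rollout_0_4) keeps the token count consistent within the product, since ILA down-samples tokens between groups. A minimal sketch, not the repo's exact code:

```python
import torch

def attention_rollout(attentions: list[torch.Tensor]) -> torch.Tensor:
    """Attention rollout over a list of per-layer attention maps,
    each of shape (B, heads, N, N)."""
    result = None
    for attn in attentions:
        a = attn.mean(dim=1)                              # average heads
        a = a + torch.eye(a.size(-1), device=a.device)    # residual path
        a = a / a.sum(dim=-1, keepdim=True)               # renormalize rows
        result = a if result is None else a @ result
    return result  # (B, N, N); row 0 is the [CLS] token's rollout map
```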

Citation

If you find our work helpful in your research, please cite it as:

@misc{rios_down-sampling_2024,
	title = {Down-{Sampling} {Inter}-{Layer} {Adapter} for {Parameter} and {Computation} {Efficient} {Ultra}-{Fine}-{Grained} {Image} {Recognition}},
	doi = {10.48550/arXiv.2409.11051},
	url = {http://arxiv.org/abs/2409.11051},
	publisher = {arXiv},
	author = {Rios, Edwin Arkel and Oyerinde, Femiloye and Hu, Min-Chun and Lai, Bo-Cheng},
	month = sep,
	year = {2024},
	note = {arXiv:2409.11051 [cs]},
	keywords = {Computer Science - Computer Vision and Pattern Recognition, I.2, I.4},
	annote = {Comment: Accepted to ECCV 2024 Workshop on Efficient Deep Learning for Foundation Models (EFM). Main: 13 pages, 3 figures, 2 tables. Appendix: 3 pages, 1 table. Total: 16 pages, 3 figures, 4 tables},
}

Acknowledgements

We thank NYCU's HPC Center and National Center for High-performance Computing (NCHC) for providing computational and storage resources.

We thank the authors of TransFG, FFVT, SimTrans, CAL, MPN-COV, VPT, VQT, ConvPass and timm for providing implementations for comparison. We also thank the authors of the Ultra-FGVC datasets.

We also thank Weights & Biases for their experiment management platform.
