@altansnl, @AGarciaCast, @Frankkie
- For our experiments, we will use Tiny ImageNet instead of ImageNet-1K, and as our baseline model we will use ViT-B instead of ViT-L (see the setup sketch after this list).
- For the MAE ablation experiments, we will only use fine-tuning, not linear probing. We will experiment with decoder depth, decoder width, and reconstruction target, though less exhaustively than the paper; due to time constraints we will not experiment with an encoder that also processes mask tokens (see the ablation-grid sketch after this list).
- For comparisons with previous results on Tiny ImageNet, we will point to the respective papers and will not verify their results ourselves.
- We will leave partial fine-tuning out-of-scope.
- We might experiment with transfer learning on a downstream task such as object detection or classification (the number of experiments will depend on time and resource constraints).
- The authors mention that performance could be improved by using non-vanilla ViT models, so we could compare our results with other variants (not only deeper/larger models such as ViT-L/H).
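Below is a minimal sketch of the baseline setup, assuming we build on the `timm` library; the exact model variant and keyword choices are our assumptions, not settled decisions.

```python
# Baseline sketch: ViT-B adapted to Tiny ImageNet (64x64 inputs, 200 classes).
# Assumes the timm library; the variant and patch size are illustrative.
import timm

model = timm.create_model(
    "vit_base_patch16_224",  # ViT-B backbone; img_size below overrides the 224 default
    pretrained=False,        # weights come from our own MAE pre-training instead
    img_size=64,             # Tiny ImageNet resolution (a 4x4 grid of 16x16 patches)
    num_classes=200,         # Tiny ImageNet has 200 classes
)
```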
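And a minimal sketch of how the reduced ablation grid could be enumerated; the listed values are illustrative placeholders, not the paper's full sweep.

```python
# Ablation sketch: enumerate (decoder depth, decoder width, reconstruction
# target) combinations; each config maps to one MAE pre-train + fine-tune run.
from itertools import product

decoder_depths = [1, 4, 8]                 # number of decoder blocks (illustrative subset)
decoder_widths = [256, 512]                # decoder embedding dim (illustrative subset)
recon_targets = ["pixels", "pixels_norm"]  # raw vs. per-patch-normalized pixel targets

ablation_runs = [
    {"decoder_depth": d, "decoder_width": w, "target": t}
    for d, w, t in product(decoder_depths, decoder_widths, recon_targets)
]
```

This sketch yields 12 runs; we would trim the lists further if compute runs short.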