- Understand the paper "Early Visual Concept Learning with Unsupervised Deep Learning".
- Understand, implement and compare the Swin Transformer and ConvNeXt architectures on a non-vanilla classification task of your choosing.
- For clarification, a vanilla classification task is one where you have a labelled set of images on which you directly perform single or multi-class classification.
- Bonus Task: Implementation of the Disentangled VAE (Task 1.1)
- Create one folder per individual, containing every image in which that person's face is detected. (An image showing several people is therefore copied into several folders.) One additional requirement: the code must create these folders regardless of the input data (number of people in an image, image size, etc.).
- In the final deliverable, the user should only have to add photos to a folder (no preprocessing required) and run a file that does everything else.
- Ideally, no training should happen at inference time.
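
The folder-creation requirement above can be sketched as follows. This is a minimal illustration, not a prescribed solution: `organise_by_person` and its `faces_in_image` argument are hypothetical names, and `faces_in_image` stands in for whatever pretrained face detection and identification step is used (it is assumed to return one label per person found in an image). Only the file handling is shown, using the standard library.

```python
from pathlib import Path
import shutil

# Assumption: these are the image types the user drops into the folder.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def organise_by_person(photo_dir, output_dir, faces_in_image):
    """Copy each photo into one folder per person detected in it.

    `faces_in_image` is a hypothetical callable: it takes an image path and
    returns an iterable of person labels (for example, cluster IDs produced
    by a pretrained face-embedding model). It replaces the real detector here.
    Returns a dict mapping each person label to the number of photos copied.
    """
    photo_dir, output_dir = Path(photo_dir), Path(output_dir)
    counts = {}
    for image_path in sorted(photo_dir.iterdir()):
        # Skip anything that is not a recognised image file.
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        # set() ensures a person detected twice in one photo gets one copy.
        for person in set(faces_in_image(image_path)):
            person_dir = output_dir / str(person)
            person_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(image_path, person_dir / image_path.name)
            counts[person] = counts.get(person, 0) + 1
    return counts
```

Because the detector is passed in as a function, the same loop works for any number of people per photo and any image size, and a photo with no detected faces simply produces no copies. Using a pretrained detector inside `faces_in_image` would also keep training out of inference time.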
python: 3.10
tensorflow: 2.13.0
keras: 2.13.1