This repository demonstrates the use of Low-Rank Adaptation (LoRA) to fine-tune Google's ViT base model for two image classification tasks: Food Item Identification and Human Action Identification. Each task is trained and run for inference separately with its own LoRA adapter.
In this task we used the Google ViT model google/vit-base-patch16-224-in21k, which has around 86M parameters.
Base model on the Hugging Face Hub: https://huggingface.co/google/vit-base-patch16-224-in21k
To run the files, Python >= 3.8 is required, along with the following packages:
- transformers
- datasets
- evaluate
- peft
- torch
- torchvision
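These can be installed with pip; the command below is a typical invocation, not necessarily the exact setup used by the authors:

```bash
pip install transformers datasets evaluate peft torch torchvision
```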
For fine-tuning we used peft (Parameter-Efficient Fine-Tuning) on two different datasets:
- food101
- Human-Action-Recognition
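Both datasets are available on the Hugging Face Hub. A minimal loading sketch is shown below; "food101" is the canonical Hub ID, while "Bingsu/Human_Action_Recognition" is an assumed ID based on the dataset name above:

```python
# Load the two datasets with the Hugging Face `datasets` library.
# "Bingsu/Human_Action_Recognition" is an assumed Hub ID, not confirmed
# by this README.
from datasets import load_dataset

food = load_dataset("food101", split="train")
human = load_dataset("Bingsu/Human_Action_Recognition", split="train")

# Inspect splits, columns, and label names before training.
print(food)
print(human)
```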
Refer to the training notebook included in this repository.
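For orientation, here is a minimal sketch of how LoRA adapters can be attached to the ViT backbone with peft. The hyperparameters (r, lora_alpha, dropout, target modules) and the class count are illustrative assumptions, not necessarily the values used in the notebook:

```python
# Minimal sketch: attach LoRA adapters to the ViT backbone with peft.
# All hyperparameters below are illustrative assumptions.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForImageClassification

model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",
    num_labels=101,  # e.g. 101 classes for food101
)

lora_config = LoraConfig(
    r=16,                               # rank of the low-rank update matrices
    lora_alpha=16,                      # scaling factor for the LoRA updates
    target_modules=["query", "value"],  # inject adapters into the attention projections
    lora_dropout=0.1,
    modules_to_save=["classifier"],     # keep the new classification head trainable
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of the ~86M parameters
```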
To run inference, a simple Gradio app is implemented. You can choose either model adapter (food / human) and upload an image to get the classification label.
Refer to inference.py and app.py.
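The sketch below shows what such an app might look like; it is an illustration under assumptions, not the actual contents of app.py. The adapter directories (food_adapter, human_adapter) and the per-task class counts are hypothetical placeholders:

```python
# Minimal Gradio inference sketch; the real app.py may differ.
# Adapter paths and class counts below are assumed placeholders.
import gradio as gr
import torch
from peft import PeftModel
from transformers import AutoImageProcessor, AutoModelForImageClassification

BASE = "google/vit-base-patch16-224-in21k"
ADAPTERS = {"food": "food_adapter", "human": "human_adapter"}  # hypothetical paths
NUM_LABELS = {"food": 101, "human": 15}                        # assumed class counts

processor = AutoImageProcessor.from_pretrained(BASE)

def classify(image, task):
    # Rebuild the base model with the right head size, then load the
    # LoRA adapter (and its saved classifier weights) on top of it.
    base = AutoModelForImageClassification.from_pretrained(
        BASE, num_labels=NUM_LABELS[task]
    )
    model = PeftModel.from_pretrained(base, ADAPTERS[task])
    model.eval()
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    pred = int(logits.argmax(-1))
    return model.config.id2label.get(pred, str(pred))

demo = gr.Interface(
    fn=classify,
    inputs=[gr.Image(type="pil"), gr.Radio(list(ADAPTERS), label="Adapter")],
    outputs="text",
)

if __name__ == "__main__":
    demo.launch()
```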
To run the inference app, execute the following command after downloading or cloning the repository:
```bash
python app.py
```