The model is trained on the INCLUDE dataset created by AI4Bharat, IIT Madras. The architecture combines an EfficientNet backbone with a BERT encoder layer, which uses a self-attention mechanism. This model reaches a test accuracy of 91.17%. Replacing EfficientNet with VGG16 (VGG16 + BERT encoder layer) gives a test accuracy of 92.7%. The Git repository provides the EfficientNet + BERT encoder implementation.
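To give a rough idea of what the self-attention step does with the features coming out of the CNN backbone, here is a minimal sketch in plain Python. It is not the repository's code. It assumes the backbone emits one feature vector per input frame, uses a single attention head with no learned projections (queries, keys, and values are all the raw features), and mean-pools the result into one vector for a classifier. The names `self_attention`, `mean_pool`, and `frame_features`, and the toy numbers, are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Single-head scaled dot-product self-attention.

    Simplification (assumption): Q = K = V = features, with no learned
    projection matrices, unlike a real BERT encoder layer.
    """
    d = len(features[0])
    scale = math.sqrt(d)
    weights = []
    for q in features:
        # Similarity of this frame to every frame, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in features]
        weights.append(softmax(scores))
    # Each output vector is a weighted average of all frame features.
    attended = [
        [sum(w * v[j] for w, v in zip(row, features)) for j in range(d)]
        for row in weights
    ]
    return attended, weights


def mean_pool(vectors):
    """Average a sequence of vectors into a single vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


# Toy stand-in for per-frame CNN features: 3 frames, 4 dimensions each.
frame_features = [
    [1.0, 0.0, 0.5, 0.2],
    [0.9, 0.1, 0.4, 0.3],
    [0.0, 1.0, 0.2, 0.8],
]

attended, weights = self_attention(frame_features)
clip_vector = mean_pool(attended)  # would be fed to a classification head
```

In a full encoder layer the attention output would also pass through learned projections, residual connections, layer normalization, and a feed-forward block; those are left out here to keep the attention step itself visible.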