As part of NTU's CZ/CE4041 Machine Learning module, the team was tasked with competing in a closed Kaggle competition. The team chose Northeastern SMILE Lab - Recognizing Faces in the Wild, whose aim is to predict the probability that the people in two images are kin. In summary, our group used a Siamese network to achieve a score of 0.907 on the Kaggle public leaderboard.
Sankar Samiksha
Jia Min
Xing Kun
Fan Yupei
Training images
Training relationships CSV
Test images
Test sample submission csv:
For monitoring all other submissions
Randomized data set
Picking a random photo of each person for every pair gives the model a wider variety of image pairs to train on.
import random

trainData = []
targetRelatedCount = 36000  # 32k data
relatedCount = 0
while relatedCount < targetRelatedCount:
    for k, v in relationshipDict.items():
        for relation in v:
            i2 = random.randint(0, len(personPathFile[k]) - 1)  # random photo of person1
            i3 = random.randint(0, len(personPathFile[relation]) - 1)  # random photo of person2
            trainData.append((personPathFile[k][i2], personPathFile[relation][i3], 1))
            relatedCount += 1
            if relatedCount >= targetRelatedCount:
                break
        if relatedCount >= targetRelatedCount:
            break
# remove duplicate pairs by converting to a set and back to a list
trainData = set(trainData)
trainData = list(trainData)
positiveRelationsCount = len(trainData)
print("Current Length of positive relationships: ", len(trainData))
# making non-relationships more random, with same length as trainData
notRelationAddedCount = 0
# might choose the same relation, but this is handled later when trainData is converted to a set and back to a list
while notRelationAddedCount < positiveRelationsCount:
    for k, v in notRelationshipDict.items():
        i1 = random.randint(0, len(v) - 1)  # random unrelated person
        i2 = random.randint(0, len(personPathFile[k]) - 1)  # random photo of person1
        i3 = random.randint(0, len(personPathFile[v[i1]]) - 1)  # random photo of person2
        trainData.append((personPathFile[k][i2], personPathFile[v[i1]][i3], 0))
        notRelationAddedCount += 1
        if notRelationAddedCount >= positiveRelationsCount:
            break
print("Current Length of not relationships: ", notRelationAddedCount)
print("Current Length of total relationships: ", len(trainData))
# change trainData to set, then back to list
trainData = set(trainData)
trainData = list(trainData)
print("Current Length of total relationships: ", len(trainData))
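The sampling code above depends on three dictionaries that this README does not show. As a rough sketch, assuming `personPathFile` maps a person ID to a list of image paths, and `relationshipDict` / `notRelationshipDict` map a person ID to lists of related / unrelated person IDs, the same idea can be run on toy data. The helper `sample_pairs` and the toy IDs and paths are hypothetical and not taken from the project; the helper also picks one random partner per person for both labels, which simplifies the positive loop.

```python
import random

# Hypothetical toy data; the real project builds these from the Kaggle files.
personPathFile = {
    "F0001/MID1": ["F0001/MID1/a.jpg", "F0001/MID1/b.jpg"],
    "F0001/MID2": ["F0001/MID2/a.jpg"],
    "F0002/MID1": ["F0002/MID1/a.jpg", "F0002/MID1/b.jpg"],
}
relationshipDict = {"F0001/MID1": ["F0001/MID2"]}
notRelationshipDict = {"F0001/MID1": ["F0002/MID1"], "F0001/MID2": ["F0002/MID1"]}


def sample_pairs(pairDict, label, target, rng):
    """Draw `target` (path1, path2, label) tuples, picking a random photo per person."""
    pairs = []
    while len(pairs) < target:
        for person, others in pairDict.items():
            other = rng.choice(others)
            pairs.append((rng.choice(personPathFile[person]),
                          rng.choice(personPathFile[other]), label))
            if len(pairs) >= target:
                break
    return pairs


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Deduplicate through a set, as in the project code.
positives = list(set(sample_pairs(relationshipDict, 1, 10, rng)))
negatives = list(set(sample_pairs(notRelationshipDict, 0, len(positives), rng)))
trainData = positives + negatives
print(len(positives), len(negatives))
```

Because duplicates are removed after sampling, the final counts can be smaller than the targets, which is why the project code prints the lengths before and after the set conversion.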
Fully Connected Layers
Replacing the fully connected layers gives us more flexibility and control over the model while still using a pre-trained model. The most important layers in the new head are the Dropout layer and the BatchNorm1d layer.
model.classifier = nn.Sequential(
    nn.Dropout(0.55),  # Add dropout for regularization
    nn.Linear(2048, 512),
    nn.Linear(512, 256),
    nn.BatchNorm1d(256),  # Apply batch normalization
    nn.Linear(256, 2)
)
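To make the role of `nn.Dropout(0.55)` concrete: during training, PyTorch's dropout zeroes each input element with probability p and scales the survivors by 1/(1-p), and it does nothing in evaluation mode. The plain-Python sketch below mimics that behaviour on a list of numbers. The function name `inverted_dropout` is illustrative and is not part of the project or of PyTorch.

```python
import random


def inverted_dropout(values, p, training, rng):
    """Mimic dropout: zero each value with probability p, scale survivors by 1/(1-p)."""
    if not training:
        return list(values)  # evaluation mode: values pass through unchanged
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
activations = [1.0] * 10000
dropped = inverted_dropout(activations, p=0.55, training=True, rng=rng)

kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print(f"kept {kept_fraction:.2f} of activations, mean after dropout {mean_after:.2f}")
```

Roughly 45% of the activations survive, and the rescaling keeps their average close to the original value, so the layers that follow see inputs on a similar scale in training and evaluation.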
Pre-trained Model, Learning Rate, Adam Optimizer and Loss Criterion
The backbone is a pre-trained FaceNet model (InceptionResnetV1) with weights trained on the VGGFace2 image dataset.
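import torch.nn as nn
import torch.optim as optim
from facenet_pytorch import InceptionResnetV1

# Create the Siamese network (SiameseNetwork is defined elsewhere in the project)
net = SiameseNetwork(InceptionResnetV1(pretrained='vggface2', classify=False)).cuda()
# Define the loss: cross-entropy over the two output classes
criterion = nn.CrossEntropyLoss()
# Define the optimizer (e.g., Adam)
optimizer = optim.Adam(net.parameters(), lr=0.0005)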
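The `SiameseNetwork` class itself is not shown in this README. The sketch below is one plausible shape for it, not the team's actual implementation: both images go through the same backbone, and the two embeddings are combined into a single feature vector for the classifier head. The combination used here (both embeddings, their element-wise product, and their squared difference) is an assumption, chosen because four 512-dimensional vectors give the 2048 inputs the head expects. `TinyBackbone`, `SiameseSketch`, and the 160x160 input size are hypothetical, so that the example runs without downloading the FaceNet weights.

```python
import torch
import torch.nn as nn


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for InceptionResnetV1: maps an image to a 512-d embedding."""

    def __init__(self):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(3, 512)

    def forward(self, x):
        return self.fc(self.pool(x).flatten(1))


class SiameseSketch(nn.Module):
    """Assumed structure: shared backbone, combined embeddings, two-class head."""

    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone  # the same weights embed both images
        self.classifier = nn.Sequential(
            nn.Dropout(0.55),
            nn.Linear(2048, 512),
            nn.Linear(512, 256),
            nn.BatchNorm1d(256),
            nn.Linear(256, 2),
        )

    def forward(self, img1, img2):
        e1, e2 = self.backbone(img1), self.backbone(img2)
        # Assumed feature combination: 4 x 512 = 2048 features.
        features = torch.cat([e1, e2, e1 * e2, (e1 - e2) ** 2], dim=1)
        return self.classifier(features)


net = SiameseSketch(TinyBackbone())
criterion = nn.CrossEntropyLoss()
img1, img2 = torch.randn(4, 3, 160, 160), torch.randn(4, 3, 160, 160)
labels = torch.tensor([1, 0, 1, 0])  # 1 = related, 0 = not related

logits = net(img1, img2)                    # shape (4, 2)
loss = criterion(logits, labels)            # scalar cross-entropy loss
probs = torch.softmax(logits, dim=1)[:, 1]  # probability of the "related" class
print(logits.shape, loss.item(), probs.shape)
```

With the real backbone, `TinyBackbone()` would be replaced by `InceptionResnetV1(pretrained='vggface2', classify=False)`, which also produces 512-dimensional embeddings. Taking the softmax probability of class 1 is one way to obtain the kinship probability the competition asks for, assuming label 1 means "related" as in `trainData`.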
Constantly changing data augmentation
The team changed the data augmentation pipeline as training progressed, cycling through four settings every 10 epochs. To the team's knowledge, this approach was not used by other participants in the competition.
phase = epoch % 10  # the augmentation schedule repeats every 10 epochs
if phase in (0, 1, 2, 3):
    print("Data Augmentation: None")
    trainloader = createTrain([transforms.Resize((IMG_SIZE, IMG_SIZE)), transforms.ToTensor()])
elif phase in (4, 5):
    print("Data Augmentation: RandomGrayscale(0.5)")
    trainloader = createTrain([transforms.Resize((IMG_SIZE, IMG_SIZE)), transforms.RandomGrayscale(p=0.5), transforms.ToTensor()])
elif phase in (6, 7):
    print("Data Augmentation: RandomCrop((80,80)), RandomGrayscale(0.8), RandomHorizontalFlip, GaussianBlur(kernel_size=5, sigma=(0.1, 3.0))")
    trainloader = createTrain([transforms.RandomCrop((80, 80)), transforms.Resize((IMG_SIZE, IMG_SIZE)), transforms.RandomGrayscale(p=0.8), transforms.RandomHorizontalFlip(), transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 3.0)), transforms.ToTensor()])
elif phase in (8, 9):
    print("Data Augmentation: RandomGrayscale(0.5), RandomHorizontalFlip, ColorJitter(brightness=0.7, contrast=0.3)")
    trainloader = createTrain([transforms.Resize((IMG_SIZE, IMG_SIZE)), transforms.RandomGrayscale(p=0.5), transforms.RandomHorizontalFlip(), transforms.ColorJitter(brightness=0.7, contrast=0.3), transforms.ToTensor()])
else:
    # Fallback: no augmentation
    print("Data Augmentation: None")
    trainloader = createTrain([transforms.Resize((IMG_SIZE, IMG_SIZE)), transforms.ToTensor()])
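The schedule above repeats every 10 epochs. As a compact way to see which setting applies when, the sketch below maps an epoch number to a phase label. The function `augmentation_phase` and the label strings are illustrative names, not part of the project code.

```python
def augmentation_phase(epoch):
    """Return a label for the augmentation setting used at a given epoch."""
    phase = epoch % 10  # the cycle repeats every 10 epochs
    if phase <= 3:
        return "none"
    if phase <= 5:
        return "grayscale"
    if phase <= 7:
        return "crop+grayscale+flip+blur"
    return "grayscale+flip+colorjitter"


schedule = [augmentation_phase(e) for e in range(10)]
for epoch, label in enumerate(schedule):
    print(epoch, label)
```

Over a 30-epoch run this gives 12 epochs without augmentation and 6 epochs in each of the other three settings.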
Facenet Version 4 (0.807 @ 30 Epochs)
- Added a lot more data, bringing the dataset to 59K
- Most of the added data comes from within the same family: the people in a pair belong to the same family but are not kin (this kind of data makes up most of the dataset)
- The fully connected layers after the FaceNet convolutional layers include a dropout layer at 0.7 to prevent overfitting, as well as batch normalization
- Data augmentation is changed every epoch to introduce more variation and prevent overfitting
- Grayscale is important because some of the test data is grayscale. Blur is important because some of the data is highly blurred. Random horizontal flip is used to increase variation
- Learning rate is set to 0.005; batch size is 64
Facenet Version 5 (0.867 @ 30 Epochs)
- Updated version from version 4
- Randomized data set
Facenet Version 7 (0.907 @ 30 Epochs)
- Changed the dropout rate from 0.7 to 0.55
- Added cropping as a data augmentation, since some images contain two faces and some faces are obscured by sunglasses or other accessories
- Added color jittering as a data augmentation, since the images vary in brightness and contrast
- Randomized the non-relationship dataset
- Used all relationships in the CSV, but randomized the pictures selected for each one
- Submitted CSV