CheXNet

CheXNet replication and improvement experiments for CS 598 Deep Learning for Healthcare

This project uses the ChestX-ray 14 image dataset to predict probabilities for 14 types of chest disease.

Dataset

The ChestX-ray 14 dataset contains 112,120 chest X-ray images of 30,805 unique patients with 14 disease labels. Following the original work, we split the dataset into approximately 70% training, 10% validation and 20% test, with no patient overlap between partitions.
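
A patient-level split like the one described above can be sketched as follows. This is a minimal illustration, not the code in data_split.py: the function name `split_by_patient`, the `image_to_patient` mapping and the toy file names are all hypothetical. The idea is to shuffle patient IDs rather than images, so every image of a given patient ends up in exactly one partition.

```python
import random
from collections import defaultdict


def split_by_patient(image_to_patient, fractions=(0.7, 0.1, 0.2), seed=0):
    """Split image names into train/val/test so no patient spans two sets."""
    # Group image names under their patient ID.
    images_by_patient = defaultdict(list)
    for image_name, patient_id in image_to_patient.items():
        images_by_patient[patient_id].append(image_name)

    # Shuffle patients (not images) so each patient lands in exactly one set.
    patient_ids = sorted(images_by_patient)
    random.Random(seed).shuffle(patient_ids)

    n_train = int(fractions[0] * len(patient_ids))
    n_val = int(fractions[1] * len(patient_ids))
    groups = {
        "train": patient_ids[:n_train],
        "val": patient_ids[n_train:n_train + n_val],
        "test": patient_ids[n_train + n_val:],  # remainder goes to test
    }
    return {
        name: sorted(img for pid in pids for img in images_by_patient[pid])
        for name, pids in groups.items()
    }


# Toy example: 10 patients with 3 images each (hypothetical file names).
toy = {f"p{p:02d}_{i}.png": p for p in range(10) for i in range(3)}
splits = split_by_patient(toy)
```

Because the fractions apply to patients and patients have different numbers of images, the image-level proportions are only approximately 70/10/20.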

Directory Structure

/preprocess:

data_label.py - convert labels into multihot label vector
data_resize.py - resize original images
data_split.py - partition dataset into train, validation, test
data_unzip.py - automate dataset tarball unzip
sample.py - take sample data for small scale test
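
As an illustration of what the multi-hot conversion in data_label.py amounts to, here is a minimal sketch. It assumes the `Finding Labels` column of the NIH CSV separates findings with `|` and uses `No Finding` for images without any of the 14 labels. The class order below follows the model comparison table in this README and may differ from the order the script actually uses; `to_multihot` is a hypothetical helper name.

```python
# Class order is an assumption (taken from the comparison table below).
CLASSES = [
    "Atelectasis", "Cardiomegaly", "Effusion", "Infiltration", "Mass",
    "Nodule", "Pneumonia", "Pneumothorax", "Consolidation", "Edema",
    "Emphysema", "Fibrosis", "Pleural_Thickening", "Hernia",
]


def to_multihot(finding_labels, classes=CLASSES):
    """Turn a '|'-separated findings string into a 0/1 vector over classes."""
    findings = set(finding_labels.split("|"))
    # Anything not in `classes` (e.g. "No Finding") simply contributes no 1s.
    return [1 if name in findings else 0 for name in classes]


vec = to_multihot("Cardiomegaly|Effusion")
empty = to_multihot("No Finding")
```

An image with no listed disease maps to the all-zero vector, which is why this is a multi-label rather than a multi-class encoding.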

/TrainedModel:

BestModel_AUROC_0.8446.pth - the model state dict for the best model
BestModel_runlog.txt - the log for the best model progress
ReplicationModel_AUROC_0.8159.pth - the model state dict for our replication of CheXNet
ReplicationModel_runlog.txt - the log for the replication progress

final_test.txt - test set: filename, label vector

final_train.txt - train set: filename, label vector

final_val.txt - validation set: filename, label vector

chexnet_cuda_replication.py - the main program, default to replicate CheXNet
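
The exact layout of the final_*.txt files is not documented here. Purely as an assumption, the sketch below treats each line as a file name followed by 14 whitespace-separated 0/1 values; `read_label_list` is a hypothetical helper, and the sample line is illustrative rather than copied from the real lists.

```python
import io


def read_label_list(handle, n_labels=14):
    """Parse lines of the assumed form '<filename> <0/1> ... <0/1>'."""
    samples = []
    for line in handle:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        filename, values = parts[0], parts[1:]
        if len(values) != n_labels:
            raise ValueError(f"expected {n_labels} labels for {filename}")
        samples.append((filename, [int(v) for v in values]))
    return samples


# Illustrative input; a real run would pass open("final_train.txt").
demo = io.StringIO("00000001_000.png 0 1 0 0 0 0 0 0 0 0 0 0 0 0\n\n")
samples = read_label_list(demo)
```

If the real files use a different separator, only the `line.split()` call would need to change.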

Experiment design

The primary goal of this project is to replicate CheXNet.

Options marked default follow the original paper.

The other options are variants we tried in order to investigate the effect of each hyperparameter or factor.

Model component	Variants
Preprocessing step	option 1 (default): resize to 224×224 and normalize with ImageNet statistics
Data augmentation	option 1: 224×224 images, no augmentation; option 2 (default): 224×224 images with random horizontal flip; option 3: 256×256 images with random horizontal flip and random crop to 224×224
Backbone	option 1 (default): DenseNet121; option 2: MobileNetV2; option 3: MobileNetV3-Large; option 4: DenseNet169; option 5: ResNet18
Batch size	option 1 (default): 16; option 2: 32; option 3: 64
Initial weights	option 1 (default): ImageNet
Optimizer	option 1 (default): Adam (β1 = 0.9, β2 = 0.999)
Initial learning rate	option 1 (default): 0.001; option 2: 0.01; option 3: 0.0005; option 4: 0.0001; option 5: 0.00005
Learning rate decay factor	option 1 (default): 10
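
To make the learning rate decay setting concrete, here is a small, framework-free sketch. It assumes the decay is triggered when the validation loss stops improving, as described in the original CheXNet paper; the class name `PlateauDecay` and the `patience` parameter are hypothetical and not taken from chexnet_cuda_replication.py.

```python
class PlateauDecay:
    """Divide the learning rate by `factor` when validation loss plateaus."""

    def __init__(self, lr=0.001, factor=10.0, patience=1):
        self.lr = lr
        self.factor = factor
        self.patience = patience  # epochs without improvement before decaying
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss and return the current rate."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr /= self.factor
                self.bad_epochs = 0
        return self.lr


# Default settings from the table: start at 0.001, decay by a factor of 10.
schedule = PlateauDecay(lr=0.001, factor=10.0, patience=1)
history = [schedule.step(loss) for loss in [0.20, 0.18, 0.19, 0.17, 0.17]]
```

In PyTorch the comparable built-in is `torch.optim.lr_scheduler.ReduceLROnPlateau` with `factor=0.1`; whether the main program uses it is not stated here.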

Model comparison

Pathology	Wang et al. (2017)	Yao et al. (2017)	CheXNet	Our Best Model
Atelectasis	0.716	0.772	0.8094	0.8274
Cardiomegaly	0.807	0.904	0.9248	0.9130
Effusion	0.784	0.859	0.8638	0.8799
Infiltration	0.609	0.695	0.7345	0.7181
Mass	0.706	0.792	0.8676	0.8667
Nodule	0.671	0.717	0.7802	0.7931
Pneumonia	0.633	0.713	0.768	0.7414
Pneumothorax	0.806	0.841	0.8887	0.8886
Consolidation	0.708	0.788	0.7901	0.8269
Edema	0.835	0.882	0.8878	0.8848
Emphysema	0.815	0.829	0.9371	0.9336
Fibrosis	0.769	0.767	0.8047	0.8194
Pleural_Thickening	0.708	0.765	0.8062	0.8115
Hernia	0.767	0.914	0.9164	0.9198
Average AUROC	0.738	0.803	0.8414	0.8446
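
The last row is the unweighted mean of the 14 per-class AUROC values. The short check below reproduces it for the CheXNet and Our Best Model columns using the numbers from the table; the variable and function names are only illustrative.

```python
# Per-class AUROC values copied from the table above.
chexnet = {
    "Atelectasis": 0.8094, "Cardiomegaly": 0.9248, "Effusion": 0.8638,
    "Infiltration": 0.7345, "Mass": 0.8676, "Nodule": 0.7802,
    "Pneumonia": 0.768, "Pneumothorax": 0.8887, "Consolidation": 0.7901,
    "Edema": 0.8878, "Emphysema": 0.9371, "Fibrosis": 0.8047,
    "Pleural_Thickening": 0.8062, "Hernia": 0.9164,
}
our_best = {
    "Atelectasis": 0.8274, "Cardiomegaly": 0.9130, "Effusion": 0.8799,
    "Infiltration": 0.7181, "Mass": 0.8667, "Nodule": 0.7931,
    "Pneumonia": 0.7414, "Pneumothorax": 0.8886, "Consolidation": 0.8269,
    "Edema": 0.8848, "Emphysema": 0.9336, "Fibrosis": 0.8194,
    "Pleural_Thickening": 0.8115, "Hernia": 0.9198,
}


def mean_auroc(per_class):
    """Unweighted (macro) average over classes."""
    return sum(per_class.values()) / len(per_class)


# Classes where Our Best Model scores higher than the CheXNet column.
wins = [name for name in our_best if our_best[name] > chexnet[name]]

print(round(mean_auroc(chexnet), 4))   # 0.8414
print(round(mean_auroc(our_best), 4))  # 0.8446
print(len(wins))                       # 7
```

The gain in the average comes from 7 of the 14 classes; on the other 7 the CheXNet column is higher.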

Prerequisites

Python 3.7+
PyTorch 1.8.1
NumPy
scikit-learn
CUDA

Usage

Download the dataset (/images), the dataset partition lists (train_val_list.txt, test_list.txt) and the labels (Data_Entry_2017_v2020.csv) from ChestXray-NIHCC (the README file included there is helpful)
Unzip the tarballs using data_unzip.py
(Optional) Resize the images using data_resize.py
Split the dataset into train, validation and test sets using data_split.py
Generate the (filename, label vector) lists using data_label.py
Specify the image folder path, e.g.

DATA_PATH = './images_converted256/'

Specify the model to use (DenseNet121 for the primary goal of this project)

model = DenseNet121(N_LABEL).cuda()

(Optional) Load saved model

model.load_state_dict(torch.load("ReplicationModel_AUROC_0.8159.pth"))

(Optional) Comment out the training step to evaluate the loaded model on the test set only

# train_model(model, train_loader, val_loader, N_EPOCH, logfile)

Run

python chexnet_cuda_replication.py

Contributors

Xi Li, Yu Liu, Liping Xie, Yekai Yu
