This is a short introduction to the configuration module. As mentioned in the main README, this module provides the flexibility to control parameters in the other modules.
The platform leverages a `global_config.yaml` to set a small number of parameters that apply widely across multiple models. In addition, each model has its own config file to enable custom adjustment.
`global_config.yaml` determines the following setups (a sketch of the file's overall shape follows this list):

- `all` - applying to all models
  - `prediction_tasks`: a list of all supported model prediction tasks
  - `ds_keys`: a list of dataset keys involved in the evaluation
  - `flag_more_feat_types`: whether to use additional feature types. Currently this can only be `True` when `ds_keys` contains only `INS-W`.
- `ml` - applying to traditional models
  - `save_and_reload`: a flag indicating whether to save and re-use features repeatedly (intermediate files are saved in the `tmp` folder). Default `False`. Be careful when turning this flag on, as the feature file will not be updated once it is saved. Set it to `True` only when re-running the exact same algorithm.
- `dl` - applying to deep models
  - `best_epoch_strategy`: a flag for choosing the best training epoch as the final prediction model: `direct` or `on_test`. When set to `direct`, the standard strategy is used: picking the best training epoch on the validation/training set. When set to `on_test`, another strategy is used that involves information leakage: it iterates through all training epochs, performs the same `direct` strategy at each epoch, and then compares the results on the testing set across all epochs to identify the best epoch. These results only indicate whether a model is overfitted, and reflect the theoretical upper-bound performance during training.
  - `skip_training`: similar to `save_and_reload` in `ml`, this is a flag to accelerate the deep model evaluation process. A model's intermediate training-epoch results are saved in the `tmp` folder. When this flag is turned on, the model can leverage the saved results to re-identify the best epoch. A typical usage case: (1) set `skip_training` to `False` and `best_epoch_strategy` to `direct` to go through the training; (2) set `skip_training` to `True` and `best_epoch_strategy` to `on_test` to find another epoch without the need to re-train the model.
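
To make the structure concrete, here is a minimal sketch of how these three sections fit together in `global_config.yaml`. The keys mirror the descriptions above, but the values are illustrative placeholders rather than the platform's actual defaults:

```yaml
# Sketch of global_config.yaml (placeholder values, not real defaults)
all:
  prediction_tasks: ["example_task"]  # all supported model prediction tasks
  ds_keys: ["INS-W"]                  # dataset keys involved in the evaluation
  flag_more_feat_types: True          # only valid when ds_keys contains only INS-W

ml:
  save_and_reload: False              # True only when re-running the exact same algorithm

dl:
  best_epoch_strategy: "direct"       # "direct" or "on_test"
  skip_training: False                # True reuses saved epoch results from the tmp folder
```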
It is worth noting that `global_config.yaml` will overwrite the individual config files on the same items (illustrated below). This can save the effort of changing individual parameters one by one.
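
As a hypothetical illustration of this precedence, suppose an individual config and `global_config.yaml` both set `save_and_reload` (the file layout and values here are purely illustrative):

```yaml
# --- some individual ml config (hypothetical content) ---
save_and_reload: False

# --- global_config.yaml ---
ml:
  save_and_reload: True   # this value takes effect for all traditional models
```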
Each algorithm can lead to one or more models, and each model is accompanied by a config yaml file with a unique name.
Here is a list of the currently supported models:
- Traditional Machine Learning Models
  - Canzian et al. - `ml_canzian.yaml`
  - Saeb et al. - `ml_saeb.yaml`
  - Farhan et al. - `ml_farhan.yaml`
  - Wahle et al. - `ml_wahle.yaml`
  - Lu et al. - `ml_lu.yaml`
  - Wang et al. - `ml_wang.yaml`
  - Xu et al. - Interpretable - `ml_xu_interpretable.yaml`
  - Xu et al. - Personalized - `ml_xu_personalized.yaml`
  - Chikersal et al. - `ml_chikersal.yaml`
- Deep-learning Models
  - ERM
  - Mixup - `dl_erm_mixup.yaml`
  - DANN
  - IRM - `dl_irm.yaml`
  - CSD
  - MLDG
  - MASF
  - Siamese - `dl_siamese.yaml`
  - Clustering - `dl_clustering.yaml`
  - Reorder - `dl_reorder.yaml`