Dept. of Statistics, Applied Mathematics and Computing, Universidade Estadual Paulista (UNESP), Rio Claro, Brazil
- Overview
- Requirements
- Installation
- Execution
- Results
- Visualization
- Contributing
- Cite
- Contact
- Acknowledgments
- License
This framework implements the approach proposed in the paper "Weakly Supervised Learning through Rank-based Contextual Measures". It uses a rank-based model to exploit the contextual information encoded in unlabeled data, performing label expansion in order to execute a weakly supervised classification.
Available Classifiers:
- Traditional kNN
- Support Vector Machines (SVM)
- Optimum-path Forest (OPF)
- Graph Convolutional Network (GCN)
Available Correlation Measures:
- Intersection
- Spearman
- RBO
- Jaccard
- Jaccard K
- Kendall Tau
- Kendall Tau W
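The exact definitions of these measures follow the paper; as a rough illustration only (a sketch, not the framework's implementation), two of them can be approximated over top-k ranked lists, where one common formulation of the intersection measure averages the intersections of the top-d prefixes:

```python
def topk_jaccard(rank_a, rank_b, k):
    """Jaccard similarity between the top-k sets of two ranked lists."""
    top_a, top_b = set(rank_a[:k]), set(rank_b[:k])
    return len(top_a & top_b) / len(top_a | top_b)

def topk_intersection(rank_a, rank_b, k):
    """Average normalized intersection of the top-d prefixes, d = 1..k."""
    score = 0.0
    for d in range(1, k + 1):
        score += len(set(rank_a[:d]) & set(rank_b[:d])) / d
    return score / k
```

Unlike plain top-k Jaccard, the averaged-prefix intersection rewards agreement near the top of the rankings, which is why measures of both kinds are offered.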
This software runs on any computer and OS that supports Python 3.7. An NVIDIA GPU is recommended for experiments with Graph Convolutional Networks (GCN), but it is not essential.
Python 3.7 is recommended for the WSEF. We also recommend creating a virtual environment.
If you are using a GNU/Linux OS, all the requirements can be installed with the install_dependencies.sh script by running the following commands:
chmod +x install_dependencies.sh
./install_dependencies.sh
By default, the CPU version of torch-geometric is installed. To use the GPU instead, replace the word cpu in lines 15 and 16 of the script with your NVIDIA CUDA version (cu101, cu102, or cu111).
You can install the main requirements using the command:
pip install -r requirements.txt
Torch-geometric is also required as a dependency for running the GCN classifier. Both torch-geometric 1.8 and 1.9 are compatible. You can install it by following the instructions in the official torch-geometric repository or by executing the following commands (CPU version):
pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.9.0+cpu.html
pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-1.9.0+cpu.html
pip install torch-geometric
Finally, install PyOPF following the instructions in the official repository.
WSEF can execute multiple experiments on different datasets. The config.py file specifies the execution parameters. At the beginning of the file, the datasets list defines which datasets are used, and a dictionary specifies the parameters of each dataset. Users can change these parameters according to their needs.
Some notes:
- The thresholds were defined according to the experiments presented in the paper.
- For more information about the available datasets, consult this page.
# list of datasets to run execution
datasets = ["flowers"]

# init dictionary of parameters
dataset_settings = dict()

# flowers dataset parameters
dataset_settings["flowers"] = {
    "descriptors": ["resnet"],
    "classifiers": ["opf", "svm", "knn", "gcn"],
    "correlation_measures": ["intersection", "jaccard", "jaccard_k",
                             "kendalltau", "rbo", "spearman"],
    "thresholds": {"intersection": 0.15,
                   "jaccard": 0.45,
                   "jaccard_k": 0.30,
                   "kendalltau": 0.55,
                   "rbo": 0.20,
                   "spearman": 0.55},
    "top_k": 80,
    "dataset_size": 1360,
    "L": 400,
    "n_executions": 1,
    "n_folds": 10,
}
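Running on additional datasets only requires extending the list and the dictionary. As a sketch (the dataset name and every parameter value below are purely illustrative, not a real WSEF dataset):

```python
# Hypothetical second dataset entry -- all names and values are illustrative
datasets = ["flowers", "mydataset"]

dataset_settings = dict()
dataset_settings["mydataset"] = {
    "descriptors": ["resnet"],
    "classifiers": ["svm", "knn"],
    "correlation_measures": ["rbo", "spearman"],
    # one threshold per selected correlation measure
    "thresholds": {"rbo": 0.20, "spearman": 0.55},
    "top_k": 80,
    "dataset_size": 5000,
    "L": 1000,
    "n_executions": 1,
    "n_folds": 10,
}
```

Each measure listed in correlation_measures needs a matching entry in thresholds, following the pattern of the flowers configuration above.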
After the configuration is done, the execution can be performed by running:
python wsef.py
All the generated results are stored in the results directory. The text files contain a report of the executions for each dataset and classifier. For example:
Classifier: opf
Dataset: flowers
Folds: 10
n_executions: 5
**********
Descriptor: resnet
**********
Running 5 times without training set expansion...
Mean acc = 71.77%
****
Correlation Measure: rbo / top_k: 80 / th: 0.2
Running 5 times with training set expansion...
Threshold = 0.2
Mean acc = 81.08%
Relative Gain = 12.97%
Absolute Gain = 9.31%
****
Correlation Measure: jaccard / top_k: 80 / th: 0.45
Running 5 times with training set expansion...
Threshold = 0.45
Mean acc = 75.64%
Relative Gain = 5.38%
Absolute Gain = 3.86%
**********
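The gains in the report follow directly from the mean accuracies. For example, taking the numbers of the rbo entry above:

```python
def gains(baseline_acc, expanded_acc):
    """Absolute gain in percentage points and relative gain in percent."""
    absolute = expanded_acc - baseline_acc
    relative = 100.0 * absolute / baseline_acc
    return absolute, relative

# Mean accuracies from the rbo entry of the report above
absolute, relative = gains(71.77, 81.08)
print(f"Absolute Gain = {absolute:.2f}%")  # 9.31
print(f"Relative Gain = {relative:.2f}%")  # 12.97
```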
Plots are generated in the results/plots directory. They show the distribution of elements after the UMAP method is applied to the features. Colored dots indicate elements whose labels are known; elements of the same class share the same color. Some examples are shown for the Oxford17Flowers dataset.
- Distribution of labeled and unlabeled data before label expansion.
- Distribution of labeled and unlabeled data after the label expansion of our approach.
- Real dataset labels.
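A rough sketch of how such a plot can be produced (not the framework's code: the feature matrix and labels below are synthetic stand-ins, umap-learn is assumed to be installed, and the code falls back to a simple SVD projection when it is not):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is required
import matplotlib.pyplot as plt

# Synthetic stand-ins: in WSEF the features come from the dataset descriptors
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 64))
labels = np.full(200, -1)                 # -1 marks unlabeled elements
labels[:40] = rng.integers(0, 4, size=40)  # a few known labels

try:
    from umap import UMAP  # pip install umap-learn
    embedding = UMAP(n_components=2).fit_transform(features)
except ImportError:
    # Fallback: project onto the two leading singular vectors
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    embedding = centered @ vt[:2].T

unlabeled = labels == -1
plt.scatter(embedding[unlabeled, 0], embedding[unlabeled, 1],
            c="lightgray", s=10, label="unlabeled")
plt.scatter(embedding[~unlabeled, 0], embedding[~unlabeled, 1],
            c=labels[~unlabeled], cmap="tab10", s=20, label="labeled")
plt.legend()
plt.savefig("distribution.png")
```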
We appreciate suggestions, ideas, and contributions. If you want to contribute, feel free to contact us. Please avoid GitHub pull requests, since they are not part of our review process. Small bugs can be reported through the GitHub issue tracker.
If you use this software, please cite:
@inproceedings{paperWSEF,
  author={Presotto, João Gabriel Camacho and Valem, Lucas Pascotti and de Sá, Nikolas Gomes and Pedronette, Daniel Carlos Guimarães and Papa, João Paulo},
  booktitle={2020 25th International Conference on Pattern Recognition (ICPR)},
  title={Weakly Supervised Learning through Rank-based Contextual Measures},
  year={2021},
  pages={5752-5759},
  doi={10.1109/ICPR48806.2021.9412596},
}
Lucas Pascotti Valem: lucaspascottivalem@gmail.com
or lucas.valem@unesp.br
João Gabriel Camacho Presotto: joaopresotto@gmail.com
Daniel Carlos Guimarães Pedronette: daniel.pedronette@unesp.br
The authors are grateful to the São Paulo Research Foundation (FAPESP) for grants 2020/11366-0, 2019/04754-6, and 2017/25908-6.
This project is licensed under GPLv2. See details.