This is the PyTorch implementation of our paper:
Plug-and-play Rating Prediction Adjustment through Trisecting-acting-outcome
- numpy=1.21.5
- pandas=1.3.5
- python=3.7.16
- pytorch=1.13.1
Dataset information
Dataset | Users | Items | Interactions | Link |
---|---|---|---|---|
Amazon-Music | 478 235 | 266 414 | 836 006 | URL |
Ciao | 17 615 | 16 121 | 72 665 | URL |
Douban-Book | 46 548 | 212 995 | 1 908 081 | URL |
Douban-Movie | 94 890 | 81 906 | 11 742 260 | URL |
MovieLens-1M | 6 040 | 3 900 | 1 000 209 | URL |
MovieLens-10M | 69 878 | 10 677 | 10 000 054 | URL |
Pre-trained model information
Method | MF-based | LightGCN-based | ItemCF-based |
---|---|---|---|
MF | ✅ | ||
LightGCN | ✅ | ||
ItemCF | ✅ | ||
IPS | ✅ | ||
DICE | ✅ | ||
PDA | ✅ | ✅ | |
TIDE | ✅ | ✅ | |
PARA | ✅ | ✅ | ✅ |
Take the Ciao dataset as an example:
- Clone the source code.
  git clone https://github.com/A-Egoist/TWDP.git --depth=1
- Download preprocessed data and pre-trained models. Download the data and models from Baidu Netdisk.
  - Each original dataset is split into 5 folds for five-fold cross-validation. Each fold consists of *.train, *.test, and *.extend files:
    - *.train: used for training.
    - *.test: used for testing.
    - *.extend: contains (user, positiveItem, negativeItem) triples generated by performing negative sampling on the *.train file.
    - Additionally, the *.pkl files store the processed results of the *.train files.
  - The pre-trained model files follow the naming convention backbone-method-dataset-fold_index. For example, for the PARA method with MF as the backbone model on the Ciao dataset, the models are named:
    - MF-PARA-ciao-1.pt
    - MF-PARA-ciao-2.pt
    - MF-PARA-ciao-3.pt
    - MF-PARA-ciao-4.pt
    - MF-PARA-ciao-5.pt
    Each file corresponds to the model trained on one fold of the five-fold cross-validation experiment.
  - To reproduce the results of the PARA method with MF as the backbone model on the Ciao dataset, ensure the dataset folder contains the following files:
    - movie-ratings1.test, movie-ratings1.pkl
    - movie-ratings2.test, movie-ratings2.pkl
    - movie-ratings3.test, movie-ratings3.pkl
    - movie-ratings4.test, movie-ratings4.pkl
    - movie-ratings5.test, movie-ratings5.pkl
  - Additionally, download the corresponding pre-trained models:
    - MF-PARA-ciao-1.pt
    - MF-PARA-ciao-2.pt
    - MF-PARA-ciao-3.pt
    - MF-PARA-ciao-4.pt
    - MF-PARA-ciao-5.pt
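The naming convention and file lists above can be checked programmatically before running inference. The following is a minimal sketch and not part of the repository: the helper names `expected_files` and `missing_files` and the folder arguments are hypothetical, and the actual folder layout should be taken from tree.txt.

```python
from pathlib import Path


def expected_files(backbone, method, dataset, data_prefix, folds=5):
    """Return (data_files, model_files) expected for one experiment.

    data_prefix is the per-fold file prefix, e.g. "movie-ratings" for Ciao.
    """
    data_files = []
    model_files = []
    for fold in range(1, folds + 1):
        data_files.append(f"{data_prefix}{fold}.test")
        data_files.append(f"{data_prefix}{fold}.pkl")
        # Naming convention from this README: backbone-method-dataset-fold_index
        model_files.append(f"{backbone}-{method}-{dataset}-{fold}.pt")
    return data_files, model_files


def missing_files(folder, names):
    """Return the names that do not exist as regular files inside folder."""
    folder = Path(folder)
    return [name for name in names if not (folder / name).is_file()]


if __name__ == "__main__":
    data_files, model_files = expected_files("MF", "PARA", "ciao", "movie-ratings")
    # "." is a placeholder; point it at the folders described in tree.txt.
    print("missing data files:", missing_files(".", data_files))
    print("missing model files:", missing_files(".", model_files))
```

An empty list from `missing_files` means every expected file was found in the given folder.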
- Confirm the file structure. Verify the downloaded files and folder structure against the provided tree.txt file.
- Run inference. To evaluate the model, run the following command:
  # Windows
  python .\main.py --backbone MF --method PARA --dataset ciao --mode eval
  Available options:
  - --backbone: The backbone model. Available options: ['MF', 'LightGCN'].
  - --method: The method to be used. Available options: ['Base', 'IPS', 'DICE', 'PDA', 'TIDE', 'PARA'].
  - --dataset: The dataset to use. Available options: ['amzoun-music', 'ciao', 'douban-book', 'douban-movie', 'ml-1m', 'ml-10m'].
  - --mode: The run mode. Available options: ['train', 'eval', 'both'].
- Convert the results to Excel. After evaluation, convert the log results into Excel files using the following command:
  # Windows
  python .\logs\log_to_excel.py --input .\logs\eval.py --output .\output\eval.xlsx --sl 1 --el 2000
  Explanation of parameters:
  - --input: Specifies the log file that contains the evaluation results (e.g., eval.py).
  - --output: Specifies the base name of the output Excel file (e.g., eval.xlsx). The results are saved as three separate files:
    - eval-mean.xlsx: Contains the mean of the evaluation metrics.
    - eval-std.xlsx: Contains the standard deviation of the evaluation metrics.
    - eval-rank.xlsx: Contains the ranking of the evaluation metrics.
  - --sl: Specifies the first line of the log file to convert.
  - --el: Specifies the last line of the log file to convert.
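To make the three output files concrete, the sketch below shows one plausible way to derive a mean, a standard deviation, and a rank from per-fold results. It is an illustration only and not the logic of log_to_excel.py: the `summarize` function, the input layout, the use of the sample standard deviation, and ranking methods by their mean with lower values treated as better are all assumptions.

```python
from statistics import mean, stdev


def summarize(fold_results, lower_is_better=True):
    """Aggregate per-fold metric values for several methods.

    fold_results maps a method name to a list of metric values, one per fold
    (at least two values are needed for the sample standard deviation).
    Returns {method: {"mean": ..., "std": ..., "rank": ...}}; rank 1 is best.
    """
    summary = {
        method: {"mean": mean(values), "std": stdev(values)}
        for method, values in fold_results.items()
    }
    # Order methods by mean; ties keep their insertion order.
    ordered = sorted(
        summary,
        key=lambda m: summary[m]["mean"],
        reverse=not lower_is_better,
    )
    for rank, method in enumerate(ordered, start=1):
        summary[method]["rank"] = rank
    return summary


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    results = {"A": [1.0, 2.0, 3.0], "B": [2.0, 3.0, 4.0]}
    for method, stats in summarize(results).items():
        print(method, stats)
```

Pass `lower_is_better=False` for metrics where larger values are better.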
This section explains how to reproduce the results from scratch, taking the Ciao dataset as an example:
- Clone source code and datasets. Clone the repository containing the code:
  git clone https://github.com/A-Egoist/TWDP.git --depth=1
  Download the Ciao dataset from URL and move it into the corresponding folder according to tree.txt.
- Data preprocessing
  (a). Split the dataset into 5 sets
  Run the following command to split the Ciao dataset into 5 subsets for five-fold cross-validation:
  # Windows
  python .\src\data_processing.py --dataset ciao
  Available option:
  - --dataset: The dataset to be split. Available options: ['amzoun-music', 'ciao', 'douban-book', 'douban-movie', 'ml-1m', 'ml-10m'].
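For readers unfamiliar with five-fold cross-validation, the sketch below shows the general idea of such a split: the interactions are shuffled once and partitioned into five disjoint test sets, and each fold trains on the remaining four parts. This is not the code in data_processing.py; the function name `five_fold_split`, the fixed seed, and the tuple layout of an interaction are assumptions.

```python
import random


def five_fold_split(interactions, folds=5, seed=0):
    """Yield (train, test) lists, one pair per fold.

    Every interaction appears in exactly one test set across the folds.
    """
    shuffled = list(interactions)
    random.Random(seed).shuffle(shuffled)
    for k in range(folds):
        # Records at positions k, k + folds, k + 2 * folds, ... form the test set.
        test = shuffled[k::folds]
        train = [x for i, x in enumerate(shuffled) if i % folds != k]
        yield train, test


if __name__ == "__main__":
    # Toy (user, item, rating) records, for illustration only.
    data = [(u, i, 5) for u in range(4) for i in range(5)]
    for fold, (train, test) in enumerate(five_fold_split(data), start=1):
        print(fold, len(train), len(test))
```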
  (b). Compile the negative sampling script
  Use the following command to compile the C++ script for negative sampling:
  # Windows
  g++ .\src\negative_sampling.cpp -o .\src\negative_sampling.exe
  (c). Perform negative sampling
  Execute the compiled program to perform negative sampling:
  # Windows
  .\src\negative_sampling.exe ciao 1
  Explanation of parameters:
  - The first parameter specifies the dataset to be processed. Available options: ['amazon-music', 'ciao', 'douban-book', 'douban-movie', 'ml-1m', 'ml-10m'].
  - The second parameter specifies the fold index of the cross-validation dataset. Available options: ['1', '2', '3', '4', '5'].
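The idea behind the *.extend files can be illustrated in a few lines of Python. The sketch below draws, for each observed (user, item) pair, one item the user has not interacted with, producing (user, positiveItem, negativeItem) triples. It is not a port of negative_sampling.cpp: uniform sampling, one negative per positive, and the function name `sample_negatives` are assumptions.

```python
import random


def sample_negatives(train_pairs, num_items, seed=0):
    """Return (user, positive_item, negative_item) triples.

    train_pairs is an iterable of (user, item) with items in range(num_items).
    Each negative is drawn uniformly from the items the user has not
    interacted with.
    """
    rng = random.Random(seed)
    train_pairs = list(train_pairs)
    seen = {}
    for user, item in train_pairs:
        seen.setdefault(user, set()).add(item)
    triples = []
    for user, pos in train_pairs:
        if len(seen[user]) >= num_items:
            continue  # the user interacted with every item; no negative exists
        neg = rng.randrange(num_items)
        while neg in seen[user]:  # rejection sampling until an unseen item is hit
            neg = rng.randrange(num_items)
        triples.append((user, pos, neg))
    return triples


if __name__ == "__main__":
    # Toy interactions, for illustration only.
    pairs = [(0, 0), (0, 1), (1, 2)]
    for triple in sample_negatives(pairs, num_items=3):
        print(triple)
```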
- Training. To start the training process, run the following command:
  # Windows
  python .\main.py --backbone MF --method PARA --dataset ciao --mode train
  Available options:
  - --backbone: The backbone model. Available options: ['MF', 'LightGCN'].
  - --method: The method to be used. Available options: ['Base', 'IPS', 'DICE', 'PDA', 'TIDE', 'PARA'].
  - --dataset: The dataset to use. Available options: ['amzoun-music', 'ciao', 'douban-book', 'douban-movie', 'ml-1m', 'ml-10m'].
  - --mode: The run mode. Available options: ['train', 'eval', 'both'].
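The options above map naturally onto an argparse interface. The following is a hypothetical reconstruction based only on the option lists in this README, not the parser in main.py; whether the options are required and what their defaults are is not stated, so the sketch simply marks them all as required and leaves the dataset name unchecked.

```python
import argparse

BACKBONES = ["MF", "LightGCN"]
METHODS = ["Base", "IPS", "DICE", "PDA", "TIDE", "PARA"]
MODES = ["train", "eval", "both"]


def build_parser():
    """Build an illustrative parser for the four documented options."""
    parser = argparse.ArgumentParser(description="Illustrative CLI sketch")
    parser.add_argument("--backbone", choices=BACKBONES, required=True)
    parser.add_argument("--method", choices=METHODS, required=True)
    # Dataset names are not validated here; use one from the list above.
    parser.add_argument("--dataset", required=True)
    parser.add_argument("--mode", choices=MODES, required=True)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--backbone", "MF", "--method", "PARA", "--dataset", "ciao", "--mode", "train"]
    )
    print(args.backbone, args.method, args.dataset, args.mode)
```

With `choices` set, argparse rejects an unsupported backbone, method, or mode with a usage error before any training starts.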
- Evaluation. After training, evaluate the model's performance using this command:
  # Windows
  python .\main.py --backbone MF --method PARA --dataset ciao --mode eval
  The options are the same as in step 3.
- Convert the results to Excel. Finally, convert the evaluation logs into Excel files:
  # Windows
  python .\logs\log_to_excel.py --input .\logs\eval.py --output .\output\eval.xlsx --sl 1 --el 2000
  Explanation of parameters:
  - --input: Specifies the log file that contains the evaluation results (e.g., eval.py).
  - --output: Specifies the base name of the output Excel file (e.g., eval.xlsx). The results are saved as three separate files:
    - eval-mean.xlsx: Contains the mean of the evaluation metrics.
    - eval-std.xlsx: Contains the standard deviation of the evaluation metrics.
    - eval-rank.xlsx: Contains the ranking of the evaluation metrics.
  - --sl: Specifies the first line of the log file to convert.
  - --el: Specifies the last line of the log file to convert.
If you use this code, please cite the following paper:
@article{ZhangLong2024PARA,
title = {Plug-and-play Rating Prediction Adjustment through Trisecting-acting-outcome},
author = {},
journal = {},
year = {},
volume = {},
number = {},
pages = {},
doi = {}
}