This is a TensorFlow implementation for my Master of Science thesis:
This study proposes a hybrid recommendation system intended to make pharmacy systems more efficient. The new model, called LightGCN+, aims to improve on LightGCN by adding item-item relations alongside user-item relations. In addition, three drug/preparation datasets are introduced.
The code has been tested under Python 3.6.5. The required packages are as follows:
- tensorflow == 1.11.0
- numpy == 1.14.3
- scipy == 1.1.0
- scikit-learn == 0.19.1
- cython == 0.29.15
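To confirm that an environment matches the pinned versions above, a small check like the following may help. It is only a sketch: the helper names `installed_version` and `find_mismatches` are hypothetical and not part of this repository, and it assumes each package exposes a `__version__` attribute under the import names shown.

```python
import importlib

# Pinned versions from the list above, keyed by import name.
# Assumption: scikit-learn is imported as "sklearn" and Cython as "Cython".
REQUIRED = {
    "tensorflow": "1.11.0",
    "numpy": "1.14.3",
    "scipy": "1.1.0",
    "sklearn": "0.19.1",
    "Cython": "0.29.15",
}


def installed_version(module_name):
    """Return module.__version__, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def find_mismatches(required, lookup=installed_version):
    """Return {name: (wanted, found)} for every package whose version differs."""
    mismatches = {}
    for name, wanted in required.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches

# Usage: find_mismatches(REQUIRED) returns an empty dict when everything matches.
```

The `lookup` parameter exists so that the comparison logic can be exercised without importing the real packages.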
The command-line arguments are documented in the code (see the parser function in LightGCN/utility/parser.py).
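List-valued flags such as `--regs [1e-4]` and `--layer_size [64,64,64]` have to be converted from strings into Python lists. The following is a minimal sketch of one way to do that with `argparse` and `ast.literal_eval`. It is not the code in parser.py: the helper names, help strings, and defaults are assumptions chosen to mirror the example commands in this README.

```python
import argparse
import ast


def parse_list(text):
    """Convert a string such as "[64,64,64]" into a Python list."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        raise argparse.ArgumentTypeError("not a valid list literal: %r" % text)
    if not isinstance(value, list):
        raise argparse.ArgumentTypeError("expected a list, got: %r" % text)
    return value


def build_parser():
    """Build a parser for the flags used in the example commands (illustrative defaults)."""
    parser = argparse.ArgumentParser(description="Sketch of LightGCN-style arguments")
    parser.add_argument("--dataset", type=str, required=True, help="dataset folder name")
    parser.add_argument("--regs", type=parse_list, default=[1e-4], help="regularization weights")
    parser.add_argument("--embed_size", type=int, default=64, help="embedding dimension")
    parser.add_argument("--layer_size", type=parse_list, default=[64, 64, 64], help="output size per layer")
    parser.add_argument("--lr", type=float, default=0.001, help="learning rate")
    parser.add_argument("--batch_size", type=int, default=2048, help="training batch size")
    parser.add_argument("--epoch", type=int, default=1000, help="maximum number of epochs")
    return parser


example = "--dataset Drug-Relation-90 --regs [1e-4] --layer_size [64,64,64] --lr 0.001"
args = build_parser().parse_args(example.split())
print(args.dataset, args.regs, args.layer_size, args.lr)
```

`ast.literal_eval` only accepts Python literals, so a malformed value is reported as an argument error instead of being executed.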
- Command
python LightGCN.py --dataset Drug-Relation-90 --regs [1e-4] --embed_size 64 --layer_size [64,64,64] --lr 0.001 --batch_size 2048 --epoch 1000
- Example Output log (Not Real):
eval_score_matrix_foldout with cpp
n_users=12765, n_items=8415
n_interactions=33217
n_train=23000, n_test=10217, sparsity=0.00031
...
Epoch 1 [20.3s]: train==[0.46925=0.46911 + 0.00014]
Epoch 2 [25.1s]: train==[0.21866=0.21817 + 0.00048]
...
Epoch 879 [81.6s + 31.3s]: test==[0.13271=0.12645 + 0.00626 + 0.00000], recall=[0.18201], precision=[0.05601], ndcg=[0.15555]
Early stopping is trigger at step: 5 log:0.18201370537281036
Best Iter=[38]@[32829.6] recall=[0.40890], precision=[0.02151], ndcg=[0.20539]
- Command
python LightGCN.py --dataset Drug-Relation-180 --regs [1e-4] --embed_size 64 --layer_size [64,64,64] --lr 0.001 --batch_size 2048 --epoch 1000
- Example Output log (Not Real):
eval_score_matrix_foldout with cpp
n_users=19210, n_items=9793
n_interactions=57763
n_train=27000, n_test=23763, sparsity=0.00031
...
Epoch 1 [20.3s]: train==[0.46925=0.46911 + 0.00014]
Epoch 2 [25.1s]: train==[0.21866=0.21817 + 0.00048]
...
Epoch 879 [81.6s + 31.3s]: test==[0.13271=0.12645 + 0.00626 + 0.00000], recall=[0.18201], precision=[0.05601], ndcg=[0.15555]
Early stopping is trigger at step: 5 log:0.18201370537281036
Best Iter=[38]@[32829.6] recall=[0.48135], precision=[0.02727], ndcg=[0.21539]
- Command
python LightGCN.py --dataset Drug-Relation-270 --regs [1e-4] --embed_size 64 --layer_size [64,64,64] --lr 0.001 --batch_size 2048 --epoch 1000
- Example Output log (Not Real):
eval_score_matrix_foldout with cpp
n_users=22701, n_items=10282
n_interactions=75480
n_train=40000, n_test=35480, sparsity=0.00032
...
Epoch 1 [20.3s]: train==[0.46925=0.46911 + 0.00014]
Epoch 2 [25.1s]: train==[0.21866=0.21817 + 0.00048]
...
Epoch 879 [81.6s + 31.3s]: test==[0.13271=0.12645 + 0.00626 + 0.00000], recall=[0.18201], precision=[0.05601], ndcg=[0.15555]
Early stopping is trigger at step: 5 log:0.18201370537281036
Best Iter=[38]@[32829.6] recall=[0.51632], precision=[0.02945], ndcg=[0.22551]
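The logs report recall, precision and ndcg. As a rough illustration of what such top-K metrics measure for a single user, here is a minimal sketch. It is not the evaluation code of this repository: the function name `topk_metrics`, the cutoff `k`, and the binary-relevance NDCG formula are assumptions.

```python
import math


def topk_metrics(ranked_items, test_items, k):
    """Return (recall, precision, ndcg) at cutoff k for one user.

    ranked_items: item IDs sorted by predicted score, best first.
    test_items:   set of held-out positive item IDs for this user.
    """
    top = ranked_items[:k]
    hits = [1 if item in test_items else 0 for item in top]
    n_hits = sum(hits)

    precision = n_hits / k
    recall = n_hits / len(test_items) if test_items else 0.0

    # DCG discounts each hit by log2(position + 1), with positions starting at 1.
    dcg = sum(hit / math.log2(rank + 2) for rank, hit in enumerate(hits))
    # Ideal DCG: all positives ranked first, limited by the cutoff.
    ideal_hits = min(k, len(test_items))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return recall, precision, ndcg


recall, precision, ndcg = topk_metrics([5, 2, 9, 1], {2, 7}, k=3)
print("recall=%.5f precision=%.5f ndcg=%.5f" % (recall, precision, ndcg))
```

Per-user values of this kind are typically averaged over all test users to obtain the figures printed in a log.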
NOTE: The duration of training and testing depends on the running environment.
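The sparsity values in the example logs are consistent with the ratio n_interactions / (n_users * n_items). The short sketch below recomputes it from the logged counts, assuming that this ratio is indeed how the figure is derived. The function name `sparsity` and the `LOGGED` table are introduced here for illustration only.

```python
def sparsity(n_users, n_items, n_interactions):
    """Fraction of the user-item matrix that holds an observed interaction."""
    return n_interactions / (n_users * n_items)


# (n_users, n_items, n_interactions) copied from the example logs.
LOGGED = {
    "Drug-Relation-90": (12765, 8415, 33217),
    "Drug-Relation-180": (19210, 9793, 57763),
    "Drug-Relation-270": (22701, 10282, 75480),
}

for name, stats in LOGGED.items():
    print("%s: sparsity=%.5f" % (name, sparsity(*stats)))
```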
We provide three processed datasets: Drug-Relation-90, Drug-Relation-180 and Drug-Relation-270.
- train.txt
  - Train file.
  - Each line is a user with her/his positive interactions with items: userID\t a list of itemID\n.
- test.txt
  - Test file (positive instances).
  - Each line is a user with her/his positive interactions with items: userID\t a list of itemID\n.
  - Note that all unobserved interactions are treated as negative instances when reporting performance.
- item_item.txt
  - Item-item relation file.
  - Each line is an item with its positive interactions with other items: itemID\t a list of itemID\n.
- user_list.txt
  - User file.
  - Each line is a pair (org_id, remap_id) for one user, where org_id and remap_id represent the ID of the user in the original and our datasets, respectively.
- item_list.txt
  - Item file.
  - Each line is a pair (org_id, remap_id) for one item, where org_id and remap_id represent the ID of the item in the original and our datasets, respectively.
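The file formats described above can be read with a few lines of Python. This is a minimal sketch, not the loader used by the repository: the function names `parse_adjacency` and `parse_id_map` are hypothetical, fields are assumed to be whitespace-separated (tabs or spaces), remapped IDs are assumed to be integers, and the ID-map files are assumed to possibly start with a header row.

```python
def parse_adjacency(lines):
    """Parse train.txt, test.txt or item_item.txt style lines.

    Each non-empty line is "ID followed by a list of item IDs".
    Returns {ID: [itemID, ...]} with integer IDs.
    """
    adjacency = {}
    for line in lines:
        tokens = line.split()  # splits on tabs and spaces alike
        if not tokens:
            continue  # skip blank lines
        adjacency[int(tokens[0])] = [int(token) for token in tokens[1:]]
    return adjacency


def parse_id_map(lines):
    """Parse user_list.txt or item_list.txt style lines into {org_id: remap_id}.

    org_id is kept as a string because its original format is not specified.
    """
    mapping = {}
    for line in lines:
        tokens = line.split()
        if len(tokens) != 2 or not tokens[1].isdigit():
            continue  # skip blank lines and a possible header row (assumption)
        mapping[tokens[0]] = int(tokens[1])
    return mapping


# Small in-memory examples standing in for the real files.
train = parse_adjacency(["0\t1 2 3\n", "1\t4\n", "\n"])
users = parse_id_map(["org_id remap_id\n", "A12 0\n", "B7 1\n"])
print(train)
print(users)
```

Both functions accept any iterable of lines, so an open file handle can be passed directly, for example `parse_adjacency(open(path))` for a `path` pointing at one of the dataset files.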