-
Notifications
You must be signed in to change notification settings - Fork 62
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
The pseudo code of our implementation, hope will help other users to understand and learn LambdaMART better.
- Loading branch information
1 parent
494e314
commit 4c5154a
Showing
1 changed file
with
139 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,139 @@ | ||
Algorithm Pseudo Code | ||
|
||
|
||
The main part is in the regression tree python file and there are 2 separate parts in the regression tree part. | ||
Part 1: Build a tree | ||
Part 2: Predict the data based on the current tree model and validate it | ||
|
||
1 Regression Tree Model | ||
|
||
|
||
Function get_slittting_points | ||
Iterate through the attribute value of the parameter. | ||
Sort the attribute value array in ascending order. | ||
For index from 0 until array.length-1 | ||
if(array[index) not equals array[index+1] | ||
Add median value to the array | ||
Return the array. | ||
|
||
|
||
Function find_best_split_parallel | ||
Best_least_square = infinity | ||
For i in all possible splits | ||
Children = split_children(data, i,) | ||
Least_square = least_square(left) + least_square(right) | ||
If Least_square < Best_least_square | ||
Best_least_square = Least_square | ||
Assign best_children | ||
Assign best_split_point | ||
|
||
|
||
Function split_children | ||
Left_index = all indexes which value is <split | ||
Right_index = all indexes which value is >=split | ||
Left_label = all labels which index is contained in Left_index | ||
Right_Label = all labels which index is contained in right index | ||
Return Left_index, Right_index, Left_label, Right_label | ||
|
||
|
||
|
||
|
||
Function create tree(data, all_pos_split, max_depth, curr_depth, ideal_ls) | ||
// checking two stopping conditions | ||
All_features = all_pos_split | ||
Condition1 | ||
sum(all features) is 0 , | ||
Ending because there is no more features | ||
Condition2 | ||
End when curr_depth > max_depth | ||
For j in all_pos_split | ||
Get min_split which creates min_error | ||
Children = split_children(min_split) | ||
// make the recursive functions call on create_tree | ||
create_tree(children.left_data…) | ||
create_tree(children.right_data…) | ||
|
||
|
||
|
||
|
||
Function make prediction(curr_node, data) | ||
// base case here | ||
If curr_node is leaf | ||
Return its prediction | ||
Get the attribute value from the data | ||
If attribute_value < split_vaule | ||
Choose left path | ||
Else | ||
Choose right path | ||
|
||
|
||
|
||
|
||
2. Using tree model to build Lambdamart model. | ||
Lambdamart python file | ||
|
||
|
||
Function dcg(scores) | ||
Pass an array as parameter, | ||
Iterate each entry of the array, get the value of 2 to the power of entry value, divide by the log likelihood of number of index. | ||
Add all values together return it. | ||
|
||
|
||
Function dcg_pred(scores) | ||
Temp_a = np.power(2, scores[i]-1/ np.log(i+2) for the top 10 values and add them up | ||
|
||
|
||
Function dcg_pred(scores, i, g, idcg) | ||
// calculate the percentage we could improve by swapping 2 values | ||
//idcg is the ideal dcg value here | ||
Swap the values of index i and j in the array. | ||
Old_Dcg =dcg(scores) without swapping | ||
New_Dcg = dcg(scores) after applying swapping | ||
Return (new_dcg- old_dcg)/idcg | ||
|
||
|
||
Function single_dcg(scores, i, j) | ||
For 2 different index i and j of scores array. | ||
Return (np.power(2, scores[i]) - 1) / np.log2(j + 2) | ||
|
||
|
||
Function lambda_parallel(args) which args include | ||
Calculate the delta_ndcg according to the passed array and i, j index | ||
Then calculate the lambda value based on the delta_ndcg | ||
|
||
|
||
Function compute_lamda(args): | ||
Core function | ||
Based on the passed parameter true scores, predicted scores, | ||
First using numpy argsort to retrieve the corresponding index of the sorted array of both true value and our predicted value. | ||
For all the i,j pairs in good_ij_pairs | ||
Calculate single_dcg for each pair and assign key with (i,j) and single_dcg(i,j) as the value, store the pair in the dictionary. | ||
For all the i,j pairs in good_ij_pairs | ||
Calculate lambda value based on the difference between i,j index and z_ndcg value of (i,i) and (i,j) | ||
|
||
|
||
Function constructor of class LambdaMart | ||
LambdaMart has its own training data, its own defined number of trees, leaves per tree and its learning rate. | ||
|
||
|
||
Function predict | ||
Iterate through each query indexes | ||
Use the regression tree predict function to predict the training data of each entry | ||
Add the predicted value together and assign it to the predicted scores which we return from the function | ||
|
||
|
||
Function validate | ||
Predict the testing data based on the tree build from training data | ||
|
||
|
||
Function save and load | ||
Save the Lambdamart model in the file by dumping it and load it every time | ||
|
||
|
||
Main function | ||
Read our training data from the file | ||
Build the Lambdamart based on training data and learning rate by calling the constructor | ||
Calculate the average dcg score by validating the testing data. | ||
|
||
|
||
3. Implementation Challenge |