Algorithm Pseudo Code
Pseudo code of our implementation. We hope it helps other users understand and learn LambdaMART better.
avoca-dorable authored Dec 15, 2016
1 parent 494e314 commit 4c5154a
Showing 1 changed file with 139 additions and 0 deletions.
139 changes: 139 additions & 0 deletions Algorithm Pseudo Code.txt
Algorithm Pseudo Code

The main logic is in the regression tree Python file, which has two separate parts:
Part 1: Build a tree
Part 2: Predict the data based on the current tree model and validate it

1. Regression Tree Model

Function get_splitting_points
    Iterate through the values of the attribute passed as the parameter.
    Sort the attribute value array in ascending order.
    For index from 0 until array.length-1
        If array[index] not equals array[index+1]
            Add the median (midpoint) of the two values to the result array
    Return the result array.
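
The splitting-point step above can be sketched in Python as follows. This is a minimal sketch, not the repository's code: the function name and the choice to return the midpoints in a separate list are assumptions.

```python
import numpy as np

def get_splitting_points(values):
    """Return the midpoints between consecutive distinct sorted values."""
    sorted_vals = np.sort(np.asarray(values, dtype=float))
    points = []
    for k in range(len(sorted_vals) - 1):
        # Only distinct neighbours yield a useful split candidate.
        if sorted_vals[k] != sorted_vals[k + 1]:
            points.append((sorted_vals[k] + sorted_vals[k + 1]) / 2.0)
    return points
```

Duplicate attribute values are skipped because a split placed between two equal values would not separate any data.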

Function find_best_split_parallel
    Best_least_square = infinity
    For i in all possible splits
        Children = split_children(data, i)
        Least_square = least_square(left) + least_square(right)
        If Least_square < Best_least_square
            Best_least_square = Least_square
            Assign Children to best_children
            Assign i to best_split_point
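
A serial sketch of the best-split search is shown below. It assumes that least_square means the sum of squared deviations from the mean label, which the pseudo code does not spell out; the helper names and the return value are hypothetical, and the parallel dispatch implied by the original function name is left out.

```python
import numpy as np

def least_square(labels):
    """Sum of squared deviations from the mean (assumed error measure)."""
    labels = np.asarray(labels, dtype=float)
    if labels.size == 0:
        return 0.0
    return float(np.sum((labels - labels.mean()) ** 2))

def find_best_split(values, labels, candidate_splits):
    """Try every candidate split and keep the one with the lowest total error."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels, dtype=float)
    best_error, best_split = np.inf, None
    for split in candidate_splits:
        left = labels[values < split]      # labels going to the left child
        right = labels[values >= split]    # labels going to the right child
        error = least_square(left) + least_square(right)
        if error < best_error:
            best_error, best_split = error, split
    return best_split, best_error
```

The strict comparison keeps the first of several equally good splits, which matches the "<" test in the pseudo code.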

Function split_children
    Left_index = all indexes whose value is < split
    Right_index = all indexes whose value is >= split
    Left_label = all labels whose index is contained in Left_index
    Right_label = all labels whose index is contained in Right_index
    Return Left_index, Right_index, Left_label, Right_label
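
With NumPy the four return values can be produced by index selection. The sketch below assumes the attribute values and the labels are passed as two parallel arrays, which is an assumption about the data layout.

```python
import numpy as np

def split_children(values, labels, split):
    """Partition indexes and labels by comparing each value with the split."""
    values = np.asarray(values)
    labels = np.asarray(labels)
    left_index = np.where(values < split)[0]
    right_index = np.where(values >= split)[0]
    return left_index, right_index, labels[left_index], labels[right_index]
```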

Function create_tree(data, all_pos_split, max_depth, curr_depth, ideal_ls)
    // check the two stopping conditions
    All_features = all_pos_split
    If sum(All_features) is 0
        End, because there are no more features to split on
    If curr_depth > max_depth
        End
    For j in all_pos_split
        Get min_split, the split which creates min_error
    Children = split_children(min_split)
    // make the recursive calls to create_tree on the children
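
The recursion can be sketched as below. This is an illustration under stated assumptions rather than the repository's code: nodes are plain dictionaries, the data is a 2-D feature array X with a label array y, a leaf predicts the mean label, and the ideal_ls parameter from the pseudo code is omitted.

```python
import numpy as np

def sse(y):
    """Sum of squared deviations from the mean (assumed error measure)."""
    return float(np.sum((y - y.mean()) ** 2)) if y.size else 0.0

def create_tree(X, y, max_depth, curr_depth=0):
    best = None
    # Stopping condition 1: stop once curr_depth exceeds max_depth.
    if curr_depth <= max_depth:
        for j in range(X.shape[1]):
            vals = np.unique(X[:, j])  # sorted distinct values
            for split in (vals[:-1] + vals[1:]) / 2.0:
                mask = X[:, j] < split
                err = sse(y[mask]) + sse(y[~mask])
                if best is None or err < best[0]:
                    best = (err, j, split, mask)
    # Stopping condition 2: no candidate split is left.
    if best is None:
        return {"leaf": True, "prediction": float(y.mean())}
    _, j, split, mask = best
    return {
        "leaf": False, "feature": j, "split": float(split),
        "left": create_tree(X[mask], y[mask], max_depth, curr_depth + 1),
        "right": create_tree(X[~mask], y[~mask], max_depth, curr_depth + 1),
    }
```

Because candidate splits are midpoints between distinct values, both children always receive at least one row.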

Function make_prediction(curr_node, data)
    // base case here
    If curr_node is leaf
        Return its prediction
    Get the attribute value from the data
    If attribute_value < split_value
        Choose left path
    Else
        Choose right path
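
The traversal can be written as a short recursive function. The dictionary node layout used here (keys leaf, prediction, feature, split, left, right) is hypothetical and only serves the illustration.

```python
def make_prediction(node, row):
    """Walk from the given node down to a leaf and return its prediction."""
    # Base case: a leaf stores the prediction directly.
    if node["leaf"]:
        return node["prediction"]
    # Values below the split go left, all others go right.
    if row[node["feature"]] < node["split"]:
        return make_prediction(node["left"], row)
    return make_prediction(node["right"], row)
```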

2. Using the Tree Model to Build the LambdaMART Model
LambdaMART Python file

Function dcg(scores)
    Pass an array as the parameter.
    Iterate over each entry of the array: take 2 to the power of the entry value, minus 1, and divide by the logarithm of the entry's position (index + 2, as in single_dcg).
    Add all the values together and return the sum.
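
A vectorised sketch of the DCG sum follows. It assumes the same gain and discount that single_dcg uses later in this file, namely (2^score - 1) / log2(position + 2) with positions starting at 0.

```python
import numpy as np

def dcg(scores):
    """Discounted cumulative gain of scores taken in their given order."""
    scores = np.asarray(scores, dtype=float)
    positions = np.arange(len(scores))
    gains = np.power(2.0, scores) - 1.0
    discounts = np.log2(positions + 2)
    return float(np.sum(gains / discounts))
```

For the scores [3, 2, 0] this gives 7/1 + 3/log2(3) + 0.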

Function dcg_pred(scores)
    Temp_a = (np.power(2, scores[i]) - 1) / np.log(i + 2) for the top 10 values, then add them up

Function dcg_pred(scores, i, j, idcg)
    // calculate the percentage we could improve by swapping 2 values
    // idcg is the ideal dcg value here
    Old_Dcg = dcg(scores) without swapping
    Swap the values at indexes i and j in the array.
    New_Dcg = dcg(scores) after applying the swap
    Return (New_Dcg - Old_Dcg) / idcg
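
The swap computation can be sketched as below. The name swap_delta_ndcg is hypothetical, and the dcg helper assumes the (2^score - 1) / log2(position + 2) form used by single_dcg in this file. The input list is copied so the caller's array is not modified.

```python
import numpy as np

def dcg(scores):
    scores = np.asarray(scores, dtype=float)
    return float(np.sum((np.power(2.0, scores) - 1.0)
                        / np.log2(np.arange(len(scores)) + 2)))

def swap_delta_ndcg(scores, i, j, idcg):
    """Relative DCG change obtained by swapping the entries at i and j."""
    scores = list(scores)          # work on a copy
    old_dcg = dcg(scores)
    scores[i], scores[j] = scores[j], scores[i]
    new_dcg = dcg(scores)
    return (new_dcg - old_dcg) / idcg
```

A positive result means the swap moves the ranking closer to the ideal ordering.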

Function single_dcg(scores, i, j)
    For 2 different indexes i and j of the scores array:
    Return (np.power(2, scores[i]) - 1) / np.log2(j + 2)
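
The formula reads as the gain of the item at index i when it is placed at position j. The sketch below also shows, as an illustration not stated in the original, that the DCG change of a swap only involves four such terms; swap_change is a hypothetical helper.

```python
import numpy as np

def single_dcg(scores, i, j):
    """DCG contribution of the item at index i if ranked at position j."""
    return (np.power(2.0, scores[i]) - 1.0) / np.log2(j + 2)

def swap_change(scores, i, j):
    """DCG change when items i and j trade positions."""
    after = single_dcg(scores, i, j) + single_dcg(scores, j, i)
    before = single_dcg(scores, i, i) + single_dcg(scores, j, j)
    return after - before
```

This avoids recomputing the full DCG sum for every candidate pair.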

Function lambda_parallel(args) which args include
    Calculate the delta_ndcg according to the passed array and the i, j indexes
    Then calculate the lambda value based on the delta_ndcg

Function compute_lambda(args):
    Core function
    Takes the true scores and the predicted scores as parameters.
    First use numpy argsort to retrieve the indexes that sort both the true values and our predicted values.
    For all the (i,j) pairs in good_ij_pairs
        Calculate single_dcg for the pair and store it in a dictionary, with (i,j) as the key and single_dcg(i,j) as the value.
    For all the (i,j) pairs in good_ij_pairs
        Calculate the lambda value based on the difference between the i and j indexes and the z_ndcg value of (i,i) and (i,j)
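
The sketch below shows one common way to compute LambdaMART lambdas for a single query. Several details are assumptions not confirmed by the pseudo code: good_ij_pairs is taken to be every pair where item i has a higher true score than item j, z_ndcg is the absolute swap change divided by the ideal DCG, and the pairwise weight is the logistic term 1 / (1 + exp(s_i - s_j)).

```python
import numpy as np

def single_dcg(scores, i, j):
    return (np.power(2.0, scores[i]) - 1.0) / np.log2(j + 2)

def compute_lambda(true_scores, predicted_scores):
    true_scores = np.asarray(true_scores, dtype=float)
    predicted_scores = np.asarray(predicted_scores, dtype=float)
    n = len(true_scores)
    order = np.argsort(predicted_scores)[::-1]   # current ranking, best first
    t, s = true_scores[order], predicted_scores[order]
    ideal = np.sort(true_scores)[::-1]           # ideal ranking
    idcg = sum(single_dcg(ideal, k, k) for k in range(n))
    lambdas = np.zeros(n)
    if idcg == 0:
        return lambdas
    good_ij_pairs = [(i, j) for i in range(n) for j in range(n) if t[i] > t[j]]
    single_dcgs = {}                             # cache keyed by (item, position)
    for i, j in good_ij_pairs:
        for key in ((i, i), (i, j), (j, i), (j, j)):
            if key not in single_dcgs:
                single_dcgs[key] = single_dcg(t, *key)
    for i, j in good_ij_pairs:
        z_ndcg = abs(single_dcgs[(i, j)] - single_dcgs[(i, i)]
                     + single_dcgs[(j, i)] - single_dcgs[(j, j)]) / idcg
        rho = 1.0 / (1.0 + np.exp(s[i] - s[j]))
        lambdas[i] += z_ndcg * rho               # push the better item up
        lambdas[j] -= z_ndcg * rho               # push the worse item down
    result = np.zeros(n)
    result[order] = lambdas                      # back to the input order
    return result
```

Each pair adds and subtracts the same amount, so the lambdas of one query sum to zero.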

Function constructor of class LambdaMart
    A LambdaMart object holds its own training data, number of trees, leaves per tree and learning rate.

Function predict
    Iterate through the indexes of each query
    Use the regression tree predict function to predict each entry of the training data
    Add the predicted values together and assign the sum to the predicted scores, which the function returns

Function validate
    Predict the testing data based on the trees built from the training data

Function save and load
    Save the LambdaMART model to a file by dumping it, and load it from that file whenever it is needed
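
One way to dump and reload a model object is Python's pickle module. Whether the repository uses pickle is an assumption, and the function names and signatures below are hypothetical.

```python
import pickle

def save(model, path):
    """Serialise the model object to a binary file."""
    with open(path, "wb") as f:
        pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)

def load(path):
    """Read a previously saved model object back from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.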

Main function
    Read our training data from the file
    Build the LambdaMART model from the training data and learning rate by calling the constructor
    Calculate the average dcg score by validating on the testing data.

3. Implementation Challenge
