diff --git a/Algorithm Pseudo Code.txt b/Algorithm Pseudo Code.txt
new file mode 100644
index 0000000..f5b119e
--- /dev/null
+++ b/Algorithm Pseudo Code.txt
@@ -0,0 +1,139 @@
+Algorithm Pseudo Code
+
+
+The main part is in the regression tree Python file, which has 2 separate parts.
+Part 1: Build a tree
+Part 2: Predict the data based on the current tree model and validate it
+
+1. Regression Tree Model
+
+
+Function get_splitting_points
+Iterate through the attribute values of the parameter.
+Sort the attribute value array in ascending order.
+For index from 0 until array.length-1
+    If array[index] not equals array[index+1]
+        Add the median (midpoint) of the two values to the array of splitting points
+Return the array of splitting points.
+
+
+Function find_best_split_parallel
+Best_least_square = infinity
+For i in all possible splits
+    Children = split_children(data, i)
+    Least_square = least_square(left) + least_square(right)
+    If Least_square < Best_least_square
+        Best_least_square = Least_square
+        Assign best_children
+        Assign best_split_point
+
+
+Function split_children
+Left_index = all indexes whose value is < split
+Right_index = all indexes whose value is >= split
+Left_label = all labels whose index is contained in Left_index
+Right_label = all labels whose index is contained in Right_index
+Return Left_index, Right_index, Left_label, Right_label
+
+
+
+
+Function create_tree(data, all_pos_split, max_depth, curr_depth, ideal_ls)
+// check the two stopping conditions
+All_features = all_pos_split
+Condition 1
+    sum(All_features) is 0:
+    stop, because there are no more features to split on
+Condition 2
+    stop when curr_depth > max_depth
+For j in all_pos_split
+    Get min_split, the split that gives min_error
+Children = split_children(min_split)
+// make the recursive calls to create_tree
+create_tree(children.left_data…)
+create_tree(children.right_data…)
+
+
+
+
+Function make_prediction(curr_node, data)
+    // base case
+    If curr_node is a leaf
+        Return its prediction
+    Get the attribute value from the data
+    If attribute_value < split_value
+        Choose left path
+    Else
+        Choose right path
+
+
+
+
+2. Using the tree model to build the LambdaMART model.
+LambdaMART Python file
+
+
+Function dcg(scores)
+Takes an array of scores as its parameter.
+For each entry of the array, take 2 to the power of the entry value and divide by the logarithm of the entry's position.
+Add all the values together and return the sum.
+
+
+Function dcg_pred(scores)
+Temp_a = (np.power(2, scores[i]) - 1) / np.log(i + 2) for each of the top 10 values; add them up
+
+
+Function delta_ndcg(scores, i, j, idcg)
+// calculate the fraction by which the score improves when 2 values are swapped
+// idcg is the ideal dcg value here
+Old_dcg = dcg(scores) before swapping
+Swap the values at index i and j in the array.
+New_dcg = dcg(scores) after swapping
+Return (New_dcg - Old_dcg) / idcg
+
+
+Function single_dcg(scores, i, j)
+    For 2 different indexes i and j of the scores array.
+Return (np.power(2, scores[i]) - 1) / np.log2(j + 2)
+
+
+Function lambda_parallel(args), where args include the passed array and the i, j indexes
+Calculate the delta_ndcg from the passed array and the i, j indexes.
+Then calculate the lambda value based on the delta_ndcg.
+
+
+Function compute_lamda(args):
+Core function
+Takes the true scores and the predicted scores as parameters.
+First use numpy argsort to retrieve the indexes that sort both the true values and our predicted values.
+For all the i,j pairs in good_ij_pairs
+    Calculate single_dcg for each pair and store it in the dictionary with (i,j) as the key and single_dcg(i,j) as the value.
+For all the i,j pairs in good_ij_pairs
+    Calculate the lambda value based on the difference between the i,j indexes and the z_ndcg values of (i,i) and (i,j)
+
+
+Function constructor of class LambdaMart
+LambdaMart has its own training data, its own defined number of trees, leaves per tree and its learning rate.
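The dcg, single_dcg and swap-delta steps described above can be sketched in Python as follows. This is a minimal illustration, not the project's actual code: the `k=10` cutoff parameter, the use of log base 2 throughout, the name `delta_ndcg` for the swap function, and the guard for a zero `idcg` are all assumptions made for the example.

```python
import numpy as np


def dcg(scores, k=10):
    """Discounted cumulative gain over the top-k entries (k is an assumed cutoff)."""
    scores = np.asarray(scores, dtype=float)[:k]
    positions = np.arange(len(scores))
    return float(np.sum((np.power(2.0, scores) - 1.0) / np.log2(positions + 2)))


def single_dcg(scores, i, j):
    """Gain contributed by the item at index i if it sat at position j."""
    return (np.power(2.0, scores[i]) - 1.0) / np.log2(j + 2)


def delta_ndcg(scores, i, j, idcg):
    """Change in NDCG caused by swapping entries i and j."""
    if idcg == 0:
        return 0.0  # assumed guard: a query with no relevant items has no gain
    swapped = np.array(scores, dtype=float)
    swapped[[i, j]] = swapped[[j, i]]
    return (dcg(swapped) - dcg(scores)) / idcg


# Hypothetical relevance labels, listed in predicted rank order.
relevance = [1, 3, 0, 2]
ideal = dcg(sorted(relevance, reverse=True))
gain = delta_ndcg(relevance, 0, 1, ideal)
```

Swapping positions 0 and 1 moves the more relevant item to the top, so `gain` is positive. The same difference can be assembled from four `single_dcg` terms, which is why caching them per (i, j) pair avoids recomputing the full sum for every swap.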
+
+
+Function predict
+Iterate through the indexes of each query.
+Use the regression tree predict function to predict each entry of the training data.
+Add the predicted values together and assign the sum to the predicted scores, which the function returns.
+
+
+Function validate
+Predict the testing data using the trees built from the training data.
+
+
+Function save and load
+Save the LambdaMART model to a file by dumping it, and load it from that file whenever it is needed.
+
+
+Main function
+Read our training data from the file.
+Build the LambdaMART model from the training data and learning rate by calling the constructor.
+Calculate the average DCG score by validating on the testing data.
+
+
+3. Implementation Challenge