RCTree cannot handle when the data consists of only one unique value #88
Comments
I have a workaround for this scenario. If the dataset contains only one unique value, the tree can be built by creating an empty tree and inserting the 'new' data points into it iteratively.
The idea of the workaround is that when you reach a single unique value, instead of creating a tree from all the observations at once, you insert the points one at a time into an empty tree. An example code snippet would look like:
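The snippet posted with this comment is not preserved in the thread, so what follows is only a minimal sketch of the idea described above, not the commenter's code. It assumes the `rrcf` package's `RCTree()` constructor and `insert_point(point, index=...)` method; the helper names `has_single_unique_point` and `build_tree` are made up for illustration.

```python
import numpy as np


def has_single_unique_point(X):
    """Return True when every row of X is the same point."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    return np.unique(X, axis=0).shape[0] == 1


def build_tree(X):
    """Build an RCTree, inserting point by point when X holds one unique point."""
    import rrcf  # third-party; imported here so the helper above works without it

    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    if not has_single_unique_point(X):
        return rrcf.RCTree(X)  # usual batch construction
    tree = rrcf.RCTree()  # start from an empty tree
    for i, point in enumerate(X):
        # Assumption: rrcf folds exact duplicates into the existing leaf
        # instead of trying to cut between identical points.
        tree.insert_point(point, index=i)
    return tree
```

Calling `build_tree(np.full((4, 2), 1.0))` should then take the insertion path rather than the batch constructor. This only covers the case where the whole input is one repeated point; a random sample that happens to be all duplicates would need the same check applied per sample.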
Hey buddy,
I have posted the code for the workaround. Feel free to check it out.
Cheers
W
On May 5, 2021, at 10:31, titaii2 ***@***.***> wrote:
I have a workaround for this scenario. If the dataset contains only one unique value, the tree can be built by creating an empty tree and inserting the 'new' data points into it iteratively.
Hello :)
Can you post your solution code here?
I have the same problem when my data is 98% X1 and 2% X2: the sample (created with "ixs") ends up containing only X1, which causes the same error, "probabilities contain NaN", at "l /= l.sum()".
I ran into issues when a subset of my sample data points contains only ONE unique value. How should we handle such a case?
The error message indicates a NaN value for the cut probability (caused by division by zero). I tried replacing it with a uniform distribution, but that caused a subsequent issue: after a cut, the right side contains no values. I think this violates the principle of the RRCF algorithm. Is there a better way of resolving such cases?
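For anyone trying to reproduce this, here is a small NumPy-only sketch of what seems to be going on. It assumes the cut dimension is drawn with probability proportional to each dimension's span (max minus min), which is what "l /= l.sum()" suggests; the variable names are illustrative and are not taken from the library.

```python
import numpy as np

X = np.full((4, 2), 7.0)  # four copies of the same 2-D point

span = X.max(axis=0) - X.min(axis=0)  # bounding-box side lengths: [0., 0.]
with np.errstate(invalid="ignore"):   # silence the 0/0 RuntimeWarning
    probs = span / span.sum()         # 0/0 in every dimension gives NaN

try:
    np.random.choice(X.shape[1], p=probs)
    choice_error = None
except ValueError as exc:
    choice_error = exc                # NumPy rejects NaN probabilities

# Uniform fallback: a dimension can be picked, but the cut still cannot
# separate identical points, so one side of the split stays empty.
q = 0
cut = np.random.uniform(X[:, q].min(), X[:, q].max())  # low == high
left = X[:, q] <= cut
right = ~left
```

Here `left` covers every point and `right` is empty, which matches the empty right side described above.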