Feature Request - Handling of NaNs #86

Open
shaye059 opened this issue Mar 1, 2021 · 0 comments

Currently, any NaN value in the NumPy array leads to the following error when trying to build an RCTree:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-55-8f8be9d6cf46> in <module>
----> 1 tree = rrcf.RCTree(data_anom.sample(1000, random_state=111).to_numpy())

~\anaconda3\envs\squarefeetenv\lib\site-packages\rrcf\rrcf.py in __init__(self, X, index_labels, precision, random_state)
    104             # Create RRC Tree
    105             S = np.ones(n, dtype=np.bool)
--> 106             self._mktree(X, S, N, I, parent=self)
    107             # Remove parent of root
    108             self.root.u = None

~\anaconda3\envs\squarefeetenv\lib\site-packages\rrcf\rrcf.py in _mktree(self, X, S, N, I, parent, side, depth)
    170         depth += 1
    171         # Create a cut according to definition 1
--> 172         S1, S2, branch = self._cut(X, S, parent=parent, side=side)
    173         # If S1 does not contain an isolated point...
    174         if S1.sum() > 1:

~\anaconda3\envs\squarefeetenv\lib\site-packages\rrcf\rrcf.py in _cut(self, X, S, parent, side)
    152         l /= l.sum()
    153         # Determine dimension to cut
--> 154         q = self.rng.choice(self.ndim, p=l)
    155         # Determine value for split
    156         p = self.rng.uniform(xmin[q], xmax[q])

mtrand.pyx in numpy.random.mtrand.RandomState.choice()

ValueError: probabilities contain NaN
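
A minimal sketch of why this fails, assuming `l` in `_cut` holds the per-dimension range (`xmax - xmin`) of the points. The traceback suggests this but does not show it, and the array `X` below is made up for illustration:

```python
import numpy as np

# Toy data with one missing value (illustrative only, not from the issue).
X = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0]])

# max/min propagate NaN, so the range of the second column is NaN.
l = X.max(axis=0) - X.min(axis=0)
# Normalising by a NaN sum turns every probability into NaN.
l /= l.sum()

rng = np.random.RandomState(0)
try:
    rng.choice(X.shape[1], p=l)
    error = None
except ValueError as exc:
    # Same failure mode as the last frame of the traceback.
    error = str(exc)
```

A single NaN anywhere in the data is enough to make every cut probability NaN, which is why the tree cannot be built at all.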

Filling NaNs with the column mean or median is probably the best way to handle this, so having it as a built-in option would be helpful. Maybe it could be an optional parameter when creating an RCTree, with the default handling set to None?
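
Until something like that exists, the fill can be done before the tree is built. The following is only a sketch of the idea: the helper name `fill_nans` and its `strategy` parameter are hypothetical and not part of rrcf, and it assumes a 2-D numeric array in which every column has at least one non-NaN value.

```python
import numpy as np

def fill_nans(X, strategy="median"):
    """Return a float copy of X with NaNs replaced by a per-column statistic.

    Hypothetical helper, not part of rrcf.
    """
    X = np.array(X, dtype=float)  # np.array copies, so the input is untouched
    if strategy == "mean":
        fill = np.nanmean(X, axis=0)
    elif strategy == "median":
        fill = np.nanmedian(X, axis=0)
    else:
        raise ValueError("strategy must be 'mean' or 'median'")
    # fill has one entry per column and broadcasts across the rows.
    return np.where(np.isnan(X), fill, X)

# Toy data (illustrative only).
X = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0]])
filled = fill_nans(X, strategy="median")
# The cleaned array could then be passed on, e.g. rrcf.RCTree(filled).
```

In the example, the missing entry is replaced by 4.0, the median of the remaining values 2.0 and 6.0 in that column. For the call in the traceback, the same helper would be applied to the output of `.to_numpy()` before constructing the tree.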

shaye059 changed the title from "Feature Request - Handling of NaNs and DataFrame Support" to "Feature Request - Handling of NaNs" on Mar 1, 2021