Currently, any NaN value in the NumPy array causes the following error when building an RCTree:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-55-8f8be9d6cf46> in <module>
----> 1 tree = rrcf.RCTree(data_anom.sample(1000, random_state=111).to_numpy())
~\anaconda3\envs\squarefeetenv\lib\site-packages\rrcf\rrcf.py in __init__(self, X, index_labels, precision, random_state)
104 # Create RRC Tree
105 S = np.ones(n, dtype=np.bool)
--> 106 self._mktree(X, S, N, I, parent=self)
107 # Remove parent of root
108 self.root.u = None
~\anaconda3\envs\squarefeetenv\lib\site-packages\rrcf\rrcf.py in _mktree(self, X, S, N, I, parent, side, depth)
170 depth += 1
171 # Create a cut according to definition 1
--> 172 S1, S2, branch = self._cut(X, S, parent=parent, side=side)
173 # If S1 does not contain an isolated point...
174 if S1.sum() > 1:
~\anaconda3\envs\squarefeetenv\lib\site-packages\rrcf\rrcf.py in _cut(self, X, S, parent, side)
152 l /= l.sum()
153 # Determine dimension to cut
--> 154 q = self.rng.choice(self.ndim, p=l)
155 # Determine value for split
156 p = self.rng.uniform(xmin[q], xmax[q])
mtrand.pyx in numpy.random.mtrand.RandomState.choice()
ValueError: probabilities contain NaN
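The failure can be reproduced without rrcf. The sketch below imitates the dimension choice seen in `_cut` in the traceback (`l /= l.sum()` followed by `rng.choice(..., p=l)`). The function name `pick_cut_dimension` is hypothetical, and computing `l` as the per-column range `xmax - xmin` is an assumption about the surrounding library code, which the traceback does not show.

```python
import numpy as np

def pick_cut_dimension(X, seed=0):
    """Hypothetical stand-in for the dimension choice shown in the traceback."""
    rng = np.random.RandomState(seed)
    xmax = X.max(axis=0)  # NaN propagates through max and min
    xmin = X.min(axis=0)
    l = xmax - xmin       # assumption: per-column range used as cut weights
    l /= l.sum()          # a single NaN turns every weight into NaN
    return rng.choice(X.shape[1], p=l)

X = np.array([[1.0, 2.0],
              [3.0, np.nan],
              [5.0, 6.0]])

try:
    pick_cut_dimension(X)
except ValueError as err:
    print("ValueError:", err)
```

If this matches what the library does, one NaN anywhere in the sample is enough to poison the probabilities for every dimension, which would explain why tree construction fails on the first cut.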
Filling NaNs with the column mean or median is probably the best way to handle this, so having it as a built-in option would be helpful. Maybe it could be an optional parameter when creating an RCTree, with the default handling set to None?
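Until something like this is built in, the same effect can be had by imputing before the tree is built. This is a minimal sketch, not part of rrcf: the helper name `fill_nans` and its `strategy` parameter are hypothetical, and it assumes every column has at least one non-NaN value.

```python
import numpy as np

def fill_nans(X, strategy="median"):
    """Return a float copy of X with NaNs replaced by a per-column statistic.

    Hypothetical helper for illustration; not part of the rrcf API.
    """
    X = np.array(X, dtype=float)  # np.array copies, so the input is untouched
    if strategy == "mean":
        fill = np.nanmean(X, axis=0)
    elif strategy == "median":
        fill = np.nanmedian(X, axis=0)
    else:
        raise ValueError("strategy must be 'mean' or 'median'")
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = fill[cols]  # each NaN gets the statistic of its own column
    return X

X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 8.0]])
X_filled = fill_nans(X, strategy="median")

# The cleaned array can then be passed to the constructor from the traceback:
# tree = rrcf.RCTree(X_filled)
```

A column that is entirely NaN would still come back as NaN (with a NumPy warning), so such columns would need to be dropped or handled separately.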
shaye059 changed the title from "Feature Request - Handling of NaNs and DataFrame Support" to "Feature Request - Handling of NaNs" on Mar 1, 2021