The problem of RRCF training data to get the model #78
Comments
Yes. In this case you would:
You can also use a similar approach for classification:
Yep! I'd like to know more about how to obtain such a model. My current understanding is that I should use the to_dict function in the API. Is this correct? If so, could you please give me a specific code example? Thank you very much for your reply.
This should work:

Train model (same example as in README)

import numpy as np
import pandas as pd
import rrcf

# Set parameters
np.random.seed(0)
n = 2010
d = 3
num_trees = 10
tree_size = 10

# Generate data
X = np.zeros((n, d))
X[:1000,0] = 5
X[1000:2000,0] = -5
X += 0.01*np.random.randn(*X.shape)

# Construct forest
forest = []
while len(forest) < num_trees:
    # Select random subsets of points uniformly from point set
    ixs = np.random.choice(n, size=(n // tree_size, tree_size),
                           replace=False)
    # Add sampled trees to forest
    trees = [rrcf.RCTree(X[ix], index_labels=ix) for ix in ixs]
    forest.extend(trees)

Save forest to json file

# Write learned model to json file
import json

# Convert forest to list of dictionaries
out_json = [tree.to_dict() for tree in forest]

# Write forest to file
with open('forest.json', 'w') as outfile:
    json.dump(out_json, outfile)

Read forest from json file

# Read json file into new forest
with open('forest.json', 'r') as infile:
    forest_obj = json.load(infile)

new_forest = []
for tree_obj in forest_obj:
    tree = rrcf.RCTree.from_dict(tree_obj)
    new_forest.append(tree)

Compare:

>>> forest[0]
─+
 ├───+
 │   ├──(6)
 │   └───+
 │       ├───+
 │       │   ├──(1)
 │       │   └──(4)
 │       └──(8)
 └───+
     ├───+
     │   ├──(0)
     │   └───+
     │       ├───+
     │       │   ├──(9)
     │       │   └──(5)
     │       └──(2)
     └───+
         ├──(3)
         └──(7)

>>> new_forest[0]
─+
 ├───+
 │   ├──(6)
 │   └───+
 │       ├───+
 │       │   ├──(1)
 │       │   └──(4)
 │       └──(8)
 └───+
     ├───+
     │   ├──(0)
     │   └───+
     │       ├───+
     │       │   ├──(9)
     │       │   └──(5)
     │       └──(2)
     └───+
         ├──(3)
         └──(7)
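
Once a forest has been reloaded, one possible way to score a new point against it is to insert the point into every tree, read off its collusive displacement (CoDisp), and then remove it again so the stored model stays unchanged. The sketch below is only an illustration: score_point is a hypothetical helper name, not part of rrcf, and it assumes each tree offers insert_point, codisp and forget_point the way rrcf.RCTree does in the README's streaming example.

```python
def score_point(forest, point, index):
    """Return the average CoDisp of `point` over all trees in `forest`.

    Hypothetical helper (not part of rrcf). Assumes every tree provides
    insert_point(point, index=...), codisp(index) and forget_point(index).
    `index` must be a label that is not already used in any tree.
    """
    if len(forest) == 0:
        raise ValueError("forest must contain at least one tree")

    total = 0.0
    for tree in forest:
        # Temporarily add the new point to this tree
        tree.insert_point(point, index=index)
        # Higher CoDisp suggests the point is more anomalous
        total += tree.codisp(index)
        # Remove the point again so the tree returns to its saved state
        tree.forget_point(index)
    return total / len(forest)
```

With the forest from the example above, a call might look like score_point(new_forest, np.array([0.0, 0.0, 0.0]), index=n), since the label n is not among the training labels 0 to n - 1. Whether to forget the point afterwards or keep it (as in a sliding-window stream) is a design choice that depends on the application.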
Okay, I think I understand how RRCF works in this case now! Thank you very much! :)
If you want to use shingles, each point inserted into the tree should be of the form:
And so on. Each point will have dimension (1 x nm), where n is the shingle size and m is the number of variables.
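
To make the (1 x nm) shape concrete, here is a minimal sketch of building shingled points from a multivariate series with NumPy. The helper name make_shingles is hypothetical, and the ordering inside each shingle (all m variables at time t, then all m at time t+1, and so on) is an assumption; the exact layout intended in the comment above is not shown.

```python
import numpy as np

def make_shingles(series, shingle_size):
    """Flatten sliding windows of a (T, m) series into rows of length n * m.

    Hypothetical helper. Row i holds time steps i .. i + n - 1, stored
    time step by time step (an assumed ordering).
    """
    series = np.asarray(series, dtype=float)
    if series.ndim == 1:
        # Treat a univariate series as a single-column array (m = 1)
        series = series[:, None]
    num_steps, num_vars = series.shape
    if not 1 <= shingle_size <= num_steps:
        raise ValueError("shingle_size must be between 1 and the series length")

    num_windows = num_steps - shingle_size + 1
    # Stack the windows, then flatten each (n, m) window into one row
    windows = np.stack([series[i:i + shingle_size] for i in range(num_windows)])
    return windows.reshape(num_windows, shingle_size * num_vars)

# Example: T = 5 time steps, m = 2 variables, shingle size n = 3
series = np.arange(10).reshape(5, 2)
shingles = make_shingles(series, shingle_size=3)
print(shingles.shape)  # (3, 6): each point has n * m = 6 entries
```

Each row of the result can then be inserted into a tree as a single point.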
This should be added to the documentation examples (I didn't see it there; either I missed it or it isn't documented).
Can RRCF obtain a model from the training set data, and then use this model to detect anomalies in the new data stream?