[ENH] Added IDK² and s-IDK² Anomaly Detector To Aeon #2465
Conversation
I just had a very brief look at the code because I am short on time. I will have another look next year after the Christmas break. Meanwhile, the code quality can be improved and tests need to be added.
- The tests are missing! Please add reasonable tests for the new algorithm. How do you make sure that it produces the same results as the original implementation?
- The code mixes standard Python (e.g. `random`, `range`, ...) with numpy (e.g. `np.random`, `np.arange`, ...). Please rely mostly on numpy for better code quality and performance; see the sketch after this comment.
- Why are most methods in capital letters and start with dunders (`__`)? If this is a reference to the original implementation, please add the original name as a comment and use our coding standard.
Thank you for taking the time and contributing to aeon!
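As a hypothetical illustration of the second point (the identifiers here are made up for the example, not taken from the PR's code):

```python
import numpy as np

# Plain-Python style flagged in the review (illustrative):
# idx = [random.randrange(n) for _ in range(psi)]
# offsets = list(range(psi))

# Equivalent numpy style using the Generator API:
rng = np.random.default_rng(42)
n, psi = 1000, 8
idx = rng.integers(0, n, size=psi)  # vectorised sampling instead of a Python loop
offsets = np.arange(psi)            # np.arange instead of range for array arithmetic
```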
Are there any other implementations or results we can use to compare output? Accuracy to the original algorithm(s) is important. I am not familiar with this area, but having incorrect implementations is generally damaging. The code could be fine, but it is not a good idea to merge until someone is confident that it is.
The expected results array for the last test case was generated using the original anomaly detector, which I used for comparison with this anomaly detector. Let me know if you'd like me to run any additional tests.
@SebastianSchmidl any suggestion on how to best evaluate this? It does not have to be as rigorous as a publication, of course; just enough to give us the idea that the output is similar enough.
How do you test the implementations in the other modules? I would compare this implementation against the original code: probably, I would run the original code in a Docker image and capture the output for some test cases (including edge cases). Then, we can use the same input and output as fixtures for the tests in aeon. Unfortunately, I need to focus on other stuff until mid-summer, so I cannot say if or when I would be able to do this.
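A minimal sketch of such a fixture-based test, assuming hypothetical fixture paths and an `IDK` estimator name (not necessarily the PR's actual API):

```python
import numpy as np


def test_idk_matches_original_implementation():
    # Input and expected scores captured once from the original author's code
    # (e.g. run inside a Docker image) and stored as test fixtures.
    X = np.load("tests/fixtures/idk_input.npy")
    expected = np.load("tests/fixtures/idk_expected_scores.npy")

    detector = IDK(random_state=42)  # hypothetical estimator name
    scores = detector.fit_predict(X)

    np.testing.assert_allclose(scores, expected, rtol=1e-6)
```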
@SebastianSchmidl For classification we would just say run both on a decent number of UCR datasets and provide the accuracy results for all those datasets. There is no rush on you doing anything here; the onus is on the code contributor to prove that the output is equivalent, IMO, unless they are trusted by the community or are the algorithm author or similar. If you can get around to doing it eventually, that's great, but there is no obligation to do anything at all.
Sorry for the delay, I've been caught up with exams. I ran the anomaly detection using `np.random.normal` 100 times, each with a sample size of 100, and compared the mean and variance with `np.testing.assert_allclose`. I also added `random_state` to the original code, since not using it can lead to a lot of variation in the results. Here are the data and results:
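Roughly, that procedure could look like the following sketch; the two scorers are placeholders standing in for the aeon detector and the original code, each run with a fixed `random_state`:

```python
import numpy as np

rng = np.random.default_rng(42)

def score_aeon(X):      # placeholder for the aeon detector's scores
    return np.abs(X - X.mean())

def score_original(X):  # placeholder for the original author's code
    return np.abs(X - X.mean())

aeon_runs, orig_runs = [], []
for _ in range(100):          # 100 repetitions
    X = rng.normal(size=100)  # sample size of 100
    aeon_runs.append(score_aeon(X))
    orig_runs.append(score_original(X))

aeon_runs, orig_runs = np.asarray(aeon_runs), np.asarray(orig_runs)

# Compare the mean and variance of the two score distributions.
np.testing.assert_allclose(aeon_runs.mean(), orig_runs.mean())
np.testing.assert_allclose(aeon_runs.var(), orig_runs.var())
```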
This does not really help us determine whether the implementation is accurate to the original algorithm. We would be looking for a comparison against the original implementation (or published results) using actual datasets and evaluation metrics.
I used AUC to evaluate both the original model and the aeon model, and they produced identical results. This is expected since I set a random state for the original model, ensuring consistency. The scores are relatively low because I used a width of only 1, which otherwise reduces the output size of the original model. Here are the results (the score arrays were also checked with `np.testing.assert_allclose`):
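For concreteness, that evaluation could be set up like this sketch, with synthetic labels and scores standing in for the real experiment:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=200)  # binary anomaly labels (synthetic here)
aeon_scores = rng.random(200)          # stand-in for the aeon detector's output
orig_scores = aeon_scores.copy()       # identical to the original model's output
                                       # when both use the same random_state

# The score arrays match element-wise ...
np.testing.assert_allclose(aeon_scores, orig_scores)
# ... so the AUC values are identical as well.
assert roc_auc_score(y_true, aeon_scores) == roc_auc_score(y_true, orig_scores)
```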
Hi, I'm a little concerned about the licensing of the original code. Please ensure this is an adaptation and that there are no significant chunks of copied-and-pasted code, if there are any currently.
aeon/anomaly_detection/_idk.py
```python
def _predict(self, X):
    rng = np.random.default_rng(self.random_state)
    if self.sliding or self.width > 1:
```
Why `self.width > 1`? This does not seem to match the docstring, which mentions that this is used for both the sliding and fixed window.
@SebastianSchmidl mentioned that the output should be the same size as the input. If the width becomes greater than 1, it reduces the size of the output, so I thought I would add a reverse window for that too. But I made a mistake here by using `self.width` wrongly: if the width is greater, it goes for square sliding. I will fix it.
Hi, that is correct; it should be the same length as the input.
That is a good base for evaluating, but please clean up the code examples before you post; it is a little difficult to parse. Why have you limited it to a width of 1? That seems a bit odd for windowing.
I made that change because, in the original model, the output size is reduced based on the window size. However, the aeon model uses a reverse window to restore the original output size. I didn't use the exact same reverse window, since I felt it would change too much of the original implementation. But if you'd prefer that approach, I'm happy to re-evaluate the tests.
I'm having trouble applying a reverse window on `fixed_window` when `self.width` is greater than 1. Since this involves matrix decomposition, I'm wondering if @SebastianSchmidl could provide some insights.
I wouldn't change the tests, given that we want it to work with `width > 3`; I don't think this will be merged until that is resolved somehow. I don't know enough about the algorithm and the windowing function to help with that, unfortunately. Does using `stride` and `padding_length` not work?
Hi, this aeon implementation works just as well as the original implementation, with one key difference: while the original model reduces the input size, aeon attempts to restore it. This approach works well with sliding enabled, but without sliding, the output becomes too small for reverse windowing. Even if we apply reverse windowing, the result is just repeated numbers. So, I figured it's best to return the output as is when sliding is set to false.
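For context, a minimal sketch of what reverse windowing does here (an illustrative helper, not aeon's actual utility): each point's score becomes the average over all stride-1 sliding windows that cover it.

```python
import numpy as np

def reverse_windowing(window_scores, window_size, n_timepoints):
    """Map one score per sliding window (stride 1) back to one score per
    time point by averaging over every window that covers the point."""
    point_scores = np.zeros(n_timepoints)
    counts = np.zeros(n_timepoints)
    for i, score in enumerate(window_scores):
        point_scores[i : i + window_size] += score
        counts[i : i + window_size] += 1
    return point_scores / np.maximum(counts, 1)

# 3 sliding windows of size 3 over 5 points -> 5 point scores.
print(reverse_windowing(np.array([0.2, 0.8, 0.5]), 3, 5))

# With fixed (non-overlapping) windows, each point is covered by exactly one
# window and just inherits that window's score: the "repeated numbers" effect
# described above.
```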
closes #2114