
[ENH] Added IDK² and s-IDK² Anomaly Detector To Aeon #2465

Open · wants to merge 64 commits into base: main

Conversation

Ramana-Raja

closes #2114

@aeon-actions-bot
Contributor

Thank you for contributing to aeon

I did not find any labels to add based on the title. Please add the [ENH], [MNT], [BUG], [DOC], [REF], [DEP] and/or [GOV] tags to your pull request titles. For now you can add the labels manually.
I have added the following labels to this PR based on the changes made: [anomaly detection]. Feel free to change these if they do not properly represent the PR.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.

PR CI actions

These checkboxes will add labels to enable/disable CI functionality for this PR. This may not take effect immediately, and a new commit may be required to run the new configuration.

  • Run pre-commit checks for all files
  • Run mypy typecheck tests
  • Run all pytest tests and configurations
  • Run all notebook example tests
  • Run numba-disabled codecov tests
  • Stop automatic pre-commit fixes (always disabled for drafts)
  • Disable numba cache loading
  • Push an empty commit to re-run CI checks

@Ramana-Raja Ramana-Raja changed the title Added IDK² and s-IDK² Anomaly Detector To Aeon [ENH]Added IDK² and s-IDK² Anomaly Detector To Aeon Dec 19, 2024
@Ramana-Raja Ramana-Raja changed the title [ENH]Added IDK² and s-IDK² Anomaly Detector To Aeon [ENH] Added IDK² and s-IDK² Anomaly Detector To Aeon Dec 19, 2024
@MatthewMiddlehurst MatthewMiddlehurst added the enhancement New feature, improvement request or other non-bug code enhancement label Dec 19, 2024
Member

@SebastianSchmidl SebastianSchmidl left a comment


I just had a very brief look at the code because I am short on time. I will have another look next year after the Christmas break. Meanwhile, the code quality can be improved and tests need to be added.

  • The tests are missing! Please add reasonable tests for the new algorithm. How do you make sure that it produces the same results as the original implementation?
  • The code mixes standard Python (e.g. random, range, ...) with NumPy (e.g. np.random, np.arange, ...). Please rely mostly on NumPy for better code quality and performance.
  • Why are most method names in capital letters and prefixed with dunders (__)? If this is a reference to the original implementation, please add the original name as a comment and use our coding standard.
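The NumPy-first style suggested above can be sketched as follows; `psi` and the index-sampling step are illustrative placeholders, not code taken from this PR:

```python
import numpy as np

# Stdlib style the review discourages (Python loop + random.randint):
#   samples = [random.randint(0, n - 1) for _ in range(psi)]

# NumPy style: one vectorised, reproducible call via a Generator.
rng = np.random.default_rng(42)   # plays the role of random_state
n, psi = 1000, 16                 # psi: hypothetical sample count
samples = rng.integers(0, n, size=psi)       # psi indices drawn from [0, n)
values = np.arange(n, dtype=float)[samples]  # fancy indexing instead of a loop
```

Using a seeded `np.random.default_rng` also gives the reproducibility that a `random_state` parameter is meant to provide.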

Thank you for taking the time and contributing to aeon!

@MatthewMiddlehurst
Member

Are there any other implementations or results we can use to compare output? Accuracy to the original algorithm(s) is important. I am not familiar with this area, but having incorrect implementations is generally damaging. The code could be fine, but it is not a good idea to merge until someone is confident that it is.

@Ramana-Raja
Author

Ramana-Raja commented Feb 1, 2025

Are there any other implementations or results we can use to compare output? Accuracy to the original algorithm(s) is important. I am not familiar with this area, but having incorrect implementations is generally damaging. The code could be fine, but it is not a good idea to merge until someone is confident that it is.

The expected results array for the last test case was generated using the original anomaly detector, which I used for comparison with this anomaly detector. Let me know if you'd like me to run any additional tests.

@MatthewMiddlehurst
Member

@SebastianSchmidl any suggestion on how to best evaluate this? It does not have to be as rigorous as a publication, of course; just enough to give us the idea that the output is similar enough.

@SebastianSchmidl
Member

@SebastianSchmidl any suggestion on how to best evaluate this? It does not have to be as rigorous as a publication, of course; just enough to give us the idea that the output is similar enough.

How do you test the implementations in the other modules?

I would compare this implementation against the original code: probably, I would run the original code in a Docker image and capture the output for some test cases (including edge cases). Then, we can use the same input and output as fixtures for the tests in aeon, and use np.testing.assert_allclose et al. to ignore rounding problems.

Unfortunately, I need to focus on other stuff until mid-summer. So, I cannot say if or when I would be able to do this.
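The fixture-based approach described above might look like the sketch below; `IDKSquared`, its constructor arguments, and all numeric values are placeholders, not the actual aeon API or real captured output:

```python
import numpy as np

# Hypothetical input/output pair captured from the original code
# (e.g. run once in a Docker container and pasted in as a fixture).
FIXTURE_INPUT = np.array([1.0, 1.2, 0.9, 5.0, 1.1, 1.0])
FIXTURE_EXPECTED = np.array([0.10, 0.12, 0.09, 0.95, 0.11, 0.10])

def test_matches_original_fixture():
    # scores = IDKSquared(random_state=0).fit_predict(FIXTURE_INPUT)
    scores = FIXTURE_EXPECTED.copy()  # stand-in so this sketch is runnable
    # Tolerances absorb rounding differences between the two codebases.
    np.testing.assert_allclose(scores, FIXTURE_EXPECTED, rtol=1e-6)

test_matches_original_fixture()
```

Pinning the fixtures to a fixed random_state keeps the comparison deterministic across both codebases.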

@MatthewMiddlehurst
Member

@SebastianSchmidl For classification we would just say run both on a decent number of UCR datasets and provide the accuracy results for all those datasets.

No rush on you doing anything here; the onus is on the code contributor to prove that the output is equivalent, IMO, unless they are trusted by the community or by the algorithm author or similar. If you can get around to doing it eventually that's great, but there is no obligation to do anything at all really.

@Ramana-Raja
Author

Ramana-Raja commented Feb 4, 2025

@SebastianSchmidl For classification we would just say run both on a decent number of UCR datasets and provide the accuracy results for all those datasets.

No rush on you doing anything here; the onus is on the code contributor to prove that the output is equivalent, IMO, unless they are trusted by the community or by the algorithm author or similar. If you can get around to doing it eventually that's great, but there is no obligation to do anything at all really.

Sorry for the delay, I've been caught up with exams. I ran the anomaly detection using np.random.normal 100 times, each with a sample size of 100 (comparing mean and variance with np.testing.assert_allclose). I also added random_state to the original code, since not using it can lead to a lot of variation in the results. Here are the data and results:

data/code: [image]
results: [image]

@MatthewMiddlehurst
Member

This does not really help us determine whether the implementation is accurate to the original algorithm. We would be looking for a comparison against the original implementation (or published results) using actual datasets and evaluation metrics.

@Ramana-Raja
Author

Ramana-Raja commented Feb 28, 2025

This does not really help us determine whether the implementation is accurate to the original algorithm. We would be looking for a comparison against the original implementation (or published results) using actual datasets and evaluation metrics.

I used AUC to evaluate both the original model and the aeon model, and they produced identical results. This is expected, since I set a random state for the original model, ensuring consistency. The scores are relatively low because I used only a width of 1; larger widths reduce the output size of the original model. Here are the results (I also used np.testing.assert_allclose):

code (datasets used: kdd_tsad_135, ecg_diff_count_3): [image]
output: [image]
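The AUC comparison described above can be sketched with NumPy alone; all labels and scores below are placeholders, not results from kdd_tsad_135 or ecg_diff_count_3:

```python
import numpy as np

def roc_auc(labels, scores):
    # Mann-Whitney formulation of ROC AUC: the fraction of
    # (positive, negative) pairs where the positive point scores higher.
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

labels = np.array([0, 0, 0, 1, 1, 0, 0, 0])
scores_original = np.array([0.1, 0.2, 0.1, 0.9, 0.8, 0.2, 0.1, 0.1])
scores_aeon = scores_original.copy()  # identical when random_state matches

# Element-wise agreement plus equal AUC on the same ground truth.
np.testing.assert_allclose(scores_aeon, scores_original)
assert roc_auc(labels, scores_aeon) == roc_auc(labels, scores_original)
```

In practice `sklearn.metrics.roc_auc_score` would do the same job; the hand-rolled version above just keeps the sketch dependency-free.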

Member

@MatthewMiddlehurst MatthewMiddlehurst left a comment


Hi, I'm a little concerned about the licensing of the original code. Please ensure this is an adaptation, and that there are no significant chunks of copied-and-pasted code, if there are any currently.


def _predict(self, X):
    rng = np.random.default_rng(self.random_state)
    if self.sliding or self.width > 1:
Member


Why self.width > 1? This does not seem to match the docstring, where it mentions this is used for both the sliding and fixed windows.

Author

@Ramana-Raja Ramana-Raja Mar 4, 2025


@SebastianSchmidl mentioned that the output should be the same size as the input. If the width becomes greater than 1, it reduces the size of the output, so I thought I would add a reverse window for that case too. But I made a mistake here with the self.width condition: as written, a width greater than 1 would take the square-sliding path. I will fix it.

Member


Hi, that is correct; it should be the same length as the input.

@MatthewMiddlehurst
Member

That is a good base for evaluation, but please clean up the code examples before you post; they are a little difficult to parse. Why have you limited it to a width of 1? That seems a bit odd for windowing.

@Ramana-Raja
Author

That is a good base for evaluation, but please clean up the code examples before you post; they are a little difficult to parse. Why have you limited it to a width of 1? That seems a bit odd for windowing.

I made that change because, in the original model, the output size is reduced based on the window size, whereas the aeon model uses a reverse window to restore the original output size. I didn't use the exact same reverse window, since I felt it would change too much of the original implementation. But if you'd prefer that approach, I'm happy to re-evaluate the tests.
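One common "reverse window" scheme, averaging the scores of every sliding window that covers a point, can be sketched as below. This illustrates the general idea only; it is not necessarily the exact transform aeon applies:

```python
import numpy as np

def reverse_windowing(window_scores, width, n):
    """Expand per-window scores back to n per-point scores by
    averaging the score of every sliding window covering each point."""
    point_scores = np.zeros(n)
    counts = np.zeros(n)
    for i, s in enumerate(window_scores):
        point_scores[i:i + width] += s  # window i covers points i..i+width-1
        counts[i:i + width] += 1
    return point_scores / counts

# 5 points with width 2 give 4 sliding windows; the reverse window
# restores a score series of the original length 5.
restored = reverse_windowing(np.array([0.1, 0.2, 0.9, 0.3]), width=2, n=5)
assert restored.shape == (5,)
```

Edge points are covered by fewer windows, which is why the per-point counts are tracked rather than dividing by a constant.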

@Ramana-Raja
Author

I'm having trouble applying a reverse window on fixed_window when self.width is greater than 1. Since this involves matrix decomposition, I'm wondering if @SebastianSchmidl could provide some insights.

@MatthewMiddlehurst
Member

I wouldn't change the tests, given we want it to work with width > 3; I don't think this will be merged until that is resolved somehow.

I don't know enough about the algorithm and the windowing function to help with that, unfortunately. Does using stride and padding_length not work?

@Ramana-Raja
Author

Ramana-Raja commented Mar 7, 2025

I wouldn't change the tests, given we want it to work with width > 3; I don't think this will be merged until that is resolved somehow.

I don't know enough about the algorithm and the windowing function to help with that, unfortunately. Does using stride and padding_length not work?

Hi, this aeon implementation works just as well as the original implementation, with one key difference: while the original model reduces the input size, aeon attempts to restore it. This approach works well with sliding enabled, but without sliding the output becomes too small for reverse windowing. Even if we apply reverse windowing, the result is just repeated numbers. So I figured it's best to return the output as-is when sliding is set to False.

(used width=2): [image]

Labels
anomaly detection Anomaly detection package enhancement New feature, improvement request or other non-bug code enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ENH] Implement (s-)IDK² anomaly detector
3 participants