Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Wrong initialization of bounded cost matrix for some elastic distances #1810

Closed
albertoazzari opened this issue Jul 16, 2024 · 8 comments
Labels
bug Something isn't working distances Distances package

Comments

@albertoazzari
Copy link

Describe the bug

When using the banded version in some of the elastic distances functions with parameter window < 1.0, the current initialization of the cost matrix is incorrect. It is now filled with all zeros except the first row and column, which are filled with +inf (with the exception of the cell 0,0). This should be instead filled with +inf.
This leads to an incorrect output of the distances because when computing a cell of the matrix, the algorithm takes as a possibility, i.e. cost_matrix[.,.], a value of 0 in a neighboring cell outside the band instead of +inf.

The list of distances affected by this is the following:

  1. ERP
  2. MSM
  3. TWE

Notes:
LCSS should not be modified due to the fact that the correct initial value is 0.0.

Steps/Code to reproduce the bug

import numpy as np
from aeon import distances as aeon

X = np.array(([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], [10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0]))

aeon_D = aeon.erp_distance(X[0], X[1], g=0.0, window=0.1)

Expected results

Due to this window constraint, it can only use the value on the diagonal, choosing to produce a mismatch for each entry.

Aeon result is 8.0, while the expected result should be 44.0.
Basically:

sum(abs(x-y) for x,y in zip(X[0], X[1]))

Actual results

assert aeon_D == sum(abs(x-y) for x,y in zip(X[0], X[1]))

Versions

System:
python: 3.12.4 (main, Jun 8 2024, 18:29:57) [GCC 11.4.0]
executable: /.venv/bin/python
machine: Linux-6.5.0-41-generic-x86_64-with-glibc2.35

Python dependencies:
pip: 24.1.2
setuptools: None
scikit-learn: 1.4.2
aeon: 0.10.0
statsmodels: None
numpy: 1.26.4
scipy: 1.12.0
pandas: 2.0.3
matplotlib: None
joblib: 1.4.2
numba: 0.59.1
pmdarima: None
tsfresh: None

@albertoazzari albertoazzari added the bug Something isn't working label Jul 16, 2024
@TonyBagnall
Copy link
Contributor

thanks for that @chrisholder and I will take a look

@TonyBagnall TonyBagnall added the distances Distances package label Jul 16, 2024
@chrisholder
Copy link
Contributor

Great find Ill be sure to give this a look today! Thanks @albertoazzari

@albertoazzari
Copy link
Author

If you're okay with it, I can do a pull request to address the bug.

@MatthewMiddlehurst
Copy link
Member

@chrisholder do you already have something for this? If not, we would be glad if you could make a pull request for it @albertoazzari

@chrisholder
Copy link
Contributor

chrisholder commented Jul 29, 2024

I looked it and this is an oversight on my part:

As you say the reason is due to the cost matrices being initialised to 0s instead of infinity like in DTW. What that means with how the current bounding works is it is pathing to the edge of the bounds and encountering all zeros instead of infinity like in DTW. The I also checked the distances and added EDR to the list:

  • ERP
  • EDR
  • MSM
  • TWE

Here is a visual example of the issue for msm
image

As can be seen because the CM is initialised to 0s the min warping path will path to the edge of the window and follow it down.

@chrisholder
Copy link
Contributor

As a comparison this is a DTW path with the same window:

image

@chrisholder
Copy link
Contributor

PR above fixes this issue thanks for finding it!

I ran the example you gave above and the answer is now 44.0

>>> import numpy as np
>>> from aeon import distances as aeon
>>> X = np.array(([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], [10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0]))
>>> aeon_D = aeon.erp_distance(X[0], X[1], g=0.0, window=0.1)
44.0

@albertoazzari
Copy link
Author

Thanks for all the effort and for fixing the bug :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working distances Distances package
Projects
None yet
Development

No branches or pull requests

4 participants