
[BUG] Arsenal predict not working properly?! #1696

Open · tim-bsm opened this issue Jun 18, 2024 · 28 comments
Labels: bug (Something isn't working), classification (Classification package), multithreading (Multithreading issue)

Comments


tim-bsm commented Jun 18, 2024

Describe the issue

I was using sktime for Arsenal, but because of the split (the reasons stated by Tony in this issue) and because I wanted to test whether aeon is faster than sktime, I decided to try aeon. As aeon is a fork of sktime, the basics are the same, so switching was supposed to be easy. Well, that's what I thought. It turns out that Arsenal's predict now expects a different type, and I can't get it to work.

I fit Arsenal in a separate file and save the fitted classifier with joblib.dump. The generated file is then loaded back in the main script, in which I try to predict the class of my measurement. This always gives me the following error:

TypeError: No matching definition for argument type(s) array(float64, 1d, C), array(float64, 1d, C), Tuple(array(int32, 1d, C), array(int32, 1d, C), array(float32, 1d, C)), Tuple(array(int32, 1d, C), array(int32, 1d, C), array(float32, 1d, C)), int64

I'm trying to predict like this:

import joblib
import numpy as np

acf = joblib.load("arsenal.joblib")
data = np.array(['0.0', '0.0', '0.0', ..., '0.0'], dtype=np.float64)
acf.predict(data)

The list contains 200 strings, all '0.0'.

After that I went to the examples directory in the aeon git repo and tried the example written there. While reading it, I noticed that the line in "In [2]:" should be

motions_test, motions_test_labels = load_basic_motions(split="test")

instead of

motions_test, motions_test_labels = load_basic_motions(split="train")

if I'm not completely misunderstanding the example. But that's just a side note.
So I tried the example, and with it I no longer got the error mentioned above, but instead a new one:

Terminating: fork() called from a process already using Gnu OpenMP, this is unsafe.

I assume this is a direct consequence of using Python's multiprocessing module to run the prediction in parallel with another module?!

Is this a bug or am I doing something completely wrong? As I said, the example works great in sktime.

Suggest a potential alternative/fix

No response

Additional context

No response

@TonyBagnall TonyBagnall added the classification Classification package label Jun 18, 2024
@TonyBagnall (Contributor)

hi and welcome to aeon. This is related to the use of joblib, but I'm not really sure what's going on. As a sanity check, the code below all works fine. Have you got a complete code example so we can give it a go? Most likely someone else here will have a much better idea. It could be numba related; typing often is.

thanks for the typo spot, we would appreciate a PR to fix it.

from aeon.classification.convolution_based import Arsenal
import numpy as np

X = np.random.random((10, 20))
X2 = np.random.random((10, 20))
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])
X3 = np.random.random((10, 1, 20))
X4 = np.random.random((10, 3, 20))
afc = Arsenal()
afc.fit(X, y)
print(afc.predict(X2))
afc.fit(X3, y)
print(afc.predict(X2))
afc.fit(X4, y)
print(afc.predict(X4))


tim-bsm commented Jun 18, 2024

Hey @TonyBagnall, thanks for the quick response. Here's a quick example. It only shows my problem with the types, as that should be the first one to solve.

from aeon.classification.convolution_based import Arsenal
import numpy as np
import joblib

def save_arsenal():
  print("saving")
  X = np.random.random((10, 20))
  y = np.array([0, 1, 2, 3, 3, 1, 0, 0, 2, 1])
  afc = Arsenal(n_estimators=15, n_jobs=-1, num_kernels=5000, random_state=42, rocket_transform='multirocket')
  afc.fit(X, y)
  joblib.dump(afc, "test.joblib")

def get_arsenal():
  print("getting")
  afc: Arsenal = joblib.load("test.joblib")

  # working
  X2 = np.random.random((10, 20))
  print(afc.predict(X2))

  # create new data
  tmp_meas = ['0.0']
  for _ in range(199):
    tmp_meas.append('0.0')
  print(tmp_meas)
  data = np.array([tmp_meas], dtype=np.float64) # this must be wrong?!
  print(data)

  # not working but does with sktime
  print(afc.predict(data))

save_arsenal()
get_arsenal()

@MatthewMiddlehurst (Member)

Running this now, will take a look and get back to you.

@MatthewMiddlehurst (Member)

So, it is crashing on print(afc.predict(data)) for me. But this makes sense, since it is trained on a dataset with series length 20 while data has series length 200. That's not the error being raised, however, so another issue to work through 🙂.

I think you have found an edge case where the transform does not accept a singular series, as this works:

  for _ in range(19):
    tmp_meas.append('0.0')
  data = np.array([tmp_meas, tmp_meas], dtype=np.float64)

I think if we revert the changes made to the class in #838 it should work @TonyBagnall?


tim-bsm commented Jun 18, 2024

To be fair, I just adapted the code of @TonyBagnall to quickly create something that reproduces my error. But yeah, normally the dataset will be around 200, although not always the exact same amount of data (±20?). But since this difference of ±20 works with sktime, it should work with aeon as well, as it has the same base, right?

So if I understood you correctly, this is a bug which can be fixed on your side, right? If so, can it be fixed fairly simply, or will it take long until it is released?


MatthewMiddlehurst commented Jun 18, 2024

Yeah, we can fix this on our side, though it may take a while to make a release, since we currently have a lot of other changes that we don't want to rush through.

The estimator does not accept unequal length data, so any series length change from the original should not be accepted. It's a bit strange that it is getting to the transform where your code is crashing, and I will create an issue to throw an appropriate error when it sees unequal length where that is not allowed. It does not look like the version I wrote originally in the other package has changed to accept unequal length, so I assume it's a bug there as well.
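
For illustration only, a minimal sketch of the kind of length guard described above; this is hypothetical code, not aeon's actual implementation:

import numpy as np

def check_series_length(X: np.ndarray, n_timepoints_fitted: int) -> None:
  # reject predict-time input whose series length differs from what fit saw
  if X.shape[-1] != n_timepoints_fitted:
    raise ValueError(
      f"Series length {X.shape[-1]} differs from the length seen in fit "
      f"({n_timepoints_fitted}); unequal length data is not supported."
    )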

Regarding your second error, I can run the notebook without any issues on Windows. I don't know if this is a conflict with something else; I would really need an example.


MatthewMiddlehurst commented Jun 18, 2024

Having said that, it runs the below without issues (I think). Maybe we can configure it to accept unequal length relatively easily. We really need to revamp this code 🙂.

  for _ in range(199):
    tmp_meas.append('0.0')
  data = np.array([tmp_meas, tmp_meas], dtype=np.float64)

Purely about the implementation though: I'm not sure the rocket transformation will output any useful features when comparing series length 20 vs 200.


tim-bsm commented Jun 18, 2024

For the second part with multiprocessing, the following is crashing (I adopted the temporary fix of @MatthewMiddlehurst):

from aeon.classification.convolution_based import Arsenal
import numpy as np
import joblib
import multiprocessing as mp

def save_arsenal():
  print("saving")
  X = np.random.random((10, 20))
  y = np.array([0, 1, 2, 3, 3, 1, 0, 0, 2, 1])
  afc = Arsenal(n_estimators=15, n_jobs=-1, num_kernels=5000, random_state=42, rocket_transform='multirocket')
  afc.fit(X, y)
  joblib.dump(afc, "test.joblib")

def get_arsenal():
  print("getting")
  afc: Arsenal = joblib.load("test.joblib")

  # working
  X2 = np.random.random((10, 20))
  print(afc.predict(X2))

  # create new data
  tmp_meas = ['0.0']
  for _ in range(199):
    tmp_meas.append('0.0')
  print(tmp_meas)
  data = np.array([tmp_meas, tmp_meas], dtype=np.float64) # this must be wrong?!
  print(data)

  # not working but does with sktime
  print(afc.predict(data))

if __name__ == '__main__':
  save_arsenal()
  
  # this works (even with range(199))
  # get_arsenal()

  # this does not
  mp.Process(target=get_arsenal, args=()).start()


tim-bsm commented Jun 18, 2024

And another part of the second problem was a typo, I think. @TonyBagnall asked me to open a PR, but I'm fairly new to GitHub and haven't really opened any real PR so far, so I'm a bit lost on this one, to be honest.

Regarding your last comment about the 199, @MatthewMiddlehurst: it does work (can be run with my latest code snippet). I agree that the output is quite likely going to be unusable. But as I said, my real values are somewhere around 200 and normally only vary by ±4, sometimes by roughly ±20. So, as far as I've seen, Arsenal is stable enough and the outcome is good.

@MatthewMiddlehurst (Member)

The above code snippet actually runs without an exception for me. Not very familiar with built-in Python multiprocessing though, TBH. I'm on Python 3.10, Windows 10.

Don't worry about the typo.


tim-bsm commented Jun 18, 2024

Mhm, strange. For me it gives the error Terminating: fork() called from a process already using Gnu OpenMP, this is unsafe. on Ubuntu with Python 3.12. Can anybody test this on Linux?

@TonyBagnall (Contributor)

hi, no worries about making the contribution, but if you want to figure out how to do it, we are more than happy to help. There is a guide here:
https://www.aeon-toolkit.org/en/stable/contributing.html
It starts with this:
https://www.aeon-toolkit.org/en/stable/developer_guide/dev_installation.html
Feel free to join our Slack; there are lots of people there happy to help. We have all been in the position of getting started; it can be confusing at first, but you will get used to it soon. I'll look at the rest later.

@TonyBagnall TonyBagnall added the bug Something isn't working label Jun 19, 2024
@TonyBagnall TonyBagnall changed the title Arsenal predict not working properly?! [BUG] Arsenal predict not working properly?! Jun 19, 2024

SebastianSchmidl commented Jun 19, 2024

I can reproduce the error Terminating: fork() called from a process already using Gnu OpenMP, this is unsafe on Linux with Python 3.12 and aeon 0.9.0:

python -c "from aeon import show_versions; show_versions()"

System:
python: 3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 15:12:24) [GCC 11.2.0]
executable: /home/sebastian/.conda/envs/aeon-tmp/bin/python
machine: Linux-5.15.0-112-generic-x86_64-with-glibc2.35

Python dependencies:
pip: 24.0
setuptools: 69.5.1
scikit-learn: 1.4.2
aeon: 0.9.0
statsmodels: None
numpy: 1.26.4
scipy: 1.12.0
pandas: 2.0.3
matplotlib: None
joblib: 1.4.2
numba: 0.59.1
pmdarima: None
tsfresh: None


OpenMP being incompatible with fork()-based multiprocessing is a known issue. For more details, see also https://stackoverflow.com/questions/49049388/understanding-openmp-shortcomings-regarding-fork

I am not familiar with the implementation of Arsenal, but if it uses OpenMP, then this behavior with fork()-based multiprocessing is expected.

You can work around this issue by using a different multiprocessing start method, such as spawn (default on Windows) or forkserver:

if __name__ == '__main__':
  save_arsenal()
  
  # this works (even with range(199))
  # get_arsenal()

  # this does not
-  mp.Process(target=get_arsenal, args=()).start()
+  ctx = mp.get_context("forkserver")
+  ctx.Process(target=get_arsenal, args=()).start()


tim-bsm commented Jun 19, 2024

Alright, thank you @CodeLionX for letting me know about this. I'll try it tomorrow when I'm back at work and will let you guys know if it's working.

Just out of curiosity to the others, if they might know (as @CodeLionX said, he does not know about the implementation of Arsenal): is Arsenal implemented with OpenMP, and if so, is there any advantage or reason for that?

Btw, thank you guys for your effort and your quick replies so far :)


MatthewMiddlehurst commented Jun 19, 2024

The implementation uses joblib.Parallel for multithreading (see https://github.com/aeon-toolkit/aeon/blob/main/aeon/classification/convolution_based/_arsenal.py#L309); there may also be some threading via numba in the transform, but I'm not 100% sure on that.

The same goes for most of our classifiers, really.
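
For illustration, a minimal sketch of the joblib.Parallel pattern being described; _fit_one is a hypothetical stand-in, not aeon's actual per-estimator function:

from joblib import Parallel, delayed
import numpy as np

def _fit_one(seed):
  # hypothetical stand-in for fitting a single ensemble member
  rng = np.random.default_rng(seed)
  return rng.random(3)

# n_jobs=-1 spreads the per-estimator work over all available cores,
# which is what Arsenal's n_jobs parameter feeds into
fitted = Parallel(n_jobs=-1)(delayed(_fit_one)(s) for s in range(15))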

@TonyBagnall (Contributor)

So, it is crashing on print(afc.predict(data)) for me. But this makes sense, since it is trained on a dataset with series length 20 while data has series length 200. That's not the error being raised, however, so another issue to work through 🙂.

I think you have found an edge case where the transform does not accept a singular series, as this works:

  for _ in range(19):
    tmp_meas.append('0.0')
  data = np.array([tmp_meas, tmp_meas], dtype=np.float64)

I think if we revert the changes made to the class in #838 it should work @TonyBagnall?

Could you create a separate issue for this @MatthewMiddlehurst? It's not the core of the problem tim-bsm is having, and accepting different length input in 2D numpy from train to test is not really supported; I'm not really sure it should be, since it would need the capability to handle unequal length.


@TonyBagnall (Contributor)

Is Arsenal implemented with OpenMP and if so, is there any advantage or reason for that?

the problem is, I imagine, in the multirocket transformer and numba interacting with things strangely.

@TonyBagnall (Contributor)

So just to clarify: ignoring the 1-instance predict for now, this is OK on Windows but does not work on Linux? Could this be an issue with all threaded classifiers? I can try tomorrow.

from aeon.classification.convolution_based import Arsenal
import numpy as np
import joblib

def save_arsenal():
  print("saving")
  X = np.random.random((10, 20))
  y = np.array([0, 1, 2, 3, 3, 1, 0, 0, 2, 1])
  afc = Arsenal(n_estimators=15, n_jobs=-1, num_kernels=5000, random_state=42, rocket_transform='multirocket')
  afc.fit(X, y)
  joblib.dump(afc, "test.joblib")

def get_arsenal():
  print("getting")
  afc: Arsenal = joblib.load("test.joblib")

  data = np.random.random((5, 20))
  print(data)
  d = afc.predict(data)
  print(d)

save_arsenal()
get_arsenal()

@SebastianSchmidl (Member)

So just to clarify: ignoring the 1-instance predict for now, this is OK on Windows but does not work on Linux? Could this be an issue with all threaded classifiers? I can try tomorrow.

Works on my machine (Linux) 🤷🏼

saving
getting
[[4.12579970e-01 9.43962492e-01 4.50538839e-01 5.10200269e-01 6.17581249e-01 4.29840622e-01 4.72601500e-01 7.36845702e-01 2.54153572e-01 6.56264568e-01 5.05757277e-01 7.03395391e-01 8.79386466e-01 1.53694733e-01 1.83096689e-01 2.38425862e-01 1.06455586e-01 8.81482196e-01 2.54574305e-01 6.34024989e-01]
 [1.43721642e-01 5.78407747e-01 2.48709225e-01 4.06285339e-01 1.99512839e-01 8.34101393e-01 4.45591048e-01 5.40979505e-01 2.80944406e-01 9.85203389e-01 3.27930333e-01 1.10382293e-01 3.04353997e-01 9.31727161e-01 8.90602119e-01 8.04148287e-01 4.05208332e-01 5.61208805e-01 6.85359832e-01 4.61386323e-01]
 [9.32059296e-01 9.95112689e-01 9.42499646e-01 9.18160027e-01 9.17201937e-01 2.20693115e-01 8.54205515e-01 2.47529375e-01 3.72458403e-01 7.06726359e-01 9.61085329e-01 5.53309263e-05 2.17180248e-01 8.28294063e-01 8.66669904e-01 3.39003289e-01 9.30323336e-01 1.78459067e-01 2.66497503e-01 5.48359740e-01]
 [5.22741447e-01 7.39107382e-01 6.61867014e-01 2.28299613e-01 4.81490920e-01 7.15845610e-02 4.48867231e-01 1.45556605e-02 9.60642561e-01 5.85185808e-01 9.40618925e-01 7.77600943e-01 8.69418403e-01 8.75651112e-01 3.92136107e-01 4.20679491e-01 5.47434726e-01 2.76389936e-01 4.95750020e-01 2.63786814e-01]
 [5.06840887e-01 1.81527691e-01 6.77100693e-01 8.94554339e-03 1.99894960e-01 9.16988947e-02 7.09162424e-01 9.83784239e-01 2.78857075e-01 9.31721404e-01 4.45364379e-01 4.33336009e-01 6.59150849e-01 7.22308270e-01 4.38304524e-01 3.03662190e-01 5.69611432e-01 5.92996430e-01 6.02911177e-01 6.60528646e-01]]
[1 3 2 1 1]


tim-bsm commented Jun 20, 2024

So just to clarify: ignoring the 1-instance predict for now, this is OK on Windows but does not work on Linux? Could this be an issue with all threaded classifiers? I can try tomorrow.

As this example is almost the same as the following (from my code posted a few messages ago), it works fine for me:

def get_arsenal():
  print("getting")
  afc: Arsenal = joblib.load("test.joblib")

  # working -> almost same as this
  X2 = np.random.random((10, 20))
  print(afc.predict(X2))

The second part of @MatthewMiddlehurst's answer worked for me and did the trick. So as he said, it seems like a single entry in a 2D numpy array is somehow not accepted. I'll just use this as a workaround until it might be fixed on your side.



tim-bsm commented Jun 20, 2024

You can work around this issue by using a different multiprocessing start method, such as spawn (default on Windows) or forkserver:

So I just tried it. For this case, I agree, it worked, thank you @CodeLionX. But in my case, I need the process to be stored in a list, as I have multiple processes. For that, the given method does not work. Simple code below:

from aeon.classification.convolution_based import Arsenal
import numpy as np
import joblib
import multiprocessing as mp

def save_arsenal():
  print("saving")
  X = np.random.random((10, 20))
  y = np.array([0, 1, 2, 3, 3, 1, 0, 0, 2, 1])
  afc = Arsenal(n_estimators=15, n_jobs=-1, num_kernels=5000, random_state=42, rocket_transform='multirocket')
  afc.fit(X, y)
  joblib.dump(afc, "test.joblib")

def get_arsenal():
  print("getting")
  afc: Arsenal = joblib.load("test.joblib")

  # working
  X2 = np.random.random((10, 20))
  print(afc.predict(X2))

  # create new data
  tmp_meas = ['0.0']
  for _ in range(199):
    tmp_meas.append('0.0')
  print(tmp_meas)
  data = np.array([tmp_meas, tmp_meas], dtype=np.float64) # this must be wrong?!
  print(data)

  # not working but does with sktime
  print(afc.predict(data))

if __name__ == '__main__':
  save_arsenal()
  
  # this works (even with range(199))
  # get_arsenal()
  
  ctx = mp.get_context("forkserver")  

  # this does as well
  # ctx.Process(target=get_arsenal, args=()).start()

  # this does not
  p = ctx.Process(target=get_arsenal, args=())
  p.start()

Setting the start method like so

if __name__ == '__main__':
  mp.set_start_method("forkserver")

  save_arsenal()
  ...

as suggested in some StackOverflow posts gives me the error RuntimeError: context has already been set.
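
(set_start_method may only be called once per process and raises exactly this RuntimeError if a context was already set; the standard library's documented override is force=True. A minimal sketch, reusing save_arsenal from the snippet above:)

import multiprocessing as mp

if __name__ == '__main__':
  # force=True overrides a previously fixed start method; use with care
  mp.set_start_method("forkserver", force=True)
  save_arsenal()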

@SebastianSchmidl (Member)

Are you sure?
There should be no difference between

ctx.Process(target=get_arsenal, args=()).start()

and

p = ctx.Process(target=get_arsenal, args=())
p.start()

Both work for me with the "forkserver" and "spawn" start methods.


tim-bsm commented Jun 20, 2024

Yeah, I thought the same. That's why I was a bit confused.

It turns out that it only occurs in my real project and not in the little example I'm posting here. I thought I had tried it there as well before asking you guys. My bad, sorry.
I tracked it down, and I think it has something to do with me using shared queues between the processes. At least now, as I switched the queue from mp.Queue() to ctx.Queue(), I'm getting a new error ^^:
AttributeError: Can't pickle local object 'CDLL.__init__.<locals>._FuncPtr'


tim-bsm commented Jun 20, 2024

Can't get the queue to work. Does anyone have experience with this one and might be able to help me?

from aeon.classification.convolution_based import Arsenal
import numpy as np
import joblib
import multiprocessing as mp

def save_arsenal():
  print("saving")
  X = np.random.random((10, 20))
  y = np.array([0, 1, 2, 3, 3, 1, 0, 0, 2, 1])
  afc = Arsenal(n_estimators=15, n_jobs=-1, num_kernels=5000, random_state=42, rocket_transform='multirocket')
  afc.fit(X, y)
  joblib.dump(afc, "test.joblib")

def get_arsenal(q: mp.Queue):
  q.put("test")
  print("getting")
  afc: Arsenal = joblib.load("test.joblib")

  # working
  X2 = np.random.random((10, 20))
  print(afc.predict(X2))

  # create new data
  tmp_meas = ['0.0']
  for _ in range(199):
    tmp_meas.append('0.0')
  print(tmp_meas)
  data = np.array([tmp_meas, tmp_meas], dtype=np.float64) # this must be wrong?!
  print(data)

  # not working but does with sktime
  print(afc.predict(data))

if __name__ == '__main__':
  save_arsenal()
  
  # this works (even with range(199))
  # get_arsenal()
  
  ctx = mp.get_context("forkserver")
  q = mp.Queue() # -> RuntimeError: A SemLock created in a fork context is being shared with a process in a spawn context. This is not supported. Please use the same context to create multiprocessing objects and Process.
  q = ctx.Queue() # -> FileNotFoundError: [Errno 2] No such file or directory
  p = ctx.Process(target=get_arsenal, args=(q, ))
  p.start()

@TonyBagnall (Contributor)

hi, so I think we have fixed the related issues you have helpfully identified @tim-bsm in #1712, #1713 and #1769.

Could you clarify the aeon-related bug? If it's not directly aeon related, I would like to change the tag from bug, since I am trying to tidy up bugs directly related to aeon. If it is aeon related, could you specify the problem? Is it just Arsenal or any estimator? thanks


tim-bsm commented Jul 8, 2024

@TonyBagnall I'm not really sure whether it is a bug or not; that's what I was trying to find out with this issue. I just found out that the implementation of Arsenal and its use behave differently from what I was expecting after using sktime's Arsenal beforehand. In my opinion, the one problem where I NEED to have at least two entries in a list (following example) shouldn't be like that.

Having said that, it runs the below without issues (I think). Maybe we can configure it to accept unequal length relatively easily. We really need to revamp this code 🙂.

  for _ in range(199):
    tmp_meas.append('0.0')
  data = np.array([tmp_meas, tmp_meas], dtype=np.float64)

Moreover, the problems with multiprocessing and the shared queue relate to using Gnu OpenMP to realise the multiprocessing with n_jobs in Arsenal. If this is intended, as it gives some special advantages I'm not aware of, then this is just a different implementation from how sktime works. In that case it obviously isn't a bug, but it does complicate the process of working with aeon - at least for me it did.

I can reproduce the error Terminating: fork() called from a process already using Gnu OpenMP, this is unsafe on Linux, Python 3.12, aeon 0.9.0

Now for the last one, whether different lengths of time series should be accepted or not: I'm nowhere near professional enough to tell you whether this is a bug. I get what @MatthewMiddlehurst said below regarding the training data not being the same length as the data to classify. It makes sense that if I train the algorithm with a dataset of, for example, 200 values and use it with 20 values, the result will be incorrect or at least really unreliable. But for my use case it is possible, as already stated multiple times in this issue, that the values aren't always exactly the same length as what was used to train the algorithm.
So regarding this point, I don't really mind whether it gets fixed or not (as the results are correct and usable as well). ^^

The estimator does not accept unequal length data, so any series length change from the original should not be accepted. It's a bit strange that it is getting to the transform where your code is crashing, and I will create an issue to throw an appropriate error when it sees unequal length where that is not allowed. It does not look like the version I wrote originally in the other package has changed to accept unequal length, so I assume it's a bug there as well.

@TonyBagnall (Contributor)

thanks for the answer. I'll leave this as a bug, as I have no idea about the OpenMP stuff, nor why the implementation differs from the legacy one in sktime, which Matthew also wrote. I'll try to get my head around this at some point.

#1758 fixed the issue of train and test data being unequal length. I am working on giving the estimators the capability to handle unequal length in #1746, but I've not yet decided exactly how to do it. I also fixed the typo you spotted; all valuable input.


tim-bsm commented Jul 9, 2024

Perfect, thank you guys for helping with my issues. I'm also glad if I can help, well, at least by spotting bugs to report them, as I'm not as deep into the implementation as you guys :)

@MatthewMiddlehurst MatthewMiddlehurst added the multithreading Multithreading issue label Sep 25, 2024