[BUG] Problems with the size of data #1606
I suspect that this might be caused by the strict signature of the transform function in MultiRocket, which only accepts float64 arrays. In the fit method, X is converted to float64, but not in transform. What is the datatype of your input? If it's float32, would converting it to float64 work? (The size of the data might become an issue if you don't have enough RAM, but for testing purposes you can reduce it.) If this is the cause of the bug, we would need to discuss why float64 has been made mandatory in the function signature, and whether we can relax it to allow other types.
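For illustration, the dtype check suggested above could look like this (a minimal sketch with hypothetical stand-in data):

```python
import numpy as np

# Hypothetical stand-in for the reporter's training data
X_train = np.random.rand(40, 2, 2000).astype(np.float32)
print(X_train.dtype)  # float32

# Casting to float64 doubles the memory footprint, so reduce the
# data size first if RAM is tight.
X_train = X_train.astype(np.float64)
print(X_train.dtype)  # float64
```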
Thanks for the bug report. From the trace, this comes from fit called on Arsenal. This works:

```python
from aeon.classification.convolution_based import Arsenal
import numpy as np

shape = (40, 2, 2000)
X = np.random.rand(*shape).astype(np.float32)
y = np.random.randint(0, 2, size=40)

afc = Arsenal()
afc.fit(X, y)
```

What is the data type of your xtrain?
I would also recommend putting a time limit on HC2 if you want to run it on a problem of that size.
Ah, ignore that; as @baraline pointed out on Slack, I missed that you had set it to multirocket. This does indeed crash, definitely a bug:

```python
from aeon.classification.convolution_based import Arsenal
from aeon.transformations.collection.convolution_based import MultiRocket
from aeon.classification.hybrid import HIVECOTEV2
import numpy as np

shape = (40, 2, 200)
X = np.random.rand(*shape).astype(np.float32)
print(X.shape)
y = np.random.randint(0, 2, size=40)

afc = Arsenal(rocket_transform="multirocket")
afc.fit(X, y)
print("Finished fit for arsenal")
print(afc.predict(X))
```

Wait, it's more complex. This crashes with multivariate series:

```
TypeError: No matching definition for argument type(s) array(float32, 3d, C), array(float32, 3d, C)
```

but not with univariate.
For some bizarre reason we have MultiRocketMultivariate and MultiRocket, so the problem lies with the former (don't ask why we have these weird versions, it's legacy!).

```python
from aeon.transformations.collection.convolution_based import MultiRocketMultivariate

mr = MultiRocketMultivariate()
mr.fit(X)
Xt = mr.transform(X)
```

gives the same type error. The problem occurs in the numba internal method _transform (confusingly, not the one implementing the abstract class). It has this numba signature:

```python
@njit(
    "float32[:,:](float64[:,:,:],float64[:,:,:],"
    "Tuple((int32[:],int32[:],int32[:],int32[:],float32[:])),"
    "Tuple((int32[:],int32[:],int32[:],int32[:],float32[:])),int32)",
    fastmath=True,
    parallel=True,
    cache=True,
)
def _transform(X, X1, parameters, parameters1, n_features_per_kernel=4):
    num_examples, num_channels, input_length = X.shape
```

The univariate version has this:

```python
@njit(
    "float32[:,:](float64[:,:],float64[:,:],Tuple((int32[:],int32[:],float32[:])),"
    "Tuple((int32[:],int32[:],float32[:])),int32)",
    fastmath=True,
    parallel=True,
    cache=True,
)
def _transform(X, X1, parameters, parameters1, n_features_per_kernel):
```

So both signatures only accept float64 input, and float32 arrays fail to find a matching compiled definition.
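One way to relax this restriction (a sketch of the general numba pattern, not the fix that was eventually merged) is to register both dtypes, since @njit accepts a list of signatures:

```python
import numpy as np
from numba import njit

# Minimal sketch: the same compiled function can be called with either
# float64 or float32 input while always returning float32. The body
# here is a placeholder, not the real MultiRocket feature computation.
@njit(
    ["float32[:,:](float64[:,:])", "float32[:,:](float32[:,:])"],
    fastmath=True,
    cache=True,
)
def _demo_transform(X):
    out = np.zeros(X.shape, dtype=np.float32)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            out[i, j] = X[i, j]  # stand-in for the real per-kernel features
    return out

# Both dtypes now dispatch to a matching compiled version
print(_demo_transform(np.random.rand(2, 3)).dtype)                     # float32
print(_demo_transform(np.random.rand(2, 3).astype(np.float32)).dtype)  # float32
```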
Hi, the issue disappeared with the trick of converting the data to float64, but after some time the code stopped with a memory error. The input data shape was (400000, 2, 2048), probably too much to handle in RAM, and worse if the numbers are 64-bit. Is there no way of using batches in the training to avoid this? Thanks! Rbt
Hey, I think the right way of handling this on our side would be to make those functions support both float64 and float32 inputs; we'll discuss the best approach and work on a fix. In the meantime, I see two options if you want to use your full dataset, which unfortunately will involve some tinkering:

The second option would of course be for only one rocket transformer; to mimic Arsenal's behaviour, you would need to do this
Personally, I would just train it on a subset; ultimately, Rocket classifiers are pipelines which generate very large feature spaces. The flip side is that you probably don't really need that much data to train, since it is, after all, mostly random. Predict can of course be done in batches, e.g. something like the sketch below.
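A sketch of the batched-predict idea (the helper name and batch size are illustrative, and `clf` is assumed to be an already-fitted aeon classifier):

```python
import numpy as np

def predict_in_batches(clf, X_test, batch_size=1000):
    """Predict in fixed-size chunks to bound peak memory."""
    preds = [
        clf.predict(X_test[start:start + batch_size])
        for start in range(0, len(X_test), batch_size)
    ]
    return np.concatenate(preds)
```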
In terms of the code, I think for now we should just do what Rocket does and cast to 32 bits.
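An illustrative stop-gap along those lines (not the actual aeon code): cast the input once, up front, to the single dtype the numba kernel is compiled for.

```python
import numpy as np

def _ensure_float32(X):
    # No copy is made if X is already float32
    return X.astype(np.float32, copy=False)

X = np.random.rand(40, 2, 200)   # float64 by default
print(_ensure_float32(X).dtype)  # float32
```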
@rruizdeaustri this should be fixed by #1612, at least in terms of 32-bit/64-bit. We plan to redesign the whole rocket package, but it will always be memory intensive (see #1126). I don't think you can avoid creating an (n_cases, n_kernels) array if you use it as the authors proposed. I suggest either reducing the train size or the number of kernels; a sketch of what that might look like is below. OK to close this issue?
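A minimal sketch of that suggestion (the `num_kernels` parameter name is assumed from the Rocket-family estimators and may differ between aeon versions; the data here is a small stand-in for the 400k-case dataset):

```python
import numpy as np
from aeon.classification.convolution_based import Arsenal

# Stand-in data: in practice X would be the full (400000, 2, 2048) set
X = np.random.rand(400, 2, 2048)
y = np.random.randint(0, 2, size=400)

# Train on a random subset with fewer kernels to bound memory
rng = np.random.default_rng(0)
idx = rng.choice(len(X), size=100, replace=False)
afc = Arsenal(rocket_transform="multirocket", num_kernels=500)
afc.fit(X[idx], y[idx])
```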
Yes, and thanks a lot for the quick feedback!
Describe the bug
Hi,
I want to use rocket algorithms to classify Gravitational waves. The size of my data is (400000, 2, 2048) where 2 is the number of channels and 2048 is the length of each time series. It does not work.
Thank you !
Roberto
Steps/Code to reproduce the bug
Expected results
Just that the classifier works.
Actual results
Versions
0.8.1