predict() method for models? #182
Comments
Pushed a version of this to main.
It works slightly differently from scikit-learn: you pass the views, with any missing views given as None, and it reconstructs all of the views from the learnt latent dimensions.
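To make the behaviour described above concrete, here is a minimal NumPy sketch of reconstructing every view from whichever views are present. It is not cca-zoo's actual implementation: the function name `predict_views`, the `weights` argument, and the choice to average the latent representations of the available views are all assumptions made for illustration. It uses the PLS-style step X_i ≈ Z W_i^T.

```python
import numpy as np


def predict_views(views, weights):
    """Hypothetical sketch: reconstruct all views from the views that are present.

    views   -- list of (n, p_i) arrays, with None for a missing view
    weights -- list of (p_i, k) weight matrices, one per view
    """
    # Backward step: project each available view into the latent space, Z_i = X_i W_i.
    latents = [x @ w for x, w in zip(views, weights) if x is not None]
    if not latents:
        raise ValueError("at least one view must be provided")
    # Assumption: average the latent representations of the available views.
    z = np.mean(latents, axis=0)
    # PLS-style forward step: X_i ~ Z W_i^T for every view, missing or not.
    return [z @ w.T for w in weights]


rng = np.random.default_rng(0)
w_x, w_y = rng.normal(size=(5, 2)), rng.normal(size=(3, 2))
x = rng.normal(size=(10, 5))

# Only X is observed; Y is passed as None and is reconstructed from X's latents.
x_hat, y_hat = predict_views([x, None], [w_x, w_y])
```

With a single observed view this reduces to `y_hat = x @ w_x @ w_y.T`, which is the prediction use case raised in this issue.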
Thanks! I'll check it out.
This works well with my data, but only if the view data are whitened first. I'm not enough of an expert in these methods to say why this might be, but it looks like the methods for generating predictions are quite different in cca-zoo compared to sklearn's PLSRegression.
If you come back to me in a week and a half, I think I will be able to come up with a more detailed response and fix. Basically, your observation is exactly what I would expect, and a colleague of mine has been thinking about this in some depth recently. We learn weights W_x, which transform X W_x = Z_x, and W_y, which transform Y W_y = Z_y. Going from data to latent space is usually known as a backward problem. For prediction (or 'generation') we need a forward problem. For PLS it turns out the forward problem is X = Z W_x^T and Y = Z W_y^T. But for CCA the forward problem is actually X = Z W_x^T Σ_X and Y = Z W_y^T Σ_Y, where Σ_X and Σ_Y are the covariance matrices of X and Y. The predict function I wrote up quickly for you uses the PLS forward problem (because that's what scikit-learn appears to do). But notice that if Σ_X is the identity, then the forward problems are the same. Σ_X is the identity when your data is whitened, and that's why you are seeing what you are seeing. Based on the above you might be able to implement a CCA prediction function without my help, and if you do get a chance feel free to send a PR :) Otherwise I'll do it when I get a moment.
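The relationship between the two forward problems can be checked numerically. The sketch below is a NumPy illustration, not cca-zoo code: it uses random weights rescaled to satisfy the CCA normalization W^T Σ_X W = I rather than weights from a fitted model, and all variable names are made up for the example. It compares the PLS map W^T and the CCA map W^T Σ_X against the least-squares map from Z back to X, then confirms that whitening makes Σ_X the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 500, 6, 2

# Correlated, centered data, so Sigma_X is far from the identity.
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
X -= X.mean(axis=0)
sigma_x = X.T @ X / n

# Hypothetical weights, rescaled so that W^T Sigma_X W = I (the CCA normalization).
W = rng.normal(size=(p, k))
chol = np.linalg.cholesky(W.T @ sigma_x @ W)
W = W @ np.linalg.inv(chol).T
Z = X @ W  # backward problem: data -> latent

# Candidate forward maps (latent -> data).
pls_forward = W.T
cca_forward = W.T @ sigma_x
# Least-squares map from Z back to X, for comparison.
ls_forward, *_ = np.linalg.lstsq(Z, X, rcond=None)

err_pls = np.linalg.norm(X - Z @ pls_forward)
err_cca = np.linalg.norm(X - Z @ cca_forward)

# After whitening, Sigma_X = I, so W^T Sigma_X reduces to W^T.
evals, evecs = np.linalg.eigh(sigma_x)
X_white = X @ evecs @ np.diag(evals ** -0.5) @ evecs.T
sigma_white = X_white.T @ X_white / n
```

Under these assumptions `cca_forward` coincides with `ls_forward`, so its reconstruction error cannot exceed that of `pls_forward`. On whitened data the two maps are identical, which matches the observation that the PLS-style predict only works well after whitening.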
I've been digging through the code and looking at weights, scores, loadings with […]

To set the context, Y is 58000 by 40 and X is 58000 by 1500. sklearn's PLSRegression works reasonably well with about 10 components; […]

For PLSRegression (i.e. PLS2), prediction works great for unwhitened data. The class […]

For PLSCanonical, which I think is the same flavor of PLS as […]

The reason I think there's an error in sklearn is that according to the User […]

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Regress the y-scores on the x-scores and keep only the per-component (diagonal) slopes.
fm = LinearRegression()
fm.fit(model._x_scores, model._y_scores)
alpha = np.diag(np.diag(fm.coef_))
# Project the scaled test X to x-scores, map them to y-scores, then back through the y-loadings.
pred = X_test_scaled @ model.x_rotations_ @ alpha @ model.y_loadings_.T
```

It seems to work, although I'm sure there's a better way to get α than multiple […]
The scikit-learn implementations of PLS and CCA have predict() methods that are very useful for cross-validation and forecasting. Is it possible to add these to cca-zoo models where appropriate?