Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: Parenthesis '(' or ')' in column names will fail #226

Open
1 task done
fraimondo opened this issue Apr 19, 2023 · 4 comments
Open
1 task done

[BUG]: Parenthesis '(' or ')' in column names will fail #226

fraimondo opened this issue Apr 19, 2023 · 4 comments
Assignees
Labels
bug Something isn't working
Milestone

Comments

@fraimondo
Copy link
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I just tried run_cross_validation with X_types={"continous": [".*"]. It failed due to some columns having "(" or ")" in the names.

Expected Behavior

No failure

Steps To Reproduce

Just run any example with "(" in the name.

Environment

Julearn dev in julearn_sk_pandas branch

Relevant log output

No response

Anything else?

No response

@fraimondo fraimondo added the bug Something isn't working label Apr 19, 2023
@fraimondo fraimondo added this to the v0.3.0 milestone Apr 19, 2023
@samihamdan
Copy link
Collaborator

samihamdan commented May 17, 2023

Wait is that valid? Like I know that we can set apply_to = '.*' to select all types, but we also allow to set one type to all column names by '.*'?

If so maybe add a failing test to the branch so we can fix it before merge.

@fraimondo
Copy link
Contributor Author

The '.*' was just an example. I'm also using it to define types like "ALFF" : "alff_.*".

The issue is the regular expression interpreting "(" wrongly.

@fraimondo
Copy link
Contributor Author

Ok, so @LeSasse also found this issue a bit annoying.

Mainly, the problem is that "(" and ")" are special characters in regular expressions. So there's no straightforward way of solving this issue "automatically": basically, if we replace "(" for "(", then we are somehow allowing a subset of regular expressions.

My take is that it is unlikely that someone will use a complex regexp with groups, but we should still allow that.

Proposal:

  1. Escape "(" and ")" in the regexp by default
  2. Add a config flag that disables this, just in case users want to actually use complex regexp.

@fraimondo
Copy link
Contributor Author

Based on the conversation with Leo, another approach would be to do a check and give the right error messages.

Indeed, thinking it with a clear head, one could simply escape the characters in the column names. However, if the user specifies these column names in X or X_types, then the regular expression issue appears again.

So one option is that we check the column names in the dataframe and warn the users if we detect any special character. In this warning, we can give the users the right julearn.config.set_config parameter, like enable_auto_escape_parenthesis, etc. etc.

So basically a one-liner solution without going into renaming columns.

@synchon synchon self-assigned this Feb 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants