[BUG]: Parenthesis '(' or ')' in column names will fail #226

fraimondo · 2023-04-19T13:55:34Z

Is there an existing issue for this?

I have searched the existing issues

Current Behavior

I just tried run_cross_validation with X_types={"continuous": [".*"]}. It failed because some column names contain "(" or ")".

Expected Behavior

No failure

Steps To Reproduce

Just run any example with "(" in the name.

Environment

Julearn dev in julearn_sk_pandas branch

Relevant log output

No response

Anything else?

No response

samihamdan · 2023-05-17T12:38:49Z

Wait, is that valid? I know that we can set apply_to = '.*' to select all types, but do we also allow assigning one type to all column names with '.*'?

If so maybe add a failing test to the branch so we can fix it before merge.

fraimondo · 2023-05-17T14:05:52Z

The '.*' was just an example. I'm also using it to define types like "ALFF" : "alff_.*".

The issue is the regular expression interpreting "(" wrongly.
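For illustration, here is a minimal sketch of the behaviour using only Python's standard `re` module. The column name is made up, and how julearn matches patterns internally is not shown in this thread, so `re.fullmatch` is an assumption.

```python
import re

# Made-up column name containing parentheses.
column = "alff (mean)"

# Used as a pattern, the parentheses form a capture group, so the
# pattern matches "alff mean" but not the column's own name.
matches_itself = re.fullmatch(column, column) is not None
matches_other = re.fullmatch(column, "alff mean") is not None

# An unbalanced parenthesis is not a valid pattern at all.
try:
    re.compile("alff (mean")
    compiles = True
except re.error:
    compiles = False

# Escaping the name makes the parentheses literal again.
escaped_matches = re.fullmatch(re.escape(column), column) is not None
```

So a column name passed as a regular expression either silently fails to match itself or raises an error, depending on whether the parentheses are balanced.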

fraimondo · 2024-01-16T11:58:04Z

Ok, so @LeSasse also found this issue a bit annoying.

Mainly, the problem is that "(" and ")" are special characters in regular expressions. So there's no straightforward way of solving this issue "automatically": basically, if we replace "(" with "\(", then we are only allowing a subset of regular expressions, because groups can no longer be used.

My take is that it is unlikely that someone will use a complex regexp with groups, but we should still allow that.

Proposal:

Escape "(" and ")" in the regexp by default
Add a config flag that disables this, just in case users want to actually use complex regexp.
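A rough sketch of what the proposal could look like, using only the standard `re` module. The helper name `escape_parentheses` and the `auto_escape` argument are hypothetical and stand in for whatever config flag would be added; this is not julearn code.

```python
import re


def escape_parentheses(pattern: str, auto_escape: bool = True) -> str:
    """Hypothetical helper: escape "(" and ")" in a column pattern.

    With auto_escape=False the pattern is returned unchanged, so users
    can still write regular expressions with groups.
    """
    if not auto_escape:
        return pattern
    # The negative lookbehind skips parentheses already preceded by a
    # backslash. Limitation: a parenthesis right after an escaped
    # backslash ("\\\\(") is also skipped.
    return re.sub(r"(?<!\\)([()])", r"\\\1", pattern)


# The escaped pattern now treats the parentheses literally.
escaped = escape_parentheses("alff_(mean).*")
hit = re.fullmatch(escaped, "alff_(mean)_roi1") is not None
```

Only the parentheses are escaped here, so the rest of the pattern (like ".*") keeps its regular expression meaning.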

fraimondo · 2024-01-17T08:25:34Z

Based on the conversation with Leo, another approach would be to do a check and give the right error messages.

Indeed, thinking about it with a clear head, one could simply escape the characters in the column names. However, if the user specifies these column names in X or X_types, then the regular expression issue appears again.

So one option is that we check the column names in the dataframe and warn the users if we detect any special character. In this warning, we can give the users the right julearn.config.set_config parameter, like enable_auto_escape_parenthesis, etc. etc.

So basically a one-liner solution without going into renaming columns.
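As a sketch of the check described above: the function name `warn_on_parentheses` is hypothetical, and the default `flag_name` is just the placeholder name floated in this comment, not an existing julearn setting.

```python
import re
import warnings


def warn_on_parentheses(columns, flag_name="enable_auto_escape_parenthesis"):
    """Hypothetical check: warn if any column name contains "(" or ")".

    flag_name is a placeholder for the config parameter the warning
    would point users to. Returns the list of flagged column names.
    """
    flagged = [str(c) for c in columns if re.search(r"[()]", str(c))]
    if flagged:
        warnings.warn(
            f"Column names contain parentheses: {flagged}. These are "
            "special characters in regular expressions; consider the "
            f"'{flag_name}' config option or escape them in X / X_types.",
            stacklevel=2,
        )
    return flagged
```

In practice this would be called once with `df.columns` before any pattern matching, so the user sees the hint instead of a confusing regular expression failure.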

fraimondo added the bug Something isn't working label Apr 19, 2023

fraimondo added this to the v0.3.0 milestone Apr 19, 2023

synchon self-assigned this Feb 26, 2024
