-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Problems to consider when making anchors optional #31
Comments
I think I have a commit (need to be adapted for grex) that could help to detect those kind of issues earlier: dubzzz/regexgen@b002997#diff-a561630bb56b82342bc66697aee2ad96efddcbc9d150665abd6fb7ecb7c0ab2fR140-R157 The test:
It might be a cool addition for the property based test suite 🤔 |
Hi @mathiasbynens and @dubzzz, thank you for making me aware of this "problem". If I decide to make the anchors optional in some later release, I will revisit this issue here again. |
@mathiasbynens While reviewing pull request #43, I've been dealing with this issue here again. I think this problem can be easily solved by sorting the test cases by length in descending order before doing any further processing. This way, test cases that happen to be prefixes of other test cases appear at the end of the generated regex and not at the start. In my opinion, it is not the DFA minimization algorithm that needs to be adjusted. Do you have any further thoughts? |
@pemistahl That's exactly how I've been working around the issue in my own code: sorting the list before passing it to grex. It would be amazing to not have to do that anymore :) |
The test cases are already sorted by grex but in the wrong order. Alright, then I will fix this in release 1.3.0. Thanks for your feedback. |
@mathiasbynens I have not been able so far to successfully formulate a sorting algorithm that always sorts the alternations in a regex without anchors in the correct order. For the time being, I've disabled DFA minimization for when both start and end anchors are disabled. The minimization algorithm is not the problem, however. It is somewhere in the business logic constructing the AST from the DFA. If I don't find a solution to this problem, I will release the optional anchors feature as it currently is. At least, the resulting regex is guaranteed to be correct. |
It seems grex inherited this bug from regexgen: devongovett/regexgen#31
Repro:
After removing
^
and$
(see #30), this generated pattern does not match"FBCD"
despite it being one of the input strings:Here’s what I think the bug is: within the generated pattern, it should never happen that something on the left matches a prefix of something that's further on the right, because then the latter can never match.
See devongovett/regexgen#31 (comment) for some more details.
The text was updated successfully, but these errors were encountered: