-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
regexgen bug in minimize() / output depends on order of Trie#addAll input #31
Comments
Turns out that |
Simpler test case with ASCII characters only: const assert = require('assert');
const Trie = require('regexgen').Trie;
const trie = new Trie();
const STRING_TO_MATCH = 'FBCD';
const strings = [
'AGBHD',
'EIBCD',
'EGBCD',
'FBJBF',
'AGBH',
'EIBC',
'EGBC',
'EBC',
'FBC',
'CD',
'F',
'C',
'ABCD',
'EBCD',
STRING_TO_MATCH,
];
// Uncommenting this results in regexgen generating a different pattern
// that passes the tests below (but still produces incorrect results in other cases):
//strings.sort();
trie.addAll(strings);
const pattern = trie.toString();
//console.log(pattern);
// → 'F(?:BJBF)?|(?:E[GI]?B|FB)?CD?|A(?:GBHD?|BCD)'
// Or with sort() first:
// → 'FBJBF|(?:E[GI]?B|FB)?CD|A(?:GBHD?|BCD)|(?:E[GI]?B|FB)?C|F'
const re = new RegExp(pattern, 'g');
assert(strings.includes(STRING_TO_MATCH));
// Verify that every string we told regexgen to match, is actually
// matched by the generated pattern.
for (const string of strings) {
const actual = string.match(re)[0];
assert(string === actual);
} |
This patch results in correct output: diff --git a/src/trie.js b/src/trie.js
index 8e363e1..f938633 100644
--- a/src/trie.js
+++ b/src/trie.js
@@ -42,7 +42,7 @@ class Trie {
* @return {State} - the starting state of the minimal DFA
*/
minimize() {
- return minimize(this.root);
+ return this.root;
}
/** So (unsurprisingly) the bug is somewhere in |
We can re-enable it once the upstream bugs are fixed: devongovett#31
We can re-enable it once the bugs are fixed: devongovett#31
Due to an upstream bug in the regular expression minimizing implementation, I’m temporarily disabling this functionaity. Unfortunately this results in significantly larger regular expression patterns, but at least the output is (presumably) correct now. Issue: #1 Issue: devongovett/regexgen#31
I patched regexgen to allow for easier inspection of its internal state. Here‘s the state for the above test case. Without sort (incorrect):
With sort (which in the case of this particular test case, gives 100% correct results, matching all the strings completely — so we can use it as a reference):
|
Issue: #1 Issue: devongovett/regexgen#31
See mathiasbynens/emoji-test-regex-pattern#1.
There seems to be an issue where regexgen produces incorrect output. The exact output depends on the order of the
Trie#addAll
input, which seems like a bug in and of itself.Test case:
Uncomment the
strings.sort()
line results in a different pattern (which appears to work correctly for this particular case, but it actually still doesn't match all expected strings). Here are the patterns regexgen generates for both cases:I think there's a bug in regexgen:
The text was updated successfully, but these errors were encountered: