Strange behaviour with Cyrillic regexes #20

vikanezrimaya · 2017-12-01T11:25:40Z

Hello! I used your library via the Minta Electron app and noticed a strange lack of optimization when working with the Cyrillic alphabet.

The generated regexes are cumbersome and bulky. For example (I converted the Unicode code points to Russian letters for convenience):

/ня(?:[кн]!|[кн])|Ня(?:[кн]?!|[кн])?/ (I know that detecting the substring nya is silly, but it is a perfect test case - short, memorable and permutable)

Which could be minimized to the following:
/[Нн]я(?:[кн][!\?]|[кн]|[!\?])?/
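
A quick way to compare the two patterns above is to run both against a small set of candidate strings. The sketch below uses Python's `re` module purely for illustration (the library in question is not Python), and the candidate list is a hypothetical test set built from the pieces that appear in the patterns. Note that `?` in the generated pattern reads as a quantifier, while `\?` in the proposed pattern is a literal question mark, so over this candidate set the proposed pattern accepts everything the generated one does plus some extra strings.

```python
import re
from itertools import product

# Patterns copied from the report; Python's re syntax is assumed here
# only as a convenient way to test them.
GENERATED = re.compile(r"ня(?:[кн]!|[кн])|Ня(?:[кн]?!|[кн])?")
PROPOSED = re.compile(r"[Нн]я(?:[кн][!\?]|[кн]|[!\?])?")

# Hypothetical candidate set: every combination of prefix, optional
# middle letter, and optional trailing punctuation.
CANDIDATES = [
    "".join(parts)
    for parts in product(["ня", "Ня"], ["", "к", "н"], ["", "!", "?"])
]


def accepted(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return {s for s in candidates if pattern.fullmatch(s)}


generated_set = accepted(GENERATED, CANDIDATES)
proposed_set = accepted(PROPOSED, CANDIDATES)
only_proposed = sorted(proposed_set - generated_set)

print("accepted by both:", sorted(generated_set & proposed_set))
print("accepted only by the proposed pattern:", only_proposed)
```

A check like this makes it easy to see whether a hand-minimized pattern is an exact rewrite or a looser one before asking the generator to produce it.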

It seems the generator does not recognize that я (Cyrillic ya) is the same letter in both the uppercase and lowercase strings (and it is indeed the same, which I verified by looking at the Unicode code points in the generated output).
I will be glad to provide more data if you need it.
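
The code point observation can be reproduced with a few lines. This is a minimal sketch in Python, used only for illustration; `code_points` is a hypothetical helper name, not part of the library.

```python
import unicodedata


def code_points(text):
    """Return (code point, Unicode name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# The two prefixes from the patterns: they differ only in the first letter.
upper = code_points("Ня")
lower = code_points("ня")

for label, pairs in (("Ня", upper), ("ня", lower)):
    print(label, pairs)

# The second character is the same code point in both strings.
same_ya = upper[1] == lower[1]
print("я identical in both:", same_ya)
```

Both strings end in the same lowercase я; only the leading Н/н differs in case.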
