Strange behaviour with Cyrillic regexes #20

Open
vikanezrimaya opened this issue Dec 1, 2017 · 0 comments

@vikanezrimaya

Hello! I used your library via the Minta Electron app and noticed a strange lack of optimization when working with the Cyrillic alphabet.

The generated regexes are cumbersome and bulky. For example (I converted the Unicode code points to Russian letters for convenience):

/ня(?:[кн]!|[кн])|Ня(?:[кн]?!|[кн])?/ (I know that detecting the substring "nya" is silly, but it makes a perfect test case: short, memorable and permutable)

This could be minimized to the following:
/[Нн]я(?:[кн][!\?]|[кн]|[!\?])?/
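A quick way to compare the generated pattern with the hand-minimized one is to run both against the same strings. The following is a minimal Python sketch: both patterns are copied from above without the JavaScript slashes, they are matched with `re.fullmatch` (whole-string matches rather than substring searches), and the sample strings are hypothetical, chosen for illustration rather than taken from the original input list.

```python
import re

# Pattern produced by the generator (copied from the report, slashes removed).
GENERATED = re.compile(r"ня(?:[кн]!|[кн])|Ня(?:[кн]?!|[кн])?")
# Hand-minimized pattern proposed in the report.
PROPOSED = re.compile(r"[Нн]я(?:[кн][!\?]|[кн]|[!\?])?")

# Hypothetical sample strings, not the reporter's actual input list.
SAMPLES = ["ня", "ня!", "няк", "нян!", "Ня", "Ня!", "Няк", "Нян!", "ня?", "Няк?"]


def accepted(pattern, words):
    """Return, in input order, the words that the pattern matches in full."""
    return [w for w in words if pattern.fullmatch(w)]


gen = accepted(GENERATED, SAMPLES)
prop = accepted(PROPOSED, SAMPLES)
print("generated:", gen)
print("proposed: ", prop)
# Strings accepted by the proposed pattern but rejected by the generated one.
print("only proposed:", sorted(set(prop) - set(gen)))
```

With these samples, the proposed pattern accepts every string the generated one does, plus `ня`, `ня!`, `ня?` and `Няк?`, which the generated pattern rejects. Whether the two are interchangeable therefore depends on the exact list of input strings.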

It seems that the generator does not recognize that я (Cyrillic ya) is the same letter in both the uppercase and lowercase strings (and it is indeed the same character, which I verified by inspecting the Unicode code points in the generated output).
I will be glad to provide more data if you need it.
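The code-point check described above can be reproduced with a short sketch using Python's standard `unicodedata` module. It is a minimal illustration, independent of the library itself; `describe` is a hypothetical helper name. It shows that `я` is the same character (U+044F) in both words and that only the first letter differs (U+041D vs U+043D).

```python
import unicodedata


def describe(word):
    """Return (character, code point, Unicode name) for each character in word."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in word]


# Compare the lowercase and capitalized spellings character by character.
for word in ["ня", "Ня"]:
    for ch, cp, name in describe(word):
        print(word, ch, cp, name)
```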
