Confusion between alphabets #45

Open
AlainGourves opened this issue Nov 20, 2024 · 4 comments

@AlainGourves

First of all, thank you for this app; it's really very practical!

I am currently using it to capture excerpts from a scan of an old (1722) French book, so the print quality is not ideal.
The small problem I have is that some Roman letters are transcribed as identical-looking Cyrillic letters ("B" or "C", for example).
It's easy to correct afterwards, but I thought it might be worth pointing out to you.
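As the reporter notes, the misread letters can be fixed after the fact. The snippet below is a minimal Python sketch of such a cleanup step. It is not part of the app and not the maintainer's approach. The mapping table, the function names `script_of` and `fix_homoglyphs`, the 80% Latin threshold, and the sample sentence are all assumptions made up for illustration. It maps common Cyrillic look-alikes back to Latin, but only when the text as a whole is predominantly Latin.

```python
import unicodedata
from typing import Optional

# Hypothetical post-processing step, not part of the app discussed here.
# Cyrillic letters that look identical (or nearly so) to Latin letters in most
# fonts, written as escapes so the two scripts can't be confused on screen.
# Assumption: this table is illustrative, not exhaustive.
CYRILLIC_TO_LATIN = str.maketrans({
    "\u0410": "A", "\u0412": "B", "\u0421": "C", "\u0415": "E",
    "\u041D": "H", "\u0406": "I", "\u0408": "J", "\u041A": "K",
    "\u041C": "M", "\u041E": "O", "\u0420": "P", "\u0405": "S",
    "\u0422": "T", "\u0425": "X",
    "\u0430": "a", "\u0441": "c", "\u0435": "e", "\u0456": "i",
    "\u0458": "j", "\u043E": "o", "\u0440": "p", "\u0455": "s",
    "\u0445": "x", "\u0443": "y",
})


def script_of(ch: str) -> Optional[str]:
    """Return "LATIN" or "CYRILLIC" for letters of those scripts, else None."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    for script in ("LATIN", "CYRILLIC"):
        if name.startswith(script):
            return script
    return None


def fix_homoglyphs(text: str, min_latin_share: float = 0.8) -> str:
    """Map Cyrillic look-alikes to Latin when the text is mostly Latin.

    The min_latin_share threshold (an assumed value) leaves genuinely
    Cyrillic passages untouched.
    """
    scripts = [s for s in map(script_of, text) if s is not None]
    if not scripts:
        return text
    latin_share = scripts.count("LATIN") / len(scripts)
    if latin_share < min_latin_share:
        return text
    return text.translate(CYRILLIC_TO_LATIN)


# Usage: "B" and "C" misread as Cyrillic VE (U+0412) and ES (U+0421).
print(fix_homoglyphs("Le \u0412on \u0421omte de la Ville"))  # Le Bon Comte de la Ville
print(fix_homoglyphs("\u0412\u043E\u0439\u043D\u0430"))  # unchanged: all Cyrillic
```

Checking the script of the whole text, rather than word by word, is a deliberate simplification. It also catches isolated letters, such as the bare letter sequences discussed later in the thread, where a "word" has no Latin neighbours to compare against.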

For comparison, Preview's OCR is far worse at preserving the layout, but it has no trouble telling the alphabets apart.

Here is an image of a text where I encounter the problem. Many (but not all) of the "B"s and "C"s are transcribed as Cyrillic letters.
[Image: scanned excerpt showing the problem]

@melonamin
Contributor

It may happen in Automatic mode, but if you specifically select French, it should work as expected.

@AlainGourves
Author

I get the same result whether I use Automatic mode or select the language. Maybe it's because these are just sequences of letters rather than "real" words?
When I test real sentences with the proper language setting, the problem seems to disappear despite the poor print quality.

@melonamin
Contributor

Interesting, I'm going to play with other parameters to see if I can fix it.

@AlainGourves
Author

Here is the link to the book I captured, if you want to run some tests.
