Hindi Language support #1617
Comments
Let me know what is required w.r.t. datasets to make this happen. BTW, this already exists: https://github.com/iitb-research-code/indic-doctr. There is also this for a corpus: https://huggingface.co/datasets/ai4bharat/sangraha.
The vocabs were added in #1687.
I haven't spent enough time on this yet, so my understanding of the requirements might be a bit lacking. Are you saying that adding the vocab is enough, since others can now use the code base with their own models for Hindi? If that is the case, maybe we can add the vocabs for the rest of the scripts as well in a separate issue. I see the other Indic vocabs here: https://github.com/iitb-research-code/indic-doctr/blob/main/doctr/datasets/vocabs.py
If you have a model that was trained with doctr on exactly the added vocabs (same character order and length), then yes.
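To illustrate why the character order matters, here is a minimal sketch in plain Python. It assumes a recognition model outputs class indices that are mapped back to characters by position in the vocab string. The toy vocabs and the `encode`/`decode` helpers are hypothetical and are not part of doctr's API or its actual Hindi vocab.

```python
# Hypothetical toy vocabs for illustration only (not doctr's real vocabs).
TRAIN_VOCAB = "अआइईक"       # order the model was trained with
REORDERED_VOCAB = "कअआइई"   # same characters, different order


def encode(text: str, vocab: str) -> list[int]:
    """Map each character to its position in the vocab string."""
    return [vocab.index(ch) for ch in text]


def decode(indices: list[int], vocab: str) -> str:
    """Map each class index back to the character at that position."""
    return "".join(vocab[i] for i in indices)


def vocabs_compatible(trained: str, target: str) -> bool:
    """A model's outputs only decode correctly if both vocabs are identical,
    which implies the same length and the same character order."""
    return trained == target


# Indices as the model would have learned them during training.
indices = encode("कई", TRAIN_VOCAB)

# Decoding with the training vocab recovers the text.
print(decode(indices, TRAIN_VOCAB))

# Decoding with a reordered vocab silently yields different characters.
print(decode(indices, REORDERED_VOCAB))

print(vocabs_compatible(TRAIN_VOCAB, REORDERED_VOCAB))
```

Note that the mismatch does not raise an error in this sketch: the reordered vocab simply produces wrong text, which is why checking the vocab before loading a model is worthwhile.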
In general, you should already be able to use one of the provided models here: For example:
I will check that on some sample documents. Thanks for the clarifications.
Part of #1699
🚀 The feature
Hindi language support for Indian users
For Indian users, Hindi should also be included in the docTR vocabs.
Motivation, pitch
In India, most documents are in Hindi, which docTR does not currently support. Hindi support is the one thing Indian users need: with it, creating POCs and building solutions becomes easy, with docTR as the first step in OCR-related work.
Alternatives
No response
Additional context
No response