⚗️ Experimental Frappe OCR application with tesseract.
This project is a fork of ERPNext-OCR by John Vincent Fiel. It aims to fix and clean up the original source code and add some new features.
Check out more on ERPNext Discuss.
See Taiga.io
Install tesseract-ocr, plus imagemagick and ghostscript (needed to work with PDF files), using this command on Debian:
sudo apt-get install tesseract-ocr imagemagick libmagickwand-dev ghostscript libtesseract-dev libleptonica-dev
source frappe-bench/env/bin/activate
pip3 install tesserocr
sudo apt-get install tesseract-ocr-kor
bench get-app erpnext_ocr https://github.com/yuntan0/erpnext_ocr
bench install-app erpnext_ocr
When you install the Frappe app, the following Python requirements are installed:
- tesserocr, a Python binding for Tesseract
- pillow, an image processing library for Python
- requests, an HTTP library for Python
- wand, a Python binding for ImageMagick
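As a rough illustration of how three of these libraries can fit together (a sketch only, not the code this app ships), the example below assumes wand rasterizes PDF pages and tesserocr reads the resulting images through pillow. The names `needs_rasterizing` and `read_text`, the 300 dpi resolution, and the `eng` default language are assumptions made for the example.

```python
import io
from pathlib import Path


def needs_rasterizing(path):
    """Return True for PDF files, which must be converted to images before OCR."""
    return Path(path).suffix.lower() == ".pdf"


def read_text(path, lang="eng", resolution=300):
    """Hypothetical helper: OCR an image or every page of a PDF.

    The third-party imports are done lazily so that the module can be
    imported even where tesserocr, pillow, or wand are not installed.
    """
    import tesserocr
    from PIL import Image

    if not needs_rasterizing(path):
        with Image.open(path) as img:
            return tesserocr.image_to_text(img, lang=lang)

    from wand.image import Image as WandImage

    pages = []
    # ImageMagick delegates PDF rendering to ghostscript.
    with WandImage(filename=str(path), resolution=resolution) as pdf:
        for page in pdf.sequence:
            with WandImage(image=page) as single:
                png_bytes = single.make_blob("png")
            with Image.open(io.BytesIO(png_bytes)) as img:
                pages.append(tesserocr.image_to_text(img, lang=lang))
    return "\n".join(pages)
```

Calling `read_text("/opt/sample.pdf", lang="kor")` would only work once the matching trained data for that language is installed.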
File Being Read:
Sample Screenshot:
In order to use OCR with different languages, you need to install the appropriate trained data files. Check tesseract Wiki for details: https://github.com/tesseract-ocr/tesseract/wiki/Data-Files
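To check which languages still need trained data, something like the following sketch may help. It assumes trained data files are named `<code>.traineddata` inside a tessdata directory, and that Debian language packages follow the `tesseract-ocr-<code>` pattern seen in the `tesseract-ocr-kor` install command above. Both helper names are hypothetical.

```python
from pathlib import Path


def debian_package(lang):
    """Guess the Debian package for a Tesseract language code (assumed naming)."""
    return "tesseract-ocr-" + lang.lower().replace("_", "-")


def missing_languages(wanted, tessdata_dir):
    """Return the codes in `wanted` that have no <code>.traineddata file."""
    installed = {p.stem for p in Path(tessdata_dir).glob("*.traineddata")}
    return [lang for lang in wanted if lang not in installed]
```

The tessdata directory depends on your installation; `tesserocr.get_languages()` should report the path it uses along with the languages it finds there.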
If you wish to develop or just test this application locally, you can run docker-compose up -d
at the root of this repository.
You can then access your ERPNext OCR dev env at http://localhost:8080
wand.exceptions.PolicyError: not authorized '/opt/sample.pdf' @ error/constitute.c/ReadImage/412
This can happen when the security configuration in imagemagick prevents it from reading PDF files.
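On many Debian-based systems this configuration lives in a `policy.xml` file containing a `coder` entry for PDF with `rights="none"`. The sketch below reports whether such an entry blocks reading. It is a diagnostic aid under that assumption; the helper names and the example file location are not taken from this project.

```python
import xml.etree.ElementTree as ET


def pdf_policy_rights(policy_xml):
    """Return the rights of every coder policy whose pattern mentions PDF."""
    root = ET.fromstring(policy_xml)
    rights = []
    for policy in root.iter("policy"):
        pattern = policy.get("pattern", "")
        if policy.get("domain") == "coder" and "PDF" in pattern.upper():
            rights.append(policy.get("rights", ""))
    return rights


def pdf_is_blocked(policy_xml):
    """True if any PDF coder policy does not grant read access."""
    return any("read" not in r.lower() for r in pdf_policy_rights(policy_xml))


# Example (the path is an assumption and varies by ImageMagick version):
# from pathlib import Path
# print(pdf_is_blocked(Path("/etc/ImageMagick-6/policy.xml").read_bytes()))
```

If the check returns True, a commonly reported fix is to edit that entry so it grants read access. Review the security implications before doing so, since the restriction exists to limit exposure to untrusted PDF files.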
wand.exceptions.WandRuntimeError: MagickReadImage returns false, but did raise ImageMagick exception. This can occurs when a delegate is missing, or returns EXIT_SUCCESS without generating a raster.
This usually happens when a dependency needed to convert PDF files is missing; the install command above lists ghostscript for PDF support.
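A quick way to see whether the external programs are reachable is to look them up on the PATH. This sketch assumes the ghostscript executable is named `gs` and the Tesseract one `tesseract`, as on Debian; the helper names are hypothetical.

```python
import shutil


def find_tools(names=("gs", "tesseract")):
    """Map each executable name to its full path, or None if not on the PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names=("gs", "tesseract")):
    """List the executables that could not be found."""
    return [name for name, path in find_tools(names).items() if path is None]


if __name__ == "__main__":
    print("missing:", missing_tools() or "nothing")
```

Run it inside the same environment (or container) that runs the bench, since the PATH may differ from your login shell.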
OSError: encoder error -2 when writing image file
- This might happen when trying to open a TIFF image. The real error is "hidden" and only displayed in the console.
- If the original error in the console is
Fax3SetupState: Bits/sample must be 1 for Group 3/4 encoding/decoding.
the TIFF image compression is usually not valid or not recognized.
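To confirm this diagnosis without extra dependencies, you can read the relevant tags straight from the TIFF header. The sketch below looks only at the first image directory of a classic (non-BigTIFF) file and reads Compression (tag 259) and BitsPerSample (tag 258). Compression values 3 and 4 are the CCITT Group 3 and Group 4 fax encodings, which require 1 bit per sample. The function names are hypothetical.

```python
import struct

FAX_COMPRESSIONS = {3: "CCITT Group 3", 4: "CCITT Group 4"}
TAG_NAMES = {258: "bits_per_sample", 259: "compression"}


def tiff_compression_info(data):
    """Read compression and bits per sample from the first IFD of TIFF bytes."""
    byte_order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if byte_order is None or struct.unpack(byte_order + "H", data[2:4])[0] != 42:
        raise ValueError("not a classic TIFF file")
    (ifd_offset,) = struct.unpack(byte_order + "I", data[4:8])
    (entry_count,) = struct.unpack(byte_order + "H", data[ifd_offset:ifd_offset + 2])

    info = {"compression": 1, "bits_per_sample": 1}  # TIFF default values
    for index in range(entry_count):
        start = ifd_offset + 2 + 12 * index
        tag, field_type, count = struct.unpack(byte_order + "HHI", data[start:start + 8])
        if tag not in TAG_NAMES or field_type != 3:  # 3 = SHORT
            continue
        if count <= 2:
            value_at = start + 8  # value stored inline in the entry
        else:
            (value_at,) = struct.unpack(byte_order + "I", data[start + 8:start + 12])
        # Only the first value is read (e.g. the first channel of an RGB image).
        (info[TAG_NAMES[tag]],) = struct.unpack(byte_order + "H", data[value_at:value_at + 2])
    return info


def fax_encoding_mismatch(info):
    """True when a Group 3/4 compression is paired with more than 1 bit per sample."""
    return info["compression"] in FAX_COMPRESSIONS and info["bits_per_sample"] != 1
```

For a file on disk, pass `open(path, "rb").read()` to `tiff_compression_info`. A mismatch suggests re-saving the image with another compression, or converting it to 1-bit, before running OCR.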
bench run-tests --app erpnext_ocr
- Website: https://www.monogramm.io
- Github: @Monogramm
John Vincent Fiel
- Github: @jvfiel
Contributions, issues and feature requests are welcome!
Feel free to check issues page.
Check the contributing guide.
Give a ⭐ if this project helped you!
Copyright © 2019 Monogramm.
This project is MIT licensed.
This README was generated with ❤️ by readme-md-generator