Description: This project leverages OpenCV and Tesseract OCR to detect and scan documents from images. It locates document contours, corrects perspective for a clean, top-down view, and applies OCR (using Persian language support) to recognize and extract text. The script also detects and adjusts image orientation based on OCR confidence, ensuring proper readability.
- Python 3.6+
- OpenCV for image processing (cv2 module)
- Pytesseract for Optical Character Recognition (OCR)
- numpy: For numerical operations
- cv2: OpenCV library for image processing
- pytesseract: Tesseract OCR wrapper
- Install Python libraries:
pip install numpy opencv-python pytesseract
- Install Tesseract OCR:
- Windows: Download and install Tesseract from https://github.com/UB-Mannheim/tesseract/wiki.
- macOS/Linux: Install Tesseract via a package manager, e.g., brew install tesseract (macOS) or sudo apt install tesseract-ocr (Linux).
- Prepare Input Image: Save the document image as input_image.jpg in the same directory as app.py (a PNG file can also be used).
- Run the Python script app.py using the command:
python app.py
- Output:
- result_image.jpg: Scanned document image after perspective correction.
- rotated_image.jpg: Final image with corrected orientation.
- Edge Detection and Contour Detection:
- Converts the image to grayscale, applies Gaussian blur, and performs Canny edge detection.
- Identifies document boundaries and applies morphological operations to close gaps.
- Perspective Correction:
- Locates a contour with four corners (assumed to be the document) and applies perspective transformation to get a top-down view.
- OCR and Orientation Detection:
- Runs OCR on the center portion of the image at various angles (0°, 90°, 180°, and 270°).
- Chooses the angle with the highest OCR confidence for final orientation.
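
The orientation search described above might look something like this sketch. The names `center_crop`, `rotate_clockwise`, `mean_ocr_confidence`, and `best_orientation` are hypothetical, as are the 50% crop fraction, the clockwise rotation direction, and the use of the mean word confidence as the score; the actual script may crop, rotate, or score differently. `pytesseract.image_to_data` is used here because it reports a per-word confidence, and `lang="fas"` is Tesseract's code for Persian.

```python
import numpy as np


def center_crop(image, fraction=0.5):
    """Return the central `fraction` of the image along each axis."""
    height, width = image.shape[:2]
    crop_h, crop_w = int(height * fraction), int(width * fraction)
    top, left = (height - crop_h) // 2, (width - crop_w) // 2
    return image[top:top + crop_h, left:left + crop_w]


def rotate_clockwise(image, angle):
    """Rotate by a multiple of 90 degrees clockwise, without interpolation."""
    k = (-(angle // 90)) % 4  # np.rot90 counts counter-clockwise quarter turns
    return np.ascontiguousarray(np.rot90(image, k=k))


def mean_ocr_confidence(image, lang="fas"):
    """Mean word confidence from Tesseract; needs pytesseract and the language data installed."""
    import pytesseract  # imported lazily so the helpers above work without it

    data = pytesseract.image_to_data(image, lang=lang, output_type=pytesseract.Output.DICT)
    scores = [float(c) for c in data["conf"] if float(c) >= 0]  # -1 marks non-word boxes
    return sum(scores) / len(scores) if scores else 0.0


def best_orientation(image, score=mean_ocr_confidence, angles=(0, 90, 180, 270)):
    """Return (angle, rotated_image) for the rotation whose center crop scores highest."""
    crop = center_crop(image)
    best_angle = max(angles, key=lambda a: score(rotate_clockwise(crop, a)))
    return best_angle, rotate_clockwise(image, best_angle)
```

Scoring only the center crop keeps the four OCR passes cheap; the `score` argument is a parameter so a different language or scoring rule can be swapped in without touching the search.
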
- This script is optimized for Persian language OCR; modify the lang parameter in ocr_image if using other languages.
- Ensure pytesseract.pytesseract.tesseract_cmd points to your Tesseract installation if running on Windows.
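
As a rough illustration of the Windows note above, the snippet below looks for the Tesseract executable and assigns it to `pytesseract.pytesseract.tesseract_cmd`. The helper `find_tesseract` is hypothetical, and the default path is only an assumption about where the Windows installer typically places the binary; adjust it to match your system.

```python
import os
import shutil

# Assumed install location; change this if Tesseract was installed elsewhere.
DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(candidates=(DEFAULT_WINDOWS_PATH,)):
    """Return a usable Tesseract executable path, or None if none is found."""
    on_path = shutil.which("tesseract")  # already on PATH: nothing to configure
    if on_path:
        return on_path
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    import pytesseract

    tesseract_path = find_tesseract()
    if tesseract_path is None:
        raise SystemExit("Tesseract not found; install it or set tesseract_cmd manually.")
    pytesseract.pytesseract.tesseract_cmd = tesseract_path
    print("Using Tesseract at", tesseract_path)
```
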