GOT-OCR 2.0 Microservice 🚀

State-of-the-art OCR microservice powered by Transformers and FastAPI, supporting multiple OCR tasks and formats.

📌 Key Features

Multi-Task OCR Support
- Plain Text Extraction
- Formatted Text (LaTeX/Markdown)
- Region-Specific OCR (Box/Color)
- Multi-Page Document Processing
- Sheet Music & Math Formula Recognition
Input Formats: JPEG, PNG, TIFF
Output Formats: Raw Text, HTML, SVG
GPU Acceleration Support (CUDA)

🛠 Installation

Prerequisites

Python 3.9+
CUDA 11.7+ (For GPU acceleration)
Docker 20.10+ (Optional)

Without Docker

# Clone repository
git clone https://github.com/iammuhammadnoumankhan/FastAPI-GOT-OCR-2-Transformers.git
cd got-ocr-service

# Install dependencies
pip install -r requirements.txt

# Start service (CPU)
uvicorn main:app --host 0.0.0.0 --port 8000

# Start with GPU
CUDA_VISIBLE_DEVICES=0 uvicorn main:app --host 0.0.0.0 --port 8000

With Docker

# Build the Docker image:
docker build -t got-ocr-service .

# Run the container:
docker run -p 8000:8000 --gpus all got-ocr-service

🚀 Usage

Supported Tasks

Task	Parameters	Description
`Plain Text OCR`	None	Basic text extraction
`Format Text OCR`	`ocr_type=format`	Structured text output
`Fine-grained OCR (Box)`	`ocr_box=[x1,y1,x2,y2]`	Region-specific extraction
`Fine-grained OCR (Color)`	`ocr_color=red/green/blue`	Color-based extraction
`Multi-crop OCR`	None	Multiple region processing
`Multi-page OCR`	Multiple images	Document processing

API Endpoints

Endpoint	Method	Description
`/process`	POST	Main OCR processing
`/results/{result_id}`	GET	Retrieve HTML results
`/docs`	GET	Interactive API documentation

📋 Example Requests

1. Basic Text Extraction

curl -X POST "http://localhost:8000/process" \
  -F "task=Plain Text OCR" \
  -F "images=@document.jpg"

2. Math Formula Recognition

curl -X POST "http://localhost:8000/process" \
  -F "task=Format Text OCR" \
  -F "ocr_type=format" \
  -F "images=@equation.png"

3. Multi-Page Processing

curl -X POST "http://localhost:8000/process" \
  -F "task=Multi-page OCR" \
  -F "images=@page1.pdf" \
  -F "images=@page2.pdf"

4. Color-Based Extraction

curl -X POST "http://localhost:8000/process" \
  -F "task=Fine-grained OCR (Color)" \
  -F "ocr_color=red" \
  -F "images=@highlighted_text.png"

📚 API Documentation

Interactive Swagger documentation available at: http://localhost:8000/docs

Build & Run

# Build the Docker image:
docker build -t got-ocr-service .

# Run the container:
docker run -p 8000:8000 --gpus all got-ocr-service

📜 License

APACHE 2.0 License

Note: Replace localhost:8000 with your domain in production deployments. For large-scale usage, consider adding:

Redis caching
Load balancing
Rate limiting

More CURLs Commands:

Here are the curl commands for all the supported use cases and types in the GOT-OCR 2.0 FastAPI microservice:

1. Plain Text OCR

Extracts plain text from an image.

curl -X POST "http://localhost:8000/process" \
  -F "task=Plain Text OCR" \
  -F "images=@document.jpg"

2. Formatted Text OCR

Extracts formatted text (e.g., LaTeX, Markdown) from an image.

curl -X POST "http://localhost:8000/process" \
  -F "task=Format Text OCR" \
  -F "ocr_type=format" \
  -F "images=@formatted_document.png"

3. Fine-grained OCR (Box)

Extracts text from a specific bounding box region in the image.

curl -X POST "http://localhost:8000/process" \
  -F "task=Fine-grained OCR (Box)" \
  -F "ocr_box=[100,100,300,300]" \
  -F "images=@image_with_regions.jpg"

4. Fine-grained OCR (Color)

Extracts text from regions highlighted with a specific color.

curl -X POST "http://localhost:8000/process" \
  -F "task=Fine-grained OCR (Color)" \
  -F "ocr_color=red" \
  -F "images=@color_highlighted_image.png"

5. Multi-crop OCR

Processes multiple cropped regions of an image.

curl -X POST "http://localhost:8000/process" \
  -F "task=Multi-crop OCR" \
  -F "images=@multi_crop_image.jpg"

6. Multi-page OCR

Processes multiple pages of a document.

curl -X POST "http://localhost:8000/process" \
  -F "task=Multi-page OCR" \
  -F "images=@page1.png" \
  -F "images=@page2.png" \
  -F "images=@page3.png"

7. Sheet Music OCR

Processes sheet music and generates formatted output.

curl -X POST "http://localhost:8000/process" \
  -F "task=Format Text OCR" \
  -F "ocr_type=format" \
  -F "images=@sheet_music.png"

8. Math Formula OCR

Extracts mathematical formulas from an image.

curl -X POST "http://localhost:8000/process" \
  -F "task=Format Text OCR" \
  -F "ocr_type=format" \
  -F "images=@math_formula.png"

9. Table and Chart OCR

Extracts structured data from tables and charts.

curl -X POST "http://localhost:8000/process" \
  -F "task=Format Text OCR" \
  -F "ocr_type=format" \
  -F "images=@table_chart.png"

10. Batch Processing

Process multiple images in a single request.

curl -X POST "http://localhost:8000/process" \
  -F "task=Plain Text OCR" \
  -F "images=@image1.jpg" \
  -F "images=@image2.png" \
  -F "images=@image3.tiff"

11. Retrieve HTML Results

After processing, retrieve the HTML-rendered result using the result_id.

curl -X GET "http://localhost:8000/results/{result_id}"

Replace {result_id} with the ID returned in the response from the /process endpoint.

12. Error Handling Examples

Missing Image

curl -X POST "http://localhost:8000/process" \
  -F "task=Plain Text OCR"

Response:

{
  "detail": "No image provided"
}

Invalid Task

curl -X POST "http://localhost:8000/process" \
  -F "task=Invalid Task" \
  -F "images=@document.jpg"

Response:

{
  "detail": "Invalid task specified"
}

Invalid Color

curl -X POST "http://localhost:8000/process" \
  -F "task=Fine-grained OCR (Color)" \
  -F "ocr_color=purple" \
  -F "images=@image.jpg"

Response:

{
  "detail": "Invalid color specified"
}

Notes:

Replace http://localhost:8000 with your actual server URL if deployed elsewhere.
Ensure the image files (@document.jpg, @formatted_document.png, etc.) exist in the directory where you run the curl command.
For multi-file uploads, use multiple -F "images=@file" fields.
The ocr_box parameter should be in the format [x1,y1,x2,y2].
The ocr_color parameter supports only red, green, or blue.

These commands cover all the supported use cases and types for the GOT-OCR 2.0 microservice.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
example-img		example-img
render_tools		render_tools
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE.txt		LICENSE.txt
README.md		README.md
globe.py		globe.py
main.py		main.py
render.py		render.py
requirements.txt		requirements.txt

License

iammuhammadnoumankhan/FastAPI-GOT-OCR-2-Transformers

Folders and files

Latest commit

History

Repository files navigation

GOT-OCR 2.0 Microservice 🚀

📌 Key Features

🛠 Installation

Prerequisites

Without Docker

With Docker

🚀 Usage

Supported Tasks

API Endpoints

📋 Example Requests

1. Basic Text Extraction

2. Math Formula Recognition

3. Multi-Page Processing

4. Color-Based Extraction

📚 API Documentation

Build & Run

📜 License

More CURLs Commands:

1. Plain Text OCR

2. Formatted Text OCR

3. Fine-grained OCR (Box)

4. Fine-grained OCR (Color)

5. Multi-crop OCR

6. Multi-page OCR

7. Sheet Music OCR

8. Math Formula OCR

9. Table and Chart OCR

10. Batch Processing

11. Retrieve HTML Results

12. Error Handling Examples

Missing Image

Invalid Task

Invalid Color

Notes:

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages