OCR microservice built on GOT-OCR 2.0, Transformers, and FastAPI, supporting multiple OCR tasks and output formats.
- Multi-Task OCR Support
  - Plain Text Extraction
  - Formatted Text (LaTeX/Markdown)
  - Region-Specific OCR (Box/Color)
  - Multi-Page Document Processing
  - Sheet Music & Math Formula Recognition
- Input Formats: JPEG, PNG, TIFF
- Output Formats: Raw Text, HTML, SVG
- GPU Acceleration Support (CUDA)
- Python 3.9+
- CUDA 11.7+ (For GPU acceleration)
- Docker 20.10+ (Optional)
# Clone repository
git clone https://github.com/iammuhammadnoumankhan/FastAPI-GOT-OCR-2-Transformers.git
cd FastAPI-GOT-OCR-2-Transformers
# Install dependencies
pip install -r requirements.txt
# Start service (CPU)
uvicorn main:app --host 0.0.0.0 --port 8000
# Start with GPU
CUDA_VISIBLE_DEVICES=0 uvicorn main:app --host 0.0.0.0 --port 8000
# Build the Docker image:
docker build -t got-ocr-service .
# Run the container:
docker run -p 8000:8000 --gpus all got-ocr-service
Task | Parameters | Description |
---|---|---|
Plain Text OCR | None | Basic text extraction |
Format Text OCR | ocr_type=format | Structured text output |
Fine-grained OCR (Box) | ocr_box=[x1,y1,x2,y2] | Region-specific extraction |
Fine-grained OCR (Color) | ocr_color=red/green/blue | Color-based extraction |
Multi-crop OCR | None | Multiple region processing |
Multi-page OCR | Multiple images | Document processing |
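The task-to-parameter mapping in the table above can be sketched as a small client-side helper. This is a minimal illustration, not part of the service: `build_form_fields`, `VALID_TASKS`, and `VALID_COLORS` are hypothetical names, and the validation rules are inferred only from the table and the curl examples in this README.

```python
from typing import Optional, Sequence

# Task names and parameters come from the README table; the helper itself
# is a hypothetical client-side convenience, not an API of the service.
VALID_TASKS = {
    "Plain Text OCR",
    "Format Text OCR",
    "Fine-grained OCR (Box)",
    "Fine-grained OCR (Color)",
    "Multi-crop OCR",
    "Multi-page OCR",
}
VALID_COLORS = {"red", "green", "blue"}


def build_form_fields(
    task: str,
    ocr_box: Optional[Sequence[int]] = None,
    ocr_color: Optional[str] = None,
) -> dict:
    """Return the non-file form fields for a POST /process request."""
    if task not in VALID_TASKS:
        raise ValueError(f"unknown task: {task!r}")

    fields = {"task": task}
    if task == "Format Text OCR":
        fields["ocr_type"] = "format"
    elif task == "Fine-grained OCR (Box)":
        if ocr_box is None or len(ocr_box) != 4:
            raise ValueError("ocr_box must be [x1, y1, x2, y2]")
        # Serialise without spaces, e.g. [100,100,300,300]
        fields["ocr_box"] = "[" + ",".join(str(int(v)) for v in ocr_box) + "]"
    elif task == "Fine-grained OCR (Color)":
        if ocr_color not in VALID_COLORS:
            raise ValueError("ocr_color must be red, green, or blue")
        fields["ocr_color"] = ocr_color
    return fields
```

Checking parameters on the client before sending avoids a round trip for requests the service would reject anyway.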
Endpoint | Method | Description |
---|---|---|
/process | POST | Main OCR processing |
/results/{result_id} | GET | Retrieve HTML results |
/docs | GET | Interactive API documentation |
curl -X POST "http://localhost:8000/process" \
-F "task=Plain Text OCR" \
-F "images=@document.jpg"
curl -X POST "http://localhost:8000/process" \
-F "task=Format Text OCR" \
-F "ocr_type=format" \
-F "images=@equation.png"
curl -X POST "http://localhost:8000/process" \
-F "task=Multi-page OCR" \
-F "images=@page1.png" \
-F "images=@page2.png"
curl -X POST "http://localhost:8000/process" \
-F "task=Fine-grained OCR (Color)" \
-F "ocr_color=red" \
-F "images=@highlighted_text.png"
Interactive Swagger documentation available at:
http://localhost:8000/docs
Apache 2.0 License
Note: Replace localhost:8000 with your domain in production deployments. For large-scale usage, consider adding:
- Redis caching
- Load balancing
- Rate limiting
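As one illustration of the rate-limiting item, here is a minimal in-memory sliding-window limiter. It is a sketch under stated assumptions: `SlidingWindowLimiter` and `client_id` are hypothetical names, nothing here is part of the service, and an in-process limiter only works for a single worker (a shared store would be needed across several).

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        """Record a request and return True if it is within the limit."""
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

A request handler could call `allow()` with the caller's IP address or API key and return HTTP 429 when it yields `False`.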
Here are the curl commands for all the supported use cases and types in the GOT-OCR 2.0 FastAPI microservice:
Extracts plain text from an image.
curl -X POST "http://localhost:8000/process" \
-F "task=Plain Text OCR" \
-F "images=@document.jpg"
Extracts formatted text (e.g., LaTeX, Markdown) from an image.
curl -X POST "http://localhost:8000/process" \
-F "task=Format Text OCR" \
-F "ocr_type=format" \
-F "images=@formatted_document.png"
Extracts text from a specific bounding box region in the image.
curl -X POST "http://localhost:8000/process" \
-F "task=Fine-grained OCR (Box)" \
-F "ocr_box=[100,100,300,300]" \
-F "images=@image_with_regions.jpg"
Extracts text from regions highlighted with a specific color.
curl -X POST "http://localhost:8000/process" \
-F "task=Fine-grained OCR (Color)" \
-F "ocr_color=red" \
-F "images=@color_highlighted_image.png"
Processes multiple cropped regions of an image.
curl -X POST "http://localhost:8000/process" \
-F "task=Multi-crop OCR" \
-F "images=@multi_crop_image.jpg"
Processes multiple pages of a document.
curl -X POST "http://localhost:8000/process" \
-F "task=Multi-page OCR" \
-F "images=@page1.png" \
-F "images=@page2.png" \
-F "images=@page3.png"
Processes sheet music and generates formatted output.
curl -X POST "http://localhost:8000/process" \
-F "task=Format Text OCR" \
-F "ocr_type=format" \
-F "images=@sheet_music.png"
Extracts mathematical formulas from an image.
curl -X POST "http://localhost:8000/process" \
-F "task=Format Text OCR" \
-F "ocr_type=format" \
-F "images=@math_formula.png"
Extracts structured data from tables and charts.
curl -X POST "http://localhost:8000/process" \
-F "task=Format Text OCR" \
-F "ocr_type=format" \
-F "images=@table_chart.png"
Process multiple images in a single request.
curl -X POST "http://localhost:8000/process" \
-F "task=Plain Text OCR" \
-F "images=@image1.jpg" \
-F "images=@image2.png" \
-F "images=@image3.tiff"
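The repeated `-F "images=@file"` fields above can be reproduced from Python using only the standard library. The sketch below assumes the service accepts ordinary multipart/form-data with several parts named `images`, as the curl commands imply. `encode_multipart` is a hypothetical helper, and the per-file content type is guessed from the filename.

```python
import mimetypes
import uuid
from typing import Dict, List, Tuple


def encode_multipart(
    fields: Dict[str, str],
    images: List[Tuple[str, bytes]],
) -> Tuple[bytes, str]:
    """Encode form fields plus repeated 'images' parts as multipart/form-data.

    Returns (body, content_type). Each (filename, data) pair becomes its own
    part named 'images', mirroring repeated -F "images=@file" flags in curl.
    """
    boundary = uuid.uuid4().hex
    lines: List[bytes] = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode(),
        ]
    for filename, data in images:
        # Assumption: a guessed MIME type is acceptable to the service.
        mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="images"; filename="{filename}"'.encode(),
            f"Content-Type: {mime}".encode(),
            b"",
            data,
        ]
    # Closing boundary, followed by a final CRLF.
    lines += [f"--{boundary}--".encode(), b""]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"
```

To send the request, pass the returned body and content type to `urllib.request.Request("http://localhost:8000/process", data=body, headers={"Content-Type": content_type}, method="POST")` and open it with `urllib.request.urlopen`.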
After processing, retrieve the HTML-rendered result using the result_id.
curl -X GET "http://localhost:8000/results/{result_id}"
Replace {result_id} with the ID returned in the response from the /process endpoint.
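The same retrieval step can be sketched in Python. `result_url` and `fetch_result_html` are hypothetical helper names. How the ID appears in the /process response is not specified here, so the sketch takes it as a plain string argument.

```python
from urllib.parse import quote, urljoin
from urllib.request import urlopen


def result_url(base_url: str, result_id: str) -> str:
    """Build the GET URL for a stored result: <base>/results/<result_id>."""
    # quote() guards against IDs containing characters that are unsafe in a path.
    return urljoin(base_url.rstrip("/") + "/", "results/" + quote(result_id, safe=""))


def fetch_result_html(base_url: str, result_id: str, timeout: float = 30.0) -> str:
    """Download the HTML-rendered result (assumes a UTF-8 encoded response)."""
    with urlopen(result_url(base_url, result_id), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `result_url("http://localhost:8000", "abc123")` gives `http://localhost:8000/results/abc123`.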
curl -X POST "http://localhost:8000/process" \
-F "task=Plain Text OCR"
Response:
{
"detail": "No image provided"
}
curl -X POST "http://localhost:8000/process" \
-F "task=Invalid Task" \
-F "images=@document.jpg"
Response:
{
"detail": "Invalid task specified"
}
curl -X POST "http://localhost:8000/process" \
-F "task=Fine-grained OCR (Color)" \
-F "ocr_color=purple" \
-F "images=@image.jpg"
Response:
{
"detail": "Invalid color specified"
}
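A client can surface these messages by reading the `detail` field of the JSON error body. The helper below is a minimal sketch with a hypothetical name, `error_detail`. It assumes only the response shape shown in the examples above and returns `None` for anything else, such as a non-JSON body or a `detail` that is not a string.

```python
import json
from typing import Optional


def error_detail(response_body: str) -> Optional[str]:
    """Return the 'detail' message from an error body, or None if absent."""
    try:
        payload = json.loads(response_body)
    except json.JSONDecodeError:
        return None
    # Only accept the shape shown in the examples: {"detail": "<message>"}
    if isinstance(payload, dict) and isinstance(payload.get("detail"), str):
        return payload["detail"]
    return None
```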
- Replace http://localhost:8000 with your actual server URL if deployed elsewhere.
- Ensure the image files (@document.jpg, @formatted_document.png, etc.) exist in the directory where you run the curl command.
- For multi-file uploads, use multiple -F "images=@file" fields.
- The ocr_box parameter should be in the format [x1,y1,x2,y2].
- The ocr_color parameter supports only red, green, or blue.
These commands cover all the supported use cases and types for the GOT-OCR 2.0 microservice.