Releases · CatchTheTornado/text-extract-api

What's Changed

Marker removed due to licensing restrictions, which enabled us to change the license to MIT; it will be relocated to an external repository

EasyOCR support

License updated to MIT to align with our goals

At this point we support:

PDF files

Image files

EasyOCR

Llama 3.2-vision OCR

All Ollama-supported models for second-stage text extraction

S3 Storage, Google Drive Storage, Local file system storage

More features will be added soon - please watch us on GitHub!

Commits:

feat: easyOCR added, tesseract - removed, marker - removed, license changed to MIT by @pkarw in #91

Update README.md by @justinlevi in #90

What's Changed

This is the initial release. At this point we fully support:

More features will be added soon - please watch us on GitHub!

Full changelog:

Fix typo in README.md by @martwozniak in #7
Bugfix for #6 with CUDA - spawning the processes by @pkarw in #9
Bugfix to #11, #12, #13 by @pkarw in #17
[docs] how to run app locally without docker by @pkarw in #25
Feat: #8 storage strategies - local file system + google drive by @pkarw in #10
[feat] #30 - new /ocr/request endpoint proposals and docs by @pkarw in #31
Update README.md to remove that extra "`" in cloning .env codeblock by @hahouari in #34
Demo access by @pkarw in #35
[feat] online demo link by @pkarw in #36
Demo links + API client links by @pkarw in #38
[feat] #15 Add S3 storage strategy by @choinek in #39
WiP: [feat] llama3.2_vision update by @pkarw in #40
[fix] cache returned by @pkarw in #42
add missing poppler deps in dockerfile by @PasaOpasen in #45
[fix] disable_ocr_cache fix by @pkarw in #47
fix(#46) - fixed the way new ollama handles images by @pkarw in #48
Project rename by @pkarw in #53
Update docker-compose.gpu.yml by @tengerye in #69
#59 #63 multiformat, reorganize and converters by @choinek in #76