Releases: CatchTheTornado/text-extract-api
v0.2.0
What's Changed
- Marker removed due to licensing restrictions, which allows us to change the license to MIT; Marker support will be relocated to an external repository
- EasyOCR support
- License updated to MIT to align with our goals
At this point we support:
- PDF files
- Image files
- EasyOCR
- Llama 3.2-vision OCR
- All Ollama-supported models for second-stage text extraction
- S3 Storage, Google Drive Storage, Local file system storage
More features will be added soon - please watch us on GitHub!
Commits:
- feat: easyOCR added, tesseract - removed, marker - removed, license changed to MIT by @pkarw in #91
- Update README.md by @justinlevi in #90
New Contributors
- @justinlevi made their first contribution in #90
Full Changelog: v0.1.0...v0.2.0
v0.1.0
What's Changed
This is the initial release. At this point we fully support:
- PDF files
- Image files
- Marker OCR
- Llama 3.2-vision OCR
- All Ollama-supported models for second-stage text extraction
- S3 Storage, Google Drive Storage, Local file system storage
More features will be added soon - please watch us on GitHub!
Full changelog:
- Fix typo in README.md by @martwozniak in #7
- Bugfix for #6 with CUDA - spawning the processes by @pkarw in #9
- Bugfix to #11, #12, #13 by @pkarw in #17
- [docs] how to run app locally without docker by @pkarw in #25
- Feat: #8 storage strategies - local file system + google drive by @pkarw in #10
- [feat] #30 - new `/ocr/request` endpoint proposals and docs by @pkarw in #31
- Update README.md to remove that extra "`" in cloning .env codeblock by @hahouari in #34
- Demo access by @pkarw in #35
- [feat] online demo link by @pkarw in #36
- Demo links + API client links by @pkarw in #38
- [feat] #15 Add S3 storage strategy by @choinek in #39
- WiP: [feat] llama3.2_vision update by @pkarw in #40
- [fix] cache returned by @pkarw in #42
- add missing poppler deps in dockerfile by @PasaOpasen in #45
- [fix] disable_ocr_cache fix by @pkarw in #47
- fix(#46) - fixed the way new ollama handles images by @pkarw in #48
- Project rename by @pkarw in #53
- Update docker-compose.gpu.yml by @tengerye in #69
- #59 #63 multiformat, reorganize and converters by @choinek in #76
New Contributors
- @martwozniak made their first contribution in #7
- @pkarw made their first contribution in #9
- @hahouari made their first contribution in #34
- @choinek made their first contribution in #39
- @PasaOpasen made their first contribution in #45
- @tengerye made their first contribution in #69
Full Changelog: https://github.com/CatchTheTornado/text-extract-api/commits/v0.1.0