Color & image issue #40

Open
mmettey opened this issue Jan 6, 2025 · 4 comments

Comments

@mmettey

mmettey commented Jan 6, 2025

Hi,

I am encountering an issue when converting a specific PDF file.
I generated this PDF with two different libraries. The two files look almost identical, but each one shows a different problem when converted to SVG.

1st case - color alteration

When I convert this PDF:
page58-gs.pdf
I get this result:
page58-gs-1.svg

The blue border on the left turns red in the SVG.

2nd case - removed images

When I convert the second PDF:
page58-mu.pdf
I get this result:
page58-mu-1.svg

Here the color is right, but some images in the top right corner of the page have disappeared.

I hit both issues with the latest CLI version of PdfToSvg.NET (1.5.0).

Thanks in advance for your help!

@dmester
Owner

dmester commented Jan 6, 2025

Hello,

Thanks for reporting this issue.

1st case

The blue border uses ICC colors. PdfToSvg.NET does not currently support ICC profiles and falls back to treating them as RGB colors, which does not produce a good result here. The result is the same when opening the PDF in Chrome or Firefox.

Is it possible to change the color to RGB or CMYK as a workaround?

2nd case

The images are JPEG2000 encoded, which is currently not supported.

I was under the impression this format was practically dead. Did the library itself encode it to JPEG2000, or did you use a source JPEG2000 image?

@manfromarce

I also run into PDFs that contain JPEG 2000 or JBIG2 images, since I have no control over the input files. Would it be possible to integrate libraries such as https://github.com/cinderblocks/CoreJ2K and https://github.com/nicholsab/JBig2Decoder.NETStandard to decode these formats?

@dmester
Owner

dmester commented Jan 9, 2025

Support for JBIG2 is included in the not yet released v1.6.0. I have put JPEG 2000 on the todo list for a future version.

@mmettey
Author

mmettey commented Jan 10, 2025

Thank you very much for clarifying the issue.

In the first case, the PDF was extracted from a full source PDF using Ghostscript. Since Ghostscript rewrites a new PDF instead of just splitting the pages, it appears to alter some elements, specifically the color types.
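
For anyone extracting pages with Ghostscript anyway, one possible way to try the RGB workaround suggested above is to ask the `pdfwrite` device to convert colors during extraction. The sketch below only builds and prints the command. The flags are documented Ghostscript options, but the file names are placeholders, and whether this actually fixes the border color in this particular file is untested.

```python
import subprocess
from typing import List


def build_gs_extract_command(src: str, dst: str, page: int) -> List[str]:
    """Build a Ghostscript command that extracts one page and converts colors to RGB."""
    return [
        "gs",
        "-o", dst,                        # output file; -o also implies -dBATCH -dNOPAUSE
        "-sDEVICE=pdfwrite",              # rewrite the input as a new PDF
        "-sColorConversionStrategy=RGB",  # ask pdfwrite to convert colors to RGB
        f"-dFirstPage={page}",
        f"-dLastPage={page}",
        src,
    ]


def extract_page(src: str, dst: str, page: int) -> None:
    """Run Ghostscript; requires the `gs` executable on PATH."""
    subprocess.run(build_gs_extract_command(src, dst, page), check=True)


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    print(" ".join(build_gs_extract_command("full.pdf", "page58-rgb.pdf", 58)))
```

The resulting single-page PDF can then be passed to the converter as before.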

I confirm that in the second case, the PDF file was not altered by the library; it was simply split from the full PDF source using mutool (MuPDF).
I tried converting the same page using the full source PDF and the --pages parameter, but the result was the same. This suggests the source file was created using the JPEG2000 format for these images.

Given these findings, I'll explore options for adjusting the source file before conversion.
May I also suggest a debug mode (for example, a new --debug parameter) in which the CLI prints a warning whenever it encounters an element it cannot convert, such as a JPEG2000 image?
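
Until something like that exists, a rough pre-check can be run outside the converter. The sketch below is not part of PdfToSvg.NET and does not parse the PDF: it only searches the raw bytes for the PDF names `/JPXDecode`, `/JBIG2Decode` and `/ICCBased`, which correspond to the features discussed in this thread. The function name and the list of flagged names are assumptions made for illustration. Names stored inside compressed object streams will be missed, so an empty result is not a guarantee.

```python
import re
import sys
from typing import Dict

# PDF names to look for, with a human-readable label.
# The selection is an assumption based on the features discussed in this thread.
FLAGGED_NAMES = {
    b"/JPXDecode": "JPEG 2000 image data",
    b"/JBIG2Decode": "JBIG2 image data",
    b"/ICCBased": "ICC-based color space",
}


def find_flagged_features(pdf_bytes: bytes) -> Dict[str, int]:
    """Count occurrences of each flagged PDF name in the raw file bytes."""
    counts: Dict[str, int] = {}
    for name in FLAGGED_NAMES:
        # The lookahead rejects longer names that merely start with `name`.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        hits = len(re.findall(pattern, pdf_bytes))
        if hits:
            counts[name.decode("ascii")] = hits
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        found = find_flagged_features(f.read())
    for name, hits in found.items():
        label = FLAGGED_NAMES[name.encode("ascii")]
        print(f"warning: {label} ({name}) found {hits} time(s)", file=sys.stderr)
```

Run against a file such as page58-mu.pdf, this would be expected to report `/JPXDecode` if the images really are JPEG2000 encoded.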


3 participants