Hi,

Your issue has been fixed. Try with a snapshot or wait for 2.0.9.
https://issues.apache.org/jira/browse/PDFBOX-4060

Tilman


On 09.01.2018 at 19:59, Tilman Hausherr wrote:
On 09.01.2018 at 09:05, Tilman Hausherr wrote:
On 09.01.2018 at 07:10, Soon Keong Tan wrote:
My team is having problems with the image rendering speed of certain PDF files. For most of the PDF files we handle, it takes only seconds
to create an image of the file, but for certain PDFs it takes more than 6
minutes.

We have tried the following versions of pdfbox-app-x.x.x.jar, and it seems
that 1.8.x is more efficient at rendering the image.
  (1) 1.8.13 - 1.5 mins
  (2) 2.0.5 - 6.18 mins
  (3) 2.0.8 - 6.35 mins
However, because some Japanese characters in some files were not rendered correctly with 1.8.13, we had to use 2.0.5 as
the production version.

I tried inserting some debug code in the PDFToImage class (ver. 2.0.5) to
find the bottleneck, and it seems that
"renderer.renderImageWithDPI" takes up most of the time.

==========================
Java version: 1.7.0_72
PDFBox version: 2.0.5
Command line: java -jar ./pdfbox-app-2.0.5.jar PDFToImage -time -startPage
1 -endPage 1 ./sample_slow.pdf
File: https://goo.gl/WEMM2X
==========================
The full version of the PDF is quite large, so the linked file above is a
cropped version (the page we are having problems rendering). The
cropped version was created using the PDFSplit command line tool.

This is my first time using the mailing list. Should I create a JIRA
ticket requesting help instead of addressing the mailing list regarding
this problem?


It's fine to post to the mailing list first.

I had a quick look at your file... it has 1999 probably identical separation colorspaces that are just a black or white value. These map to a CMYK colorspace.

I'll look more later this week.

Did you set / try the two settings mentioned here?
https://pdfbox.apache.org/2.0/getting-started.html
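
For reference, a minimal sketch of how two such settings could be applied from code rather than on the command line. It assumes the two settings meant here are the KCMS color management switch and the pure Java CMYK conversion switch (the latter reportedly available since 2.0.4); the property names are quoted from memory of that page and should be checked against it. The class name RenderingSettingsSketch is hypothetical. The KCMS switch mainly concerns Java 8 and later, so it may have no effect on Java 1.7.

```java
// Sketch only: sets two JVM system properties before any rendering is done.
// Property names are assumptions taken from memory of the linked page; verify them there.
class RenderingSettingsSketch {
    static final String CMM_KEY = "sun.java2d.cmm";
    static final String CMM_KCMS = "sun.java2d.cmm.kcms.KcmsServiceProvider";
    static final String PURE_JAVA_CMYK_KEY =
            "org.apache.pdfbox.rendering.UsePureJavaCMYKConversion";

    // Call this first thing in main(), before any color or rendering classes are loaded.
    static void apply() {
        System.setProperty(CMM_KEY, CMM_KCMS);
        System.setProperty(PURE_JAVA_CMYK_KEY, "true");
    }

    public static void main(String[] args) {
        apply();
        System.out.println(CMM_KEY + "=" + System.getProperty(CMM_KEY));
        System.out.println(PURE_JAVA_CMYK_KEY + "=" + System.getProperty(PURE_JAVA_CMYK_KEY));
    }
}
```

The same properties can be passed as -D options on the java command line, placed before -jar.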


I ran the profiler and the cause of the slowness is different... there is a large JPEG file (5349 x 3806) that uses a DeviceN colorspace which in turn is based on CMYK. There's some slowness due to the type 0 conversion function from DeviceN to CMYK, but this is less than 10%. Most of the time is spent converting the CMYK to RGB, one pixel at a time. (Because of the DeviceN colorspace we can't use bulk conversion, which may or may not be faster.)
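
To illustrate why a pixel-at-a-time conversion adds up: an image of 5349 x 3806 has about 20.4 million pixels, so the conversion routine runs that many times. The sketch below only shows the shape of such a loop. It is not PDFBox code, the class and method names are hypothetical, and the formula is a naive approximation without ICC profiles, which a real renderer would not use.

```java
// Sketch only: per-pixel CMYK -> RGB loop over an interleaved 8-bit CMYK buffer.
// Uses a naive, non-ICC formula purely for illustration.
class PerPixelCmykSketch {
    // Converts one CMYK pixel (components 0..255) to a packed 0xRRGGBB value.
    static int naiveCmykToRgb(int c, int m, int y, int k) {
        int r = (255 - c) * (255 - k) / 255;
        int g = (255 - m) * (255 - k) / 255;
        int b = (255 - y) * (255 - k) / 255;
        return (r << 16) | (g << 8) | b;
    }

    // Converts a whole buffer (4 bytes per pixel), one conversion call per pixel.
    static int[] convert(byte[] cmyk) {
        int[] rgb = new int[cmyk.length / 4];
        for (int i = 0; i < rgb.length; i++) {
            int c = cmyk[4 * i] & 0xFF;
            int m = cmyk[4 * i + 1] & 0xFF;
            int y = cmyk[4 * i + 2] & 0xFF;
            int k = cmyk[4 * i + 3] & 0xFF;
            rgb[i] = naiveCmykToRgb(c, m, y, k);
        }
        return rgb;
    }

    public static void main(String[] args) {
        // Two pixels: no ink at all, then full black ink.
        int[] out = convert(new byte[] {0, 0, 0, 0, 0, 0, 0, (byte) 255});
        System.out.println(Integer.toHexString(out[0]) + " " + Integer.toHexString(out[1]));
    }
}
```

With a color-managed conversion in place of the naive formula, the per-call cost is much higher, which is where the time goes for an image of this size.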

I've opened a JIRA issue (https://issues.apache.org/jira/browse/PDFBOX-4060) but don't expect this to be fixed soon. CMYK and ICC colorspaces are our weak spot :-(

Tilman




Tilman



Any help is deeply appreciated. Thank you in anticipation.

Regards,
Soon Keong Tan
----------------------
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



