Hi Andreas,

Sorry, I was busy uploading the PDFs and writing my mail and didn't see yours,
but I had already figured PDFToImage might be the correct choice here ;).

I do not get any exceptions, only some INFO log messages:

Apr 8, 2013 10:16:49 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BX
Apr 8, 2013 10:16:50 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Apr 8, 2013 10:16:50 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BMC
Apr 8, 2013 10:16:50 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Apr 8, 2013 10:16:50 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: DP
Apr 8, 2013 10:16:51 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Apr 8, 2013 10:16:52 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EX


I get these for every page in this document.
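
On the second question from my first mail (detecting a page that was not
rendered correctly), one rough heuristic is to check whether the rendered
BufferedImage consists almost entirely of a single colour. Below is a minimal
sketch of that idea, using only java.awt and no PDFBox calls. The class name
BlankPageCheck, its two methods, and the threshold are hypothetical choices for
illustration, and the assumption that a failed render comes out near-uniform
may not hold for every PDF.

```java
import java.awt.image.BufferedImage;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of PDFBox. */
final class BlankPageCheck {

    /** Returns the fraction (0.0 to 1.0) of pixels that share the most frequent RGB value. */
    static double dominantColorFraction(BufferedImage img) {
        Map<Integer, Integer> counts = new HashMap<>();
        int max = 0;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                // Mask off the alpha channel so only the colour is compared.
                int rgb = img.getRGB(x, y) & 0xFFFFFF;
                int count = counts.merge(rgb, 1, Integer::sum);
                if (count > max) {
                    max = count;
                }
            }
        }
        long total = (long) img.getWidth() * img.getHeight();
        return (double) max / total;
    }

    /** Treats the image as blank when one colour covers at least the given fraction of pixels. */
    static boolean looksBlank(BufferedImage img, double threshold) {
        return dominantColorFraction(img) >= threshold;
    }
}
```

Calling BlankPageCheck.looksBlank(img, 0.999) on the result of
convertToImage() would flag fully blank pages. It would also flag pages that
are legitimately empty, and it would not catch partly blank images, so it is
only a first filter.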

Cheers,

Alex

--
Dr. Alexander G. Klenner
Fraunhofer-Institute for Algorithms and Scientific Computing (SCAI)
Schloss Birlinghoven, D-53754 Sankt Augustin
Tel.: +49 - 2241 - 14 - 2736
E-mail: [email protected]
Internet: http://www.scai.fraunhofer.de


----- Original Message -----
From: "Andreas Lehmkühler" <[email protected]>
To: [email protected]
Sent: Monday, April 8, 2013 9:58:25 AM
Subject: Re: errors with PDPage.convertToImage()

Hi,

Maruan Sahyoun <[email protected]> wrote on 8 April 2013 at 09:20:
> Hi,
>
> unfortunately the attachment didn't make it through.
Due to some security restrictions.

> Could you try the PDF in question using the command line app ExtractImage with
> the -nonSeq  parameter or use the following code
I guess there is a misunderstanding. Please use PDFToImage to create one image
for each page [1]. Provide us with any exceptions or log output you get.

> PDDocument pdDoc = PDDocument.loadNonSeq(…)
>
> The NonSequentialParser gives better results if the document has incremental
> updates.
> In addition it's not necessary to create a new PDDocument from the cosDoc as
> parser.getDocument already passes a PDDocument ….
+1, that's an old pattern and should not be used any more.

> BR from your neighborhood
I'm not that far away either ;-)

> Maruan Sahyoun
>
> On 08.04.2013 at 08:52, Alexander Klenner
> <[email protected]> wrote:
>
> > Hi all,
> >
> > I frequently come across PDFs where the convertToImage() method is
> > generating blank or partly blank images. One of those PDFs is attached to
> > this mail.
> >
> > My code for processing:
> >
> > PDFParser parser;
> > parser = new PDFParser(new FileInputStream(f));
> > parser.parse();
> > cosDoc = parser.getDocument();
> >
> > pdDoc = new PDDocument(cosDoc);
> > ..
> > Iterator<PDPage> it = pdDoc.getDocumentCatalog().getAllPages().iterator();
> > PDPage page = it.next();
> > ...
> > PDRectangle cropBox = page.findCropBox();
> > Dimension dimension = cropBox.createDimension();
> > ...
> > BufferedImage img = page.convertToImage(BufferedImage.TYPE_INT_RGB,
> > ImageParser.PARAM_DPI);
> >
> >
> > I am using pdfbox-app-1.8.0.jar.
> >
> > So I have two questions:
> >
> > 1. Is there a different way to extract the page as an image that I am not
> > aware of and that produces the correct image?
> > 2. Or is it possible to detect that a page was not extracted correctly,
> > either before or after the extraction?
> >
> > At the moment I have no way of knowing when I am dealing with a corrupted
> > image.
> >
> > Thanks a lot for any hints,
> >
> > Alex
> >
> > --
> > Dr. Alexander G. Klenner
> > Fraunhofer-Institute for Algorithms and Scientific Computing (SCAI)
> > Schloss Birlinghoven, D-53754 Sankt Augustin
> > Tel.: +49 - 2241 - 14 - 2736
> > E-mail: [email protected]
> > Internet: http://www.scai.fraunhofer.de
> >

BR
Andreas Lehmkühler

[1] http://pdfbox.apache.org/commandlineutilities/PDFToImage.html
