Hi Alexander,

you can ignore the info messages if the result you get is in line with your expectations. The messages mean that, although PDFBox supports a fair amount of the PDF specification, not every operator defined there is currently supported. When PDFBox meets an unsupported operator it logs it, skips it, and continues processing the rest of the PDF. As long as that doesn't affect the results you are expecting, you're fine.
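If the messages are only noise, one option is to raise the log level for the class that emits them. The sketch below assumes PDFBox's logging ends up in java.util.logging, which is what the timestamp format of the quoted log lines suggests; with a different backend (log4j, for example) the same thing would be configured there instead. The class and method names are made up for illustration; only the logger name is taken from the log output in this thread.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class QuietPdfBoxLogging {
    // Keep a strong reference to the logger: java.util.logging only holds
    // loggers weakly, so a level set on an otherwise unreferenced logger
    // can be lost once it is garbage collected.
    private static final Logger STREAM_ENGINE_LOGGER =
            Logger.getLogger("org.apache.pdfbox.util.PDFStreamEngine");

    /** Drops INFO records from PDFStreamEngine; WARNING and above still pass. */
    static void silenceUnsupportedOperatorInfo() {
        STREAM_ENGINE_LOGGER.setLevel(Level.WARNING);
    }

    public static void main(String[] args) {
        silenceUnsupportedOperatorInfo();
        // Assumption: this mimics the record PDFBox would log; it is now filtered out.
        STREAM_ENGINE_LOGGER.info("unsupported/disabled operation: BX");
        // Warnings are unaffected by the change.
        STREAM_ENGINE_LOGGER.warning("warnings are still reported");
    }
}
```

Call `silenceUnsupportedOperatorInfo()` once before rendering. This only hides the messages; it does not change how the page is rendered.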
BR
Maruan Sahyoun

On 08.04.2013 at 10:17, Alexander Klenner <[email protected]> wrote:

> Hi Andreas,
>
> Sorry, I was busy uploading the PDFs and writing the mail and didn't see your
> mail, but I figured PDFToImage might be the correct choice here ;).
>
> I do not get any exceptions but some info logs, which are:
>
> Apr 8, 2013 10:16:49 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
> INFO: unsupported/disabled operation: BX
> Apr 8, 2013 10:16:50 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
> INFO: unsupported/disabled operation: BDC
> Apr 8, 2013 10:16:50 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
> INFO: unsupported/disabled operation: BMC
> Apr 8, 2013 10:16:50 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
> INFO: unsupported/disabled operation: i
> Apr 8, 2013 10:16:50 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
> INFO: unsupported/disabled operation: DP
> Apr 8, 2013 10:16:51 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
> INFO: unsupported/disabled operation: EMC
> Apr 8, 2013 10:16:52 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
> INFO: unsupported/disabled operation: EX
>
> I get those for every page in this document.
>
> Cheers,
>
> Alex
>
> --
> Dr. Alexander G. Klenner
> Fraunhofer-Institute for Algorithms and Scientific Computing (SCAI)
> Schloss Birlinghoven, D-53754 Sankt Augustin
> Tel.: +49 - 2241 - 14 - 2736
> E-mail: [email protected]
> Internet: http://www.scai.fraunhofer.de
>
>
> ----- Original Message -----
> From: "Andreas Lehmkühler" <[email protected]>
> To: [email protected]
> Sent: Monday, April 8, 2013 9:58:25 AM
> Subject: Re: errors with PDPage.convertToImage()
>
> Hi,
>
> Maruan Sahyoun <[email protected]> wrote on 8 April 2013 at 09:20:
>> Hi,
>>
>> unfortunately the attachment didn't make it through.
> Due to some security restrictions.
>
>> Could you try the PDF in question using the command line app ExtractImage
>> with the -nonSeq parameter or use the following code
> I guess there is a misunderstanding. Please use PDFToImage to create one
> image for each page [1]. Provide us with any possible exception or log.
>
>> PDDocument pdDoc = PDDocument.loadNonSeq(…)
>>
>> The NonSequentialParser gives better results if the document has incremental
>> updates.
>> In addition it's not necessary to create a new PDDocument from the cosDoc as
>> parser.getDocument already passes a PDDocument ….
> +1, that's an old pattern and should not be used any more.
>
>> BR from your neighborhood
> I'm not that far away either ;-)
>
>> Maruan Sahyoun
>>
>> On 08.04.2013 at 08:52, Alexander Klenner
>> <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> I frequently come across PDFs where the convertToImage() method is
>>> generating blank or partly blank images. One of those PDFs is attached to
>>> this mail.
>>>
>>> My code for processing:
>>>
>>> PDFParser parser;
>>> parser = new PDFParser(new FileInputStream(f));
>>> parser.parse();
>>> cosDoc = parser.getDocument();
>>>
>>> pdDoc = new PDDocument(cosDoc);
>>> ..
>>> Iterator<PDPage> it = pdDoc.getDocumentCatalog().getAllPages().iterator();
>>> PDPage page = it.next();
>>> ...
>>> PDRectangle cropBox = page.findCropBox();
>>> Dimension dimension = cropBox.createDimension();
>>> ...
>>> BufferedImage img = page.convertToImage(BufferedImage.TYPE_INT_RGB,
>>> ImageParser.PARAM_DPI);
>>>
>>>
>>> I am using pdfbox-app-1.8.0.jar.
>>>
>>> So I have two questions:
>>>
>>> 1. Is there a different way to extract the page as an image that I am not
>>> aware of to get the correct image?
>>> 2. Or is it possible to detect that this page was not extracted correctly,
>>> before or after the extraction?
>>>
>>> At the moment I just can't tell when I am dealing with a corrupted image.
>>>
>>> Thanks a lot for any hints,
>>>
>>> Alex
>>>
>>> --
>>> Dr. Alexander G. Klenner
>>> Fraunhofer-Institute for Algorithms and Scientific Computing (SCAI)
>>> Schloss Birlinghoven, D-53754 Sankt Augustin
>>> Tel.: +49 - 2241 - 14 - 2736
>>> E-mail: [email protected]
>>> Internet: http://www.scai.fraunhofer.de
>>>
>
> BR
> Andreas Lehmkühler
>
> [1] http://pdfbox.apache.org/commandlineutilities/PDFToImage.html
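
On the second question in the original mail (detecting a badly extracted page after the fact), nobody in the thread names a PDFBox facility for it. A rough heuristic that works on the resulting BufferedImage alone is to measure how much of it is plain background. The sketch below is only that: the class name, method names and threshold are hypothetical, it assumes a white page background, and it cannot tell a rendering failure from a page that really is empty, nor will it flag partly blank pages reliably.

```java
import java.awt.image.BufferedImage;

class BlankImageCheck {
    /** Returns the fraction (0.0 to 1.0) of pixels whose RGB equals backgroundRgb. */
    static double backgroundFraction(BufferedImage img, int backgroundRgb) {
        int width = img.getWidth();
        int height = img.getHeight();
        int background = backgroundRgb & 0xFFFFFF; // ignore the alpha channel
        long matches = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                if ((img.getRGB(x, y) & 0xFFFFFF) == background) {
                    matches++;
                }
            }
        }
        return (double) matches / ((long) width * height);
    }

    /** Heuristic: treat the image as blank if nearly all pixels are white. */
    static boolean looksBlank(BufferedImage img, double threshold) {
        return backgroundFraction(img, 0xFFFFFF) >= threshold;
    }

    public static void main(String[] args) {
        // Stand-in for the image returned by page.convertToImage(...).
        BufferedImage img = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 100; y++) {
            for (int x = 0; x < 100; x++) {
                img.setRGB(x, y, 0xFFFFFF); // all-white "page"
            }
        }
        System.out.println("blank? " + looksBlank(img, 0.999));
    }
}
```

Pages flagged this way would still need a manual look or a comparison against another renderer; the threshold of 0.999 is an arbitrary starting point, not a recommended value.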

