Hi,

On 8 April 2013 at 09:20, Maruan Sahyoun <[email protected]> wrote:
> Hi,
>
> unfortunately the attachment didn't make it through, due to some security
> restrictions.
> Could you try the PDF in question using the command line app ExtractImage with
> the -nonSeq parameter or use the following code

I guess there is a misunderstanding. Please use PDFToImage to create one image
for each page [1], and send us any exception or log output you get.

> PDDocument pdDoc = PDDocument.loadNonSeq(…)
>
> The NonSequentialParser gives better results if the document has incremental
> updates.
> In addition it's not necessary to create a new PDDocument from the cosDoc as
> parser.getDocument already passes a PDDocument ….

+1, that's an old pattern and should not be used any more.

> BR from your neighborhood

I'm not that far away either ;-)

> Maruan Sahyoun
>
> On 08.04.2013 at 08:52, Alexander Klenner
> <[email protected]> wrote:
>
> > Hi all,
> >
> > I frequently come across PDFs where the convertToImage() method is
> > generating blank or partly blank images. One of those PDFs is attached to
> > this mail.
> >
> > My code for processing:
> >
> > PDFParser parser;
> > parser = new PDFParser(new FileInputStream(f));
> > parser.parse();
> > cosDoc = parser.getDocument();
> >
> > pdDoc = new PDDocument(cosDoc);
> > ..
> > Iterator<PDPage> it = pdDoc.getDocumentCatalog().getAllPages().iterator();
> > PDPage page = it.next();
> > ...
> > PDRectangle cropBox = page.findCropBox();
> > Dimension dimension = cropBox.createDimension();
> > ...
> > BufferedImage img = page.convertToImage(BufferedImage.TYPE_INT_RGB,
> > ImageParser.PARAM_DPI);
> >
> >
> > I am using pdfbox-app-1.8.0.jar.
> >
> > So I have two questions:
> >
> > 1. Is there a different way to extract the page as an image that I am not
> > aware of to get the correct image?
> > 2. Or is it possible to detect that this page was not extracted correctly,
> > before or after the extraction?
> >
> > At the moment I just don't know when I am dealing with a corrupted image.
> >
> > Thanks a lot for any hints,
> >
> > Alex
> >
> > --
> > Dr. Alexander G. Klenner
> >
> > Fraunhofer-Institute for Algorithms and Scientific Computing (SCAI)
> > Schloss Birlinghoven, D-53754 Sankt Augustin
> > Tel.: +49 - 2241 - 14 - 2736
> > E-mail: [email protected]
> > Internet: http://www.scai.fraunhofer.de

BR
Andreas Lehmkühler

[1] http://pdfbox.apache.org/commandlineutilities/PDFToImage.html
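
Regarding Alex's second question (detecting a bad result after the extraction): the thread does not give an answer, but a rough heuristic would be to measure how much of the rendered BufferedImage is near-white and flag pages above some limit. The sketch below uses only the JDK, not PDFBox. The class name BlankImageCheck, both method names, the threshold 250 and the 0.999 limit are assumptions made up for illustration. It cannot tell a rendering failure from a page that is legitimately empty, and it will probably miss partly blank pages, so treat it as a hint for manual review only.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of PDFBox: flags rendered pages that are almost entirely white.
public class BlankImageCheck {

    /** Fraction (0.0 to 1.0) of pixels whose R, G and B values are all >= whiteThreshold. */
    public static double nearWhiteFraction(BufferedImage img, int whiteThreshold) {
        int w = img.getWidth();
        int h = img.getHeight();
        long nearWhite = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = img.getRGB(x, y);       // packed ARGB
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                if (r >= whiteThreshold && g >= whiteThreshold && b >= whiteThreshold) {
                    nearWhite++;
                }
            }
        }
        return (double) nearWhite / ((long) w * h);
    }

    /** True if at least minFraction of the pixels are near-white (threshold 250 is an assumed value). */
    public static boolean looksBlank(BufferedImage img, double minFraction) {
        return nearWhiteFraction(img, 250) >= minFraction;
    }

    public static void main(String[] args) {
        // Stand-in for a rendered page; in real use this would be the image from convertToImage().
        BufferedImage page = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = page.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, 200, 100);
        System.out.println("all white, looks blank: " + looksBlank(page, 0.999));

        g.setColor(Color.BLACK);
        g.fillRect(0, 0, 100, 100);               // paint the left half black
        g.dispose();
        System.out.println("half black, looks blank: " + looksBlank(page, 0.999));
    }
}
```

In the code from the original mail, the check would be applied to img right after page.convertToImage(...); the limit would need tuning against known good and known bad pages.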

