Hi,

Unfortunately the attachment didn't make it through.

Could you try the PDF in question with the command-line app ExtractImage and 
the -nonSeq parameter, or use the following code:

PDDocument pdDoc = PDDocument.loadNonSeq(…)

The NonSequentialPDFParser gives better results if the document has 
incremental updates. In addition, it's not necessary to create a new 
PDDocument from the cosDoc, as parser.getPDDocument() already returns a 
PDDocument.
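
On the second question (detecting a bad render after the fact), one rough 
heuristic, which is not a PDFBox feature, is to count how many pixels of the 
rendered image are nearly white and flag the page if almost all of them are. 
The sketch below uses only java.awt.image.BufferedImage. The class name 
BlankImageCheck, the method names and both thresholds (250 and 0.999) are 
made up for illustration and would need tuning. It only catches fully blank 
pages, not partly blank ones.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of PDFBox.
class BlankImageCheck {

    // Fraction (0.0 to 1.0) of pixels whose R, G and B are all >= minChannel.
    static double nearWhiteFraction(BufferedImage img, int minChannel) {
        long nearWhite = 0;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                if (r >= minChannel && g >= minChannel && b >= minChannel) {
                    nearWhite++;
                }
            }
        }
        long total = (long) img.getWidth() * img.getHeight();
        return (double) nearWhite / total;
    }

    // Assumed thresholds: a page counts as blank if more than 99.9% of its
    // pixels are nearly white.
    static boolean looksBlank(BufferedImage img) {
        return nearWhiteFraction(img, 250) > 0.999;
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        // Paint the whole image white.
        for (int y = 0; y < 100; y++) {
            for (int x = 0; x < 200; x++) {
                img.setRGB(x, y, 0xFFFFFF);
            }
        }
        System.out.println(looksBlank(img)); // prints "true"

        // Paint the left half black to simulate page content.
        for (int y = 0; y < 100; y++) {
            for (int x = 0; x < 100; x++) {
                img.setRGB(x, y, 0x000000);
            }
        }
        System.out.println(looksBlank(img)); // prints "false"
    }
}
```

In your code the input would be the BufferedImage returned by 
page.convertToImage(...). A page that is legitimately empty would also be 
flagged, so treat the result as a hint rather than proof of a failed render.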

BR from your neighborhood


Maruan Sahyoun

Am 08.04.2013 um 08:52 schrieb Alexander Klenner 
<[email protected]>:

> Hi all,
> 
> I frequently come across PDFs where the convertToImage() method is generating 
> blank or partly blank images. One of those PDFs is attached to this mail. 
> 
> My code for processing: 
> 
> PDFParser parser;
> parser = new PDFParser(new FileInputStream(f));
> parser.parse();
> cosDoc = parser.getDocument();
> 
> pdDoc = new PDDocument(cosDoc);
> ..
> Iterator<PDPage> it = pdDoc.getDocumentCatalog().getAllPages().iterator();
> PDPage page = it.next();
> ...
> PDRectangle cropBox = page.findCropBox();
> Dimension dimension = cropBox.createDimension();
> ...
> BufferedImage img = page.convertToImage(BufferedImage.TYPE_INT_RGB, 
> ImageParser.PARAM_DPI);
> 
> 
> I am using pdfbox-app-1.8.0.jar.
> 
> So I have two questions: 
> 
> 1. Is there a different way to extract the page as an image that I am not 
> aware of to get the correct image? 
> 2. Or is it possible to detect that this page was not extracted correctly, 
> before or after the extraction?
> 
> At the moment I have no way of knowing when I am dealing with a corrupted 
> image.
> 
> Thanks a lot for any hints,
> 
> Alex
> 
> --
> Dr. Alexander G. Klenner
> Fraunhofer-Institute for Algorithms and Scientific Computing (SCAI)
> Schloss Birlinghoven, D-53754 Sankt Augustin
> Tel.: +49 - 2241 - 14 - 2736
> E-mail: [email protected]
> Internet: http://www.scai.fraunhofer.de
> 
