Hi,

Sent: Tue, 15 Dec 2009
From: George Van Treeck <[email protected]>
> I ran into the exception below when using an older 0.8 version. So, I did a
> build using HEAD from subversion. And the exception persists. The following
> is output from a little web crawler I wrote.
>
> ERROR: Unable to load PDF document:
> http://www.polaroid.com/media/document/a932manualEN20091019.pdf
> java.io.IOException: Unknown xobject subtype 'PS'
>     at org.apache.pdfbox.pdmodel.graphics.xobject.PDXObject.createXObject(PDXObject.java:165)
>     at org.apache.pdfbox.pdmodel.PDResources.getXObjects(PDResources.java:161)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:226)
>     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:206)
>     at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:367)
>     at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:291)
>     at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:247)
>     at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
>     at webcrawler.WebCrawler.getContent(WebCrawler.java:1444)
>

PDFBox doesn't support that kind of subtype for XObjects. According to the
PDF reference manual (v1.7, section 4.7.1, "PostScript XObjects"), it is rarely
used and shouldn't have any effect when viewing the document. It can only be
used when printing to a PostScript-enabled printer. This feature is likely to
be removed from PDF in a future version. PDFBox should ignore those PS XObjects
in the future.

> -George
>

BR
Andreas Lehmkühler
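
To illustrate what "ignoring PS XObjects" could look like, here is a minimal, self-contained sketch. It does not use PDFBox and is not the actual PDFBox code or fix: the class `XObjectLoaderSketch`, the method `loadableXObjects`, and the resource names `Im0`, `Ps0`, `Fm0` are all hypothetical. It assumes each XObject is represented only by its /Subtype value (`Image`, `Form`, or `PS`) and skips `PS` entries instead of throwing.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a resource loader; none of these names are PDFBox API.
class XObjectLoaderSketch {

    // Takes a map of XObject name -> /Subtype value and returns the names
    // that should be processed. PostScript ("PS") XObjects are skipped
    // silently; any other unrecognized subtype is still reported as an error.
    static List<String> loadableXObjects(Map<String, String> subtypesByName)
            throws IOException {
        List<String> loadable = new ArrayList<>();
        for (Map.Entry<String, String> entry : subtypesByName.entrySet()) {
            String subtype = entry.getValue();
            if ("Image".equals(subtype) || "Form".equals(subtype)) {
                loadable.add(entry.getKey());
            } else if ("PS".equals(subtype)) {
                // Only relevant when printing to a PostScript printer,
                // so it is ignored here rather than treated as an error.
                continue;
            } else {
                throw new IOException("Unknown xobject subtype '" + subtype + "'");
            }
        }
        return loadable;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> resources = new LinkedHashMap<>();
        resources.put("Im0", "Image");
        resources.put("Ps0", "PS");
        resources.put("Fm0", "Form");
        System.out.println(loadableXObjects(resources)); // prints [Im0, Fm0]
    }
}
```

With this approach a document like the one above would still yield its text, because the PS entry is dropped before any content stream refers to it, while a genuinely unknown subtype keeps raising the same kind of IOException as in the stack trace.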

