Re: Error on PDDocument.load

John Hewson Wed, 11 Feb 2015 15:29:24 -0800

Hmm, if it doesn’t render in Acrobat we probably shouldn’t render it either...


-- John

> On 11 Feb 2015, at 14:16, Tilman Hausherr <[email protected]> wrote:
> 
> I wasn't able to create a non-confidential version of the file that works 
> with Adobe Reader. But here's an issue and a proposed patch.
> 
> https://issues.apache.org/jira/browse/PDFBOX-2679
> 
> Tilman
> 
> On 11.02.2015 at 18:54, Tilman Hausherr wrote:
>> No, his file is confidential.
>> 
>> However, we might create a non-confidential file that has the same error.
>> 
>> Tilman
>> 
>> On 11.02.2015 at 18:40, John Hewson wrote:
>>> Can we get a JIRA issue open for this, preferably with the file attached?
>>> 
>>> -- John
>>> 
>>>> On 11 Feb 2015, at 00:29, Tilman Hausherr <[email protected]> wrote:
>>>> 
>>>> Yes, they made hacks. So did we, for many types of malformed files. Please 
>>>> also send the file to Andreas, unless you already did; he has written many 
>>>> workarounds for malformed files.
>>>> 
>>>> Tilman
>>>> 
>>>>> On 11.02.2015 at 09:05, Kevin Morin wrote:
>>>>> Ok. Why is other software (like xpf) able to open it? I guess they made 
>>>>> a hack to fix this? Are you going to do something too?
>>>>> 
>>>>> Thanks
>>>>> BR
>>>>> 
>>>>> Kevin
>>>>> 
>>>>>> On 11/02/2015 08:53, Tilman Hausherr wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> I can reproduce the error. Your file is malformed. Please open it with
>>>>>> NOTEPAD++ and go to the end:
>>>>>> 
>>>>>> xref
>>>>>> 1 7
>>>>>> 0000000000 65535 f
>>>>>> 0000000009 00000 n
>>>>>> 0000358745 00000 n
>>>>>> 0000358842 00000 n
>>>>>> 0000359029 00000 n
>>>>>> 0000359087 00000 n
>>>>>> 0000359138 00000 n
>>>>>> trailer
>>>>>> 
>>>>>> The first number (1) is the object number of the first entry in the
>>>>>> table, so the first entry would be object 1. The second number (7) is the
>>>>>> number of entries. The number 1 is incorrect; it should be 0, because
>>>>>> "0000000000 65535 f" is the dummy object 0. Press CTRL-G and enter the
>>>>>> offsets (e.g. 9, 45, 358745, ...) and you will see what I mean.
>>>>>> 
>>>>>> From the PDF spec:
>>>>>> 
>>>>>> The free entries in the cross-reference table form a linked list, with
>>>>>> each free entry containing the object number of the next. The first
>>>>>> entry in the table (object number 0) is always free and has a generation
>>>>>> number of 65,535; it is the head of the linked list of free objects
>>>>>> 
>>>>>> Tilman
>>>>>> 
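The header check described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the patch attached to PDFBOX-2679 and not PDFBox code: the class XrefHeaderCheck and the method suggestedFirstObject are hypothetical names, and the rule "a free entry with generation 65535 is object 0" is a heuristic derived from the spec excerpt quoted above.

```java
class XrefHeaderCheck {

    /**
     * Returns the first object number that the xref subsection header
     * probably should have declared.
     *
     * Heuristic (an assumption, not PDFBox behaviour): if the first entry
     * is a free entry with generation 65535, it is treated as object 0,
     * so the subsection should start at 0 whatever the header says.
     * Otherwise the declared value is returned unchanged.
     */
    static int suggestedFirstObject(String xrefSection) {
        String[] lines = xrefSection.trim().split("\\r?\\n");
        if (lines.length < 3 || !lines[0].trim().equals("xref")) {
            throw new IllegalArgumentException("not a classic xref section");
        }
        // Subsection header: "<first object number> <number of entries>"
        String[] header = lines[1].trim().split("\\s+");
        int declaredFirst = Integer.parseInt(header[0]);

        // Entry layout: "<offset or next free object> <generation> <n|f>"
        String[] firstEntry = lines[2].trim().split("\\s+");
        boolean looksLikeObjectZero = firstEntry.length == 3
                && firstEntry[1].equals("65535")
                && firstEntry[2].equals("f");

        return looksLikeObjectZero ? 0 : declaredFirst;
    }

    public static void main(String[] args) {
        // The table quoted in this thread, with the header "1 7".
        String section = String.join("\n",
                "xref",
                "1 7",
                "0000000000 65535 f",
                "0000000009 00000 n",
                "0000358745 00000 n",
                "0000358842 00000 n",
                "0000359029 00000 n",
                "0000359087 00000 n",
                "0000359138 00000 n",
                "trailer");
        System.out.println("suggested first object: "
                + suggestedFirstObject(section));
    }
}
```

With the header "1 7", the seven entries are numbered 1 to 7, so the last one (offset 359138) is read as object 7. That appears consistent with the "Can't find the object 7 0 (origin offset 359138)" message in this thread; with "0 7" the same entry would be object 6.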
>>>>>> 
>>>>>>> On 11.02.2015 at 08:21, Kevin Morin wrote:
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I am sorry, it seems that I did not send you the right file...
>>>>>>> Actually, I was also testing the wrong file on Linux from the
>>>>>>> beginning. The file also displays blank on Linux, on both Java 7 and 8...
>>>>>>> Here is the right file.
>>>>>>> 
>>>>>>> I am sorry to make you work for nothing...
>>>>>>> 
>>>>>>> BR
>>>>>>> 
>>>>>>> Kevin
>>>>>>> 
>>>>>>> 
>>>>>>>> On 10/02/2015 21:32, Tilman Hausherr wrote:
>>>>>>>> So we e-mailed and the result is
>>>>>>>> - you're really working on W2008 with the file that you sent me
>>>>>>>> - you get the same error on W2008 with the app (and I don't)
>>>>>>>> 
>>>>>>>> I have analysed that file and did some debug traces. If loading that on
>>>>>>>> W2008 is a no-no, you'd have to build from source and I'll tell you the
>>>>>>>> changes.
>>>>>>>> 
>>>>>>>> http://home.snafu.de/tilman/tmp/pdfbox-app-2.0.0-TILMAN.jar
>>>>>>>> 
>>>>>>>> Don't use that version for production. It contains lots of stuff for my
>>>>>>>> own tests. Only use it for this problem. Here's the output that you
>>>>>>>> should get:
>>>>>>>> 
>>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.COSParser parseXrefStream
>>>>>>>> INFORMATION: parseXrefStream: objByteOffset = 116
>>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>>>>> INFORMATION: PDFXrefStreamParser: 7 0 obj at offset: 16
>>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>>>>> INFORMATION: PDFXrefStreamParser: 8 0 obj at offset: 573
>>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>>>>> INFORMATION: PDFXrefStreamParser: 9 0 obj at offset: 633
>>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>>>>> INFORMATION: PDFXrefStreamParser: 10 0 obj at offset: 817
>>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>>>>> INFORMATION: PDFXrefStreamParser: 11 0 obj at offset: 914
>>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>>>>> INFORMATION: PDFXrefStreamParser: 12 0 obj at offset: 116
>>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>>>>> INFORMATION: PDFXrefStreamParser: 13 0 obj at offset: 436
>>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.COSParser parseXrefStream
>>>>>>>> INFORMATION: parseXrefStream: objByteOffset = 363505
>>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>>>>> INFORMATION: PDFXrefStreamParser: 1 0 obj at offset: 359638
>>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>>>>> INFORMATION: PDFXrefStreamParser: 2 0 obj at offset: 363167
>>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>>>>> INFORMATION: PDFXrefStreamParser: 3 0 obj at offset: 363307
>>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>>>>> INFORMATION: PDFXrefStreamParser: 4 0 obj at offset: 363505
>>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>>>>> INFORMATION: PDFXrefStreamParser: 5 stmnr: 2
>>>>>>>> Feb 10, 2015 9:27:18 PM org.apache.pdfbox.pdfparser.PDFXrefStreamParser parse
>>>>>>>> INFORMATION: PDFXrefStreamParser: 6 stmnr: 3
>>>>>>>> 
>>>>>>>> What I wonder is if the offsets will be the same.
>>>>>>>> 
>>>>>>>> Tilman
>>>>>>>> 
>>>>>>>> PS: Sorry I usually can't help during EU business hours. Day job :-)
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 09.02.2015 at 11:26, Kevin Morin wrote:
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> I will probably have to migrate to java 8 because of a bug in java 7
>>>>>>>>> which throws an error when rendering a certain type of PDF (cf thread
>>>>>>>>> Error on PDFRenderer.renderImage (PDFBox 2.0)). Could someone please
>>>>>>>>> check why it is not working on Windows Server 2008 R2 Standard? If you
>>>>>>>>> do not have this OS, tell me what I can do to help you.
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> BR
>>>>>>>>> 
>>>>>>>>> Kevin
>>>>>>>>> 
>>>>>>>>>> On 21/01/2015 12:26, Andreas Lehmkühler wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>>> On 21 January 2015 at 12:14, Kevin Morin <[email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> I thought I was running java 7 but it's java 8... I tried with
>>>>>>>>>>> java 7 and it works. I do not need it to work with java 8, java 7
>>>>>>>>>>> is ok for me.
>>>>>>>>>> It works for me using java 8 on win7 and linux as well. I guess the
>>>>>>>>>> issue has to be something else...
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> BR
>>>>>>>>>> Andreas Lehmkühler
>>>>>>>>>> 
>>>>>>>>>>> Thanks for your help and for all your work.
>>>>>>>>>>> 
>>>>>>>>>>> Kevin
>>>>>>>>>>> 
>>>>>>>>>>>> On 21/01/2015 11:54, Maruan Sahyoun wrote:
>>>>>>>>>>>> Hi Kevin
>>>>>>>>>>>> 
>>>>>>>>>>>> works for me - what's your Java Version?
>>>>>>>>>>>> 
>>>>>>>>>>>> BR
>>>>>>>>>>>> Maruan
>>>>>>>>>>>> 
>>>>>>>>>>>>> On 21.01.2015 at 11:24, Kevin Morin <[email protected]> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> it does not work with PDFToImage either; I still get a blank
>>>>>>>>>>>>> image. Also, I did not set the nonSeq option, yet it seems to be
>>>>>>>>>>>>> using the non-sequential parser. I get the following traces:
>>>>>>>>>>>>> janv. 21, 2015 11:20:02 AM org.apache.pdfbox.pdfparser.NonSequentialPDFParser checkXrefOffsets
>>>>>>>>>>>>> GRAVE: Can't find the object 7 0 (origin offset 359138)
>>>>>>>>>>>>> janv. 21, 2015 11:20:03 AM org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
>>>>>>>>>>>>> GRAVE: Missing XObject: Im1
>>>>>>>>>>>>> 
>>>>>>>>>>>>> BR
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Kevin
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 21/01/2015 11:11, Maruan Sahyoun wrote:
>>>>>>>>>>>>>> Hi Kevin,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> you can test with the PDFToImage command [1], available in the
>>>>>>>>>>>>>> pdfbox-app [2], to see if the issue happens there. The source for
>>>>>>>>>>>>>> PDFToImage is available in the tools section of the SVN repo or
>>>>>>>>>>>>>> viewable online [3].
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> BR
>>>>>>>>>>>>>> Maruan
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> [1] https://pdfbox.apache.org/1.8/commandline.html#pdfToImage
>>>>>>>>>>>>>> [2] https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.0-SNAPSHOT/
>>>>>>>>>>>>>> [3] http://svn.apache.org/viewvc/pdfbox/trunk/tools/src/main/java/org/apache/pdfbox/tools/PDFToImage.java?view=markup
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 21.01.2015 at 11:00, Kevin Morin <[email protected]> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi Andreas,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I am using the latest snapshot available on the maven
>>>>>>>>>>>>>>> repository. I am running my app on Windows Server 2008 R2
>>>>>>>>>>>>>>> Standard and it does not work (white page). Could you send me
>>>>>>>>>>>>>>> the code or a jar to test on this server, to check whether the
>>>>>>>>>>>>>>> problem comes from my code?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> BR
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Kevin
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On 19/01/2015 19:13, Andreas Lehmkuehler wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On 19.01.2015 at 12:45, Kevin Morin wrote:
>>>>>>>>>>>>>>>>> Actually, the issue is not only these traces. The real issue
>>>>>>>>>>>>>>>>> is that I get a blank image when I try to render the document.
>>>>>>>>>>>>>>>> I've checked your PDF and everything renders fine. I've tried
>>>>>>>>>>>>>>>> SNAPSHOT-891 on linux (running java 1.8, 1.7 and 1.6) and the
>>>>>>>>>>>>>>>> latest SNAPSHOT-947 on win7 running java 1.7
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Maybe your SNAPSHOT is outdated?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> BR
>>>>>>>>>>>>>>>> Andreas Lehmkühler
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On 19/01/2015 12:39, Kevin Morin wrote:
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I am using the 2.0 snapshot version to render PDFs to images, but
>>>>>>>>>>>>>>>>>> on some documents I get the following errors when I call
>>>>>>>>>>>>>>>>>> PDDocument.load(file):
>>>>>>>>>>>>>>>>>> 2015/01/19 12:32:48 ERROR (org.apache.pdfbox.pdfparser.NonSequentialPDFParser:1864) - Can't find the object 7 0 (origin offset 359138)
>>>>>>>>>>>>>>>>>> 2015/01/19 12:32:48 ERROR (org.apache.pdfbox.contentstream.PDFStreamEngine:840) - Missing XObject: Im1
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I first had it a few days ago (I did not report it, shame on me)
>>>>>>>>>>>>>>>>>> but the error did not occur when I called the loadLegacy method
>>>>>>>>>>>>>>>>>> on PDDocument. But the loadLegacy method is not available
>>>>>>>>>>>>>>>>>> anymore...
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> The issue happens on Windows (works fine on Debian).
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks for your help
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Kevin
>>>>>>>>> 
>>>>>>>>> --------------------------------------------------------------------- 
>>>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>>>> For additional commands, e-mail: [email protected]
