If the PDF contains incremental updates, then there will be multiple %%EOF 
markers; that’s fine.

BR

Maruan
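To illustrate the point above: each incremental update appends its own trailer ending in %%EOF, so counting the markers gives a rough idea of how many revisions a file has. This is a minimal plain-Java sketch, not PDFBox code. The class `EofMarkerCounter` is hypothetical, and the sketch assumes the byte sequence `%%EOF` does not also occur inside stream data (it can), so treat the count as a heuristic only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper, not a PDFBox API: counts "%%EOF" markers in raw PDF bytes. */
final class EofMarkerCounter {
    private static final byte[] MARKER = "%%EOF".getBytes(StandardCharsets.ISO_8859_1);

    /** Returns how often "%%EOF" occurs; more than one usually means incremental updates. */
    static int countEofMarkers(byte[] pdf) {
        int count = 0;
        for (int i = 0; i <= pdf.length - MARKER.length; i++) {
            if (matchesAt(pdf, i)) {
                count++;
                i += MARKER.length - 1; // continue scanning right after this marker
            }
        }
        return count;
    }

    private static boolean matchesAt(byte[] data, int pos) {
        for (int j = 0; j < MARKER.length; j++) {
            if (data[pos + j] != MARKER[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java EofMarkerCounter some.pdf
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("%%EOF markers: " + countEofMarkers(pdf));
    }
}
```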

On 20.10.2014 at 13:50, Vomlel Jan <[email protected]> wrote:

> Hi Maruan,
> 
> I created a patch for bug PDFBOX-2436.
> 
> After %%EOF, it skips any data up to the next object.
> 
> I don’t know whether such data are allowed by the specification, but some Czech 
> portals create them and Acrobat has no problem with them.
> 
> I changed org.apache.pdfbox.pdfparser.PDFParser near line 584 on the 1.8 branch. 
> Could you commit it and fix this bug?
> 
> 
>                            pdfSource.unread(eof.getBytes("ISO-8859-1"));
>                        }
>                    }
>                }
>                isEndOfFile = true;
> 
>                //PDFBOX-2436 - some files contain binary data after %%EOF.
>                skipToNextObj();
>            }
>        }
>        //we are going to parse a normal object
>        else
> 
> Thank you, Jan
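The idea behind the patch (skip whatever follows %%EOF up to the next object) can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the PDFParser code: the class `NextObjectFinder` is hypothetical, and it simply searches for the next `<number> <generation> obj` header at or after a given offset, using the bytes from the sample file further down in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of "skip to the next object" after %%EOF.
 * Not PDFBox's skipToNextObj(); it only shows the idea on a byte array.
 */
final class NextObjectFinder {
    // e.g. "16 0 obj": object number, generation number, keyword "obj"
    private static final Pattern OBJ_HEADER = Pattern.compile("(\\d+)\\s+(\\d+)\\s+obj\\b");

    /** Returns the offset of the next object header at or after {@code from}, or -1 if none. */
    static int nextObjectOffset(byte[] pdf, int from) {
        // ISO-8859-1 maps each byte to exactly one char, so string indices equal byte offsets
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        if (from < 0 || from > text.length()) {
            return -1;
        }
        Matcher m = OBJ_HEADER.matcher(text);
        return m.find(from) ? m.start() : -1;
    }

    public static void main(String[] args) {
        byte[] tail = "startxref\n22449\n%%EOF\n@\n16 0 obj\n<<\n/Type /Catalog\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        String text = new String(tail, StandardCharsets.ISO_8859_1);
        int afterEof = text.indexOf("%%EOF") + "%%EOF".length();
        // The junk byte '@' is skipped; prints 24, the offset of "16 0 obj"
        System.out.println(nextObjectOffset(tail, afterEof));
    }
}
```

A real parser would also have to cope with `obj` headers that appear inside stream data; the regex approach here ignores that.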
> 
> -----Original Message-----
> From: Vomlel Jan
> Sent: Friday, October 17, 2014 9:12 AM
> To: [email protected]; [email protected]
> Subject: RE: problem with pdf eof
> 
> I reported the parsing error in the load function:
> https://issues.apache.org/jira/browse/PDFBOX-2436
> Jan
> 
> -----Original Message-----
> From: Maruan Sahyoun [mailto:[email protected]]
> Sent: Thursday, October 16, 2014 8:23 PM
> To: [email protected]; [email protected]
> Subject: Re: problem with pdf eof
> 
> Sorry if that was unclear: as of now, if you’d like to sign, you have to 
> use load(); loadNonSeq() is not an option!
> 
> For all other cases use loadNonSeq() and if that fails load() as a fallback.
> 
> We are working on getting the missing signing support into nonSeq(), but that 
> will probably come after 2.0.
> 
> If you have parsing issues with load(), please open an issue in Jira and 
> attach the PDFs together with code to reproduce them. The same applies if 
> you have parsing issues with loadNonSeq().
> 
> Of course, if someone is willing to help get that in, patches are welcome.
> 
> Maruan
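The advice above (sequential load() when signing; otherwise loadNonSeq() with load() as a fallback) can be sketched like this. The `Loader` interface and the `FallbackLoading` class are hypothetical stand-ins so that the sketch compiles without PDFBox; with PDFBox 1.8 you would plug in `PDDocument.loadNonSeq(new ByteArrayInputStream(bytes), null)` and `PDDocument.load(new ByteArrayInputStream(bytes))`. The input is buffered as a byte array because an InputStream can only be consumed once, which would otherwise break the retry.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/**
 * Sketch of the loading strategy discussed in this thread. The Loader interface is
 * hypothetical; in real code it would wrap PDDocument.loadNonSeq(...) / PDDocument.load(...).
 */
final class FallbackLoading {
    /** Parses a fully buffered PDF, so the same bytes can be parsed a second time. */
    interface Loader<T> {
        T load(byte[] pdf) throws IOException;
    }

    static <T> T load(byte[] pdf, boolean forSigning, Loader<T> nonSeq, Loader<T> sequential)
            throws IOException {
        if (forSigning) {
            // Per the thread (PDFBox 1.8): signing requires the sequential parser.
            return sequential.load(pdf);
        }
        try {
            return nonSeq.load(pdf); // Xref-based parser first
        } catch (IOException e) {
            return sequential.load(pdf); // sequential parser as the fallback
        }
    }

    public static void main(String[] args) throws IOException {
        Loader<String> failingNonSeq = bytes -> { throw new IOException("broken xref"); };
        Loader<String> sequential = bytes -> "sequential:" + bytes.length;
        byte[] pdf = "%PDF-1.4".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(load(pdf, false, failingNonSeq, sequential)); // prints "sequential:8"
    }
}
```

Note that the fallback only triggers on an exception; as this thread shows, load() can also return without an exception but with an incomplete document model, which this sketch cannot detect.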
> 
> On 16.10.2014 at 20:13, Brzrk One <[email protected]> wrote:
> 
>> I’m hearing two pieces of advice here:
>> - don't use NonSeq for signatures
>> - but use NonSeq for multiple EOFs
>> Files with both multiple EOFs and signatures will have problems,
>> unless you mean we should parse twice?
>> 
>> On Thu, Oct 16, 2014 at 12:12 PM, Maruan Sahyoun <[email protected]> wrote:
>> 
>>> It depends on the parser being used. NonSeq follows the Xref
>>> information and handles multiple EOFs (incremental updates) when parsing.
>>> 
>>> BR
>>> Maruan
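Following the Xref information starts at the end of the file: the offset after the last `startxref` keyword points to the newest cross-reference section, and older sections are reached from there through the trailer's /Prev entry. The plain-Java sketch below does only that first step. `StartXrefReader` is a hypothetical helper, not part of PDFBox, and it ignores edge cases such as `startxref` appearing inside stream data or numbers too large for a `long`.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: returns the offset written after the last "startxref" keyword. */
final class StartXrefReader {
    private static final Pattern STARTXREF = Pattern.compile("startxref\\s+(\\d+)");

    /** Returns the newest cross-reference offset, or -1 if no "startxref" is found. */
    static long lastStartXref(byte[] pdf) {
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        Matcher m = STARTXREF.matcher(text);
        long offset = -1;
        while (m.find()) {
            // Keep the last match: with incremental updates it belongs to the newest revision.
            offset = Long.parseLong(m.group(1));
        }
        return offset;
    }

    public static void main(String[] args) {
        byte[] pdf = "...startxref\n116\n%%EOF\n...startxref\n22449\n%%EOF\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(lastStartXref(pdf)); // prints 22449
    }
}
```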
>>> 
>>> On 16.10.2014 at 17:01, Brzrk One <[email protected]> wrote:
>>> 
>>> I've noticed that when there are multiple EOFs in the file, PDFBox
>>> parsing is less reliable.
>>> 
>>> 
>>> On Thu, Oct 16, 2014 at 9:44 AM, Vomlel Jan <[email protected]> wrote:
>>> 
>>> When I use load instead of loadNonSeq, the signatures are valid in this case.
>>> 
>>> But for some documents, the load function does not read the complete document.
>>> That is why I used loadNonSeq. Some signatures are then missing.
>>> 
>>> See:
>>> http://leteckaposta.cz/831516385
>>> h1.pdf - original file (signature and timestamp)
>>> h2.pdf - first signature added by PDFBox (timestamp is missing)
>>> h3.pdf - second signature added by PDFBox (timestamp and previous signature are missing)
>>> 
>>> Jan
>>> 
>>> -----Original Message-----
>>> From: Maruan Sahyoun [mailto:[email protected]]
>>> Sent: Thursday, October 16, 2014 2:37 PM
>>> To: [email protected]
>>> Subject: Re: problem with pdf eof
>>> 
>>> When signing, please make sure that you load the PDF using
>>> PDDocument.load instead of PDDocument.loadNonSeq.
>>> 
>>> 
>>> On 16.10.2014 at 11:57, Vomlel Jan <[email protected]> wrote:
>>> 
>>> 
>>> 
>>> -----Original Message-----
>>> From: Maruan Sahyoun [mailto:[email protected]]
>>> Sent: Thursday, October 16, 2014 11:55 AM
>>> To: [email protected]
>>> Subject: Re: problem with pdf eof
>>> 
>>> When you say invalid, do you mean it’s corrupted, or e.g. that you get a
>>> warning sign in Adobe Reader? Would you have a sample PDF?
>>> 
>>> 
>>> When you sign a document and then sign it again, the first signature points
>>> to a different document revision, as you have changed the document’s
>>> content afterwards. So "invalid" in that context could mean that the
>>> warning you might be getting only reflects that fact. I would need
>>> to see the document to understand what’s going on.
>>> 
>>> 
>>> BR
>>> 
>>> Maruan
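The "different document revision" effect can be made concrete with a signature's /ByteRange entry, an array of two (offset, length) pairs that covers the whole file except the gap holding the signature value. A minimal sketch, assuming you have already read the four /ByteRange numbers from the signature dictionary; the class `ByteRangeCheck` and the sample values are hypothetical. A first signature that no longer reaches the end of the file covers only an earlier revision, which may be all that a viewer's warning is reflecting.

```java
/**
 * Hypothetical check, not a PDFBox API. /ByteRange is [off1 len1 off2 len2]:
 * the signed bytes are the whole file except the gap that holds the signature value.
 */
final class ByteRangeCheck {
    /** True if the signed ranges start at byte 0 and reach the current end of the file. */
    static boolean coversWholeFile(long[] byteRange, long fileLength) {
        if (byteRange.length != 4) {
            throw new IllegalArgumentException("expected [off1 len1 off2 len2]");
        }
        boolean startsAtZero = byteRange[0] == 0;
        boolean reachesEnd = byteRange[2] + byteRange[3] == fileLength;
        return startsAtZero && reachesEnd;
    }

    public static void main(String[] args) {
        long[] firstSignature = {0, 840, 960, 20000}; // hypothetical values
        System.out.println(coversWholeFile(firstSignature, 20960)); // true: signature covers the latest revision
        System.out.println(coversWholeFile(firstSignature, 45000)); // false: an update was appended afterwards
    }
}
```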
>>> 
>>> On 16.10.2014 at 11:48, Vomlel Jan <[email protected]> wrote:
>>> 
>>> Hi Maruan and others,
>>> 
>>> I created a signature and it seems OK.
>>> But when I create a second signature (loadNonSeq, addSignature,
>>> saveIncremental again), the first signature becomes invalid.
>>> 
>>> I think the problem may be that the first page is updated
>>> (the signature is invisible), but I don't understand it well enough.
>>> 
>>> 
>>> Jan
>>> 
>>> -----Original Message-----
>>> From: Maruan Sahyoun [mailto:[email protected]]
>>> Sent: Monday, October 13, 2014 4:09 PM
>>> To: [email protected]
>>> Subject: Re: problem with pdf eof
>>> 
>>> Hi Jan,
>>> 
>>> There are samples in the examples package for various ways to sign a
>>> document [1]. Signing a document requires incremental saving.
>>> 
>>> 
>>> OTOH, the choice of the right solution should not be based on whether
>>> there is a license fee or not.
>>> 
>>> 
>>> Maruan Sahyoun
>>> 
>>> [1] http://svn.apache.org/viewvc/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/signature/
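Why incremental saving matters for signing: an incremental save only appends data, so every byte covered by an existing signature stays where it was, which addresses the concern raised below about save changing the byte representation. The plain-Java check here is a hypothetical helper, not a PDFBox API; it compares a file before and after saving and reports whether the original bytes survived unchanged as a prefix.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical check, not a PDFBox API: did a save only append to the original bytes? */
final class IncrementalSaveCheck {
    static boolean isAppendOnly(byte[] before, byte[] after) {
        if (after.length < before.length) {
            return false; // something was removed or rewritten
        }
        // The first before.length bytes of "after" must equal "before" exactly.
        return Arrays.equals(before, Arrays.copyOf(after, before.length));
    }

    public static void main(String[] args) {
        byte[] original = "%PDF-1.4\n...%%EOF\n".getBytes(StandardCharsets.ISO_8859_1);
        byte[] incremental = "%PDF-1.4\n...%%EOF\n...update...%%EOF\n".getBytes(StandardCharsets.ISO_8859_1);
        byte[] rewritten = "%PDF-1.5\n...%%EOF\n...update...%%EOF\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isAppendOnly(original, incremental)); // true
        System.out.println(isAppendOnly(original, rewritten));   // false
    }
}
```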
>>> 
>>> 
>>> 
>>> On 13.10.2014 at 16:02, Vomlel Jan <[email protected]> wrote:
>>> 
>>> Hi Maruan (and others),
>>> 
>>> I would like to use PDFBox and Bouncy Castle for managing PDF
>>> signatures: parsing, validation, timestamping (PAdES LTV).
>>> 
>>> We used iText for it, but it is under a commercial licence.
>>> Parsing signatures seems to be working (thanks to your advice), so next I
>>> will try to create a timestamp.
>>> 
>>> Is that possible with PDFBox? I found the save method on PDDocument, but
>>> I’m afraid that it can change the byte representation of the PDF, so the
>>> signatures become invalid. Is that true? What is the right way to create a
>>> signature or timestamp with PDFBox?
>>> 
>>> 
>>> Jan
>>> 
>>> 
>>> -----Original Message-----
>>> From: Maruan Sahyoun [mailto:[email protected]]
>>> Sent: Friday, October 10, 2014 10:44 AM
>>> To: [email protected]
>>> Subject: Re: problem with pdf eof
>>> 
>>> Hi Jan,
>>> 
>>> Choosing the right technology is very important, so I do understand
>>> your concerns. I had to make such a decision about using PDFBox in the
>>> past too.
>>> 
>>> If you have specific issues that I can answer, I’m happy to try to do so.
>>> As a general statement, PDFBox is used in production environments today
>>> (as an example, we ourselves are using it for a banking customer to
>>> process account statements, for an airline to preprocess documents for
>>> archiving, and for various other customers).
>>> 
>>> 
>>> PDFBox is continuously improving its parsing as we try to deal with
>>> real-world PDF files, which are not always in line with the PDF
>>> specification. Currently the best approach is to use
>>> PDDocument.loadNonSeq (which parses documents according to the Xref
>>> information) and, in case of an exception, PDDocument.load (which
>>> parses sequentially). The Apache Tika project, which uses PDFBox for
>>> parsing PDFs, runs parsing and text extraction against 50k
>>> PDFs made available via http://digitalcorpora.org
>>> 
>>> 
>>> What would you like to use PDFBox for: text extraction, image
>>> conversion, ...? I might be able to give you more specific information
>>> for your use case.
>>> 
>>> 
>>> BR
>>> 
>>> Maruan
>>> 
>>> On 10.10.2014 at 10:10, Vomlel Jan <[email protected]> wrote:
>>> 
>>> Thank you, Maruan, this function loads the document.
>>> 
>>> I have read "Replace/Enhance PDF parsing" at https://pdfbox.apache.org/ideas.html.
>>> I think correct parsing is very important, and I have some doubts about
>>> whether I can use PDFBox in production. Can you say something to
>>> reassure me :-).
>>> 
>>> 
>>> Jan
>>> 
>>> -----Original Message-----
>>> From: Maruan Sahyoun [mailto:[email protected]]
>>> Sent: Friday, October 10, 2014 9:25 AM
>>> To: [email protected]
>>> Subject: Re: problem with pdf eof
>>> 
>>> Hi
>>> 
>>> You can try PDDocument.loadNonSeq(InputStream is, null).
>>> 
>>> BR
>>> 
>>> Maruan
>>> 
>>> On 10.10.2014 at 09:09, Vomlel Jan <[email protected]> wrote:
>>> 
>>> Hello,
>>> I use the PDFBox 1.8.7 PDDocument.load(InputStream is) method to parse
>>> the attached PDF document.
>>> 
>>> The method returns without an exception, but the document model is incomplete.
>>> 
>>> The problem is the characters after EOF (offset 22939):
>>> startxref
>>> 22449
>>> %%EOF
>>> @
>>> 16 0 obj
>>> <<
>>> /Type /Catalog
>>> 
>>> PDFBox creates an internal IOException and ignores it with this comment:
>>>              /*
>>>               * PDF files may have random data after the EOF marker. Ignore errors if
>>>               * last object processed is EOF.
>>>               */
>>> 
>>> Is this PDF construction valid?
>>> Which parser in PDFBox is the correct one? I tried ConformingPDParser, but
>>> another error occurred.
>>> 
>>> 
>>> Jan
>>> 
>>> 
>>> 
>>> 
>>> Neither this e-mail nor any of its attachments constitutes acceptance of
>>> an offer to conclude a contract, unless this is expressly stated in them.
>>> Otherwise, they cannot be regarded as conduct giving rise to any claims
>>> against AiP Safe. This e-mail is intended only for the named recipient
>>> and for other persons explicitly named as recipients, and its content,
>>> including the content of all attachments, is confidential. If you are
>>> not the intended recipient, please refrain from any form of publication,
>>> reproduction, copying, distribution or dissemination of its content,
>>> including the content of all attachments. If you have received this
>>> e-mail in error, please notify the sender immediately and delete the
>>> e-mail, including all attachments. All e-mails addressed to, received by
>>> or sent by AiP Safe s.r.o. or employees of AiP Safe s.r.o. are considered
>>> to be work e-mails in principle. Accordingly, the sender or recipient of
>>> these e-mails agrees that they may be read by employees of AiP Safe
>>> s.r.o. other than the given recipient or sender, in order to ensure the
>>> continuity of work activities and to allow them to be reviewed.
>>> 
