depends on the parser being used. NonSeq does follow the Xref information and handles multiple EOFs (incremental updates) when parsing.
BR
Maruan

On 16.10.2014 at 17:01, Brzrk One <[email protected]> wrote:

> I've noticed that when there are multiple EOFs in the file, PDFBox parsing
> is less reliable.
>
> On Thu, Oct 16, 2014 at 9:44 AM, Vomlel Jan <[email protected]> wrote:
>
>> When I use load instead of loadNonSeq, the signatures are valid in this case.
>>
>> But for some documents the load function does not read the complete
>> document. That is why I used loadNonSeq. Some signatures are then missing.
>>
>> See:
>> http://leteckaposta.cz/831516385
>> h1.pdf - original file (signature and timestamp)
>> h2.pdf - add first signature by pdfbox (timestamp is missing)
>> h3.pdf - add second signature by pdfbox (timestamp and previous signature
>> are missing)
>>
>> Jan
>>
>> -----Original Message-----
>> From: Maruan Sahyoun [mailto:[email protected]]
>> Sent: Thursday, October 16, 2014 2:37 PM
>> To: [email protected]
>> Subject: Re: problem with pdf eof
>>
>> when signing please make sure that you load the pdf using PDDocument.load
>> instead of PDDocument.loadNonSeq.
>>
>>
>> On 16.10.2014 at 11:57, Vomlel Jan <[email protected]> wrote:
>>
>>>
>>>
>>> -----Original Message-----
>>> From: Maruan Sahyoun [mailto:[email protected]]
>>> Sent: Thursday, October 16, 2014 11:55 AM
>>> To: [email protected]
>>> Subject: Re: problem with pdf eof
>>>
>>> when you say invalid do you mean it's corrupted or e.g. you get a
>>> warning sign in Adobe Reader? Would you have a sample PDF?
>>>
>>> When you sign a document and sign it again the first signature points to
>>> a different document revision, as you have changed the document's content
>>> afterwards. So invalid in that context could mean that the warning you
>>> might be getting is only reflecting that fact. Would need to see the
>>> document to understand what's going on.
>>>
>>> BR
>>>
>>> Maruan
>>>
>>> On 16.10.2014 at 11:48, Vomlel Jan <[email protected]> wrote:
>>>
>>>> Hi Maruan and others,
>>>>
>>>> I created a signature and it seems OK.
>>>> But when I create a second signature (loadNonSeq, addSignature,
>>>> saveIncremental again), the first signature becomes invalid.
>>>> I think the problem may be that the first page is updated (the signature
>>>> is invisible), but I don't understand it well enough.
>>>>
>>>> Jan
>>>>
>>>> -----Original Message-----
>>>> From: Maruan Sahyoun [mailto:[email protected]]
>>>> Sent: Monday, October 13, 2014 4:09 PM
>>>> To: [email protected]
>>>> Subject: Re: problem with pdf eof
>>>>
>>>> Hi Jan,
>>>>
>>>> there are samples in the examples package for various ways to sign a
>>>> document [1]. Signing a document needs incremental saving.
>>>>
>>>> OTOH the choice of the right solution should not be based on whether
>>>> there is a license fee or not.
>>>>
>>>> Maruan Sahyoun
>>>>
>>>> [1] http://svn.apache.org/viewvc/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/signature/
>>>>
>>>>
>>>> On 13.10.2014 at 16:02, Vomlel Jan <[email protected]> wrote:
>>>>
>>>>> Hi Maruan (and others),
>>>>>
>>>>> I would like to use pdfbox and bouncycastle for managing pdf
>>>>> signatures: parsing, validation, timestamping (PAdES LTV).
>>>>> We used itext for it, but it is under a commercial licence.
>>>>> Parsing signatures seems to be working (thanks to your advice). So I
>>>>> will try to create a timestamp.
>>>>> Is it possible with pdfbox? I found the save method on PDDocument, but
>>>>> I'm afraid that it can change the byte representation of the pdf, so
>>>>> that signatures become invalid. Is that true? What is the right way to
>>>>> create a signature or timestamp with pdfbox?
>>>>>
>>>>> Jan
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Maruan Sahyoun [mailto:[email protected]]
>>>>> Sent: Friday, October 10, 2014 10:44 AM
>>>>> To: [email protected]
>>>>> Subject: Re: problem with pdf eof
>>>>>
>>>>> Hi Jan,
>>>>>
>>>>> choosing the right technology is very important so I do understand
>>>>> your concerns. I had to make such a decision about using PDFBox in the
>>>>> past too.
>>>>> It can
>>>>> If you have specific issues I can answer I'm happy to try to do so. As
>>>>> a general statement PDFBox is used in production environments today (as
>>>>> an example we ourselves are using it for a banking customer to process
>>>>> account statements, an airline company to preprocess archiving documents
>>>>> and various other customers).
>>>>>
>>>>> PDFBox is continuously enhancing the parsing as we try to deal with
>>>>> real world PDF files which are not always in line with the PDF
>>>>> specification. Currently the best approach is to use PDDocument.loadNonSeq
>>>>> (which parses documents according to the Xref information) and in case of
>>>>> an exception PDDocument.load (which parses sequentially). The Apache Tika
>>>>> project, which uses PDFBox for parsing PDFs, is running the parsing and
>>>>> text extraction against 50k PDFs being made available via
>>>>> http://digitalcorpora.org
>>>>>
>>>>> What is the application you would like to be using PDFBox for? Text
>>>>> extraction, image conversion ... I might be able to give you more
>>>>> specific information for your use case.
>>>>>
>>>>> BR
>>>>>
>>>>> Maruan
>>>>>
>>>>> On 10.10.2014 at 10:10, Vomlel Jan <[email protected]> wrote:
>>>>>
>>>>>> Thank you Maruan, this function loads the document.
>>>>>>
>>>>>> I have read https://pdfbox.apache.org/ideas.html "Replace/Enhance
>>>>>> PDF parsing". I think correct parsing is very important, and I have some
>>>>>> doubts whether I can use pdfbox in production. Can you say something to
>>>>>> reassure me :-).
>>>>>>
>>>>>> Jan
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Maruan Sahyoun [mailto:[email protected]]
>>>>>> Sent: Friday, October 10, 2014 9:25 AM
>>>>>> To: [email protected]
>>>>>> Subject: Re: problem with pdf eof
>>>>>>
>>>>>> Hi
>>>>>>
>>>>>> you can try PDDocument.loadNonSeq(InputStream is, null)
>>>>>>
>>>>>> BR
>>>>>>
>>>>>> Maruan
>>>>>>
>>>>>> On 10.10.2014 at 09:09, Vomlel Jan <[email protected]> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>> I use the PDFBox 1.8.7 PDDocument.load(InputStream is) method to parse
>>>>>>> the PDF document in the attachment.
>>>>>>> The method returns without an exception, but the document model is
>>>>>>> incomplete.
>>>>>>>
>>>>>>> The problem is in the characters after EOF (offset 22939):
>>>>>>> startxref
>>>>>>> 22449
>>>>>>> %%EOF
>>>>>>> @
>>>>>>> 16 0 obj
>>>>>>> <<
>>>>>>> /Type /Catalog
>>>>>>>
>>>>>>> PDFBox creates an internal IOException and ignores it with the comment:
>>>>>>> /*
>>>>>>>  * PDF files may have random data after the EOF marker. Ignore errors if
>>>>>>>  * last object processed is EOF.
>>>>>>>  */
>>>>>>>
>>>>>>> Is this PDF construction valid?
>>>>>>> Which parser in PDFBox is correct? I tried ConformingPDFParser, but
>>>>>>> another error occurred.
>>>>>>>
>>>>>>> Jan
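
The situation described in the first message of the thread (bytes following a %%EOF marker, and more than one %%EOF in a file that has received incremental updates) can be inspected without any PDF library. The following is a minimal sketch in plain Java, not part of PDFBox: the class name EofScan and the methods eofOffsets and trailingBytes are hypothetical names chosen for this illustration, and the sample bytes in main are made up, loosely modelled on the excerpt quoted above. It is only a byte-level heuristic, since the sequence %%EOF could in principle also occur inside stream data.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not a PDFBox class.
class EofScan {
    private static final byte[] MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /** Returns the byte offset of every %%EOF marker found in the data. */
    static List<Integer> eofOffsets(byte[] data) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i + MARKER.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < MARKER.length; j++) {
                if (data[i + j] != MARKER[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                offsets.add(i);
                i += MARKER.length - 1; // continue after this marker
            }
        }
        return offsets;
    }

    /**
     * Counts the non-whitespace bytes after the last %%EOF marker.
     * Returns -1 if the data contains no marker at all.
     */
    static int trailingBytes(byte[] data) {
        List<Integer> offsets = eofOffsets(data);
        if (offsets.isEmpty()) {
            return -1;
        }
        int start = offsets.get(offsets.size() - 1) + MARKER.length;
        int count = 0;
        for (int i = start; i < data.length; i++) {
            byte b = data[i];
            if (b != ' ' && b != '\r' && b != '\n' && b != '\t') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample: a first revision, a stray '@', then a second revision.
        String sample = "%PDF-1.4\nstartxref\n22449\n%%EOF\n@\n"
                + "16 0 obj\n<<\n/Type /Catalog\n>>\nendobj\nstartxref\n23000\n%%EOF\n";
        byte[] data = sample.getBytes(StandardCharsets.US_ASCII);
        System.out.println("%%EOF markers at offsets: " + eofOffsets(data));
        System.out.println("non-whitespace bytes after last %%EOF: " + trailingBytes(data));
    }
}
```

More than one reported offset suggests that the file contains several revisions, which is the case where, according to the replies above, PDDocument.load and PDDocument.loadNonSeq can give different results. To check a real file, the byte array could be read with java.nio.file.Files.readAllBytes instead of using the built-in sample.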

