Sorry if that has been unclear. As of now, if you’d like to sign you have to use load(); loadNonSeq() is not an option!
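A minimal sketch of that rule, combined with the fallback for the non-signing cases. `LoaderChoice` and its `Loader` interface are hypothetical helpers, not part of PDFBox. In a real program the two loaders would wrap `PDDocument.load(...)` and `PDDocument.loadNonSeq(..., null)` from PDFBox 1.8.x; that is an assumption here and is not exercised by the code itself:

```java
import java.io.IOException;

/** Hypothetical helper (not part of PDFBox): chooses which parser to call. */
public final class LoaderChoice {

    /** Stand-in for a parse call such as PDDocument.load(file). */
    public interface Loader<T> {
        T load() throws IOException;
    }

    private LoaderChoice() {
    }

    /**
     * Signing: always use the sequential loader.
     * Everything else: try the non-sequential loader first and fall back
     * to the sequential one if it throws an IOException.
     */
    public static <T> T open(boolean forSigning,
                             Loader<T> sequential,
                             Loader<T> nonSequential) throws IOException {
        if (forSigning) {
            return sequential.load();
        }
        try {
            return nonSequential.load();
        } catch (IOException e) {
            // Assumption: a parse failure surfaces as an IOException.
            return sequential.load();
        }
    }
}
```

With PDFBox 1.8.x on the classpath the call might look like `LoaderChoice.open(signing, () -> PDDocument.load(file), () -> PDDocument.loadNonSeq(file, null))`. If both loaders fail, the exception from the sequential one is what the caller sees.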
For all other cases use loadNonSeq() and, if that fails, load() as a fallback. We are working on getting the missing signing support into the NonSeq parser, but that will probably be after 2.0.

If you have parsing issues with load(), please open an issue in Jira and attach the PDFs together with code to reproduce it. The same applies if you have parsing issues with loadNonSeq().

Of course, if someone is willing to help get that in … patches are welcome.

Maruan

On 16.10.2014 at 20:13, Brzrk One <[email protected]> wrote:

> I hear dual advice here...
> - don't use NonSeq for signatures
> - but use NonSeq for multiple EOFs
> Files with both multiple EOFs and signatures will have problems...
> unless you mean we should parse 2x?
>
> On Thu, Oct 16, 2014 at 12:12 PM, Maruan Sahyoun <[email protected]>
> wrote:
>
>> depends on the parser being used. NonSeq does follow the Xref information
>> and handles multiple EOFs (incremental updates) when parsing.
>>
>> BR
>> Maruan
>>
>> On 16.10.2014 at 17:01, Brzrk One <[email protected]> wrote:
>>
>> I've noticed that when there are multiple EOFs in the file, PDFBox parsing
>> is less reliable.
>>
>> On Thu, Oct 16, 2014 at 9:44 AM, Vomlel Jan <[email protected]> wrote:
>>
>> When I use load instead of loadNonSeq, the signatures are valid in this case.
>>
>> But for some documents the load function does not read the complete document.
>> That is why I used loadNonSeq. Some signatures are then missing.
>>
>> See:
>> http://leteckaposta.cz/831516385
>> h1.pdf - original file (signature and timestamp)
>> h2.pdf - add first signature by pdfbox (timestamp is missing)
>> h3.pdf - add second signature by pdfbox (timestamp and previous signature
>> are missing)
>>
>> Jan
>>
>> -----Original Message-----
>> From: Maruan Sahyoun [mailto:[email protected]]
>> Sent: Thursday, October 16, 2014 2:37 PM
>> To: [email protected]
>> Subject: Re: problem with pdf eof
>>
>> when signing please make sure that you load the pdf using PDDocument.load
>> instead of PDDocument.loadNonSeq.
>>
>> On 16.10.2014 at 11:57, Vomlel Jan <[email protected]> wrote:
>>
>> -----Original Message-----
>> From: Maruan Sahyoun [mailto:[email protected]]
>> Sent: Thursday, October 16, 2014 11:55 AM
>> To: [email protected]
>> Subject: Re: problem with pdf eof
>>
>> when you say invalid do you mean it’s corrupted or e.g. you get a
>> warning sign in Adobe Reader? Would you have a sample PDF?
>>
>> When you sign a document and sign it again, the first signature points to
>> a different document revision, as you have changed the document's content
>> afterwards. So invalid in that context could mean that the warning you
>> might be getting is only reflecting that fact. Would need to see the
>> document to understand what’s going on.
>>
>> BR
>>
>> Maruan
>>
>> On 16.10.2014 at 11:48, Vomlel Jan <[email protected]> wrote:
>>
>> Hi Maruan and others,
>>
>> I created a signature and it seems OK.
>> But when I create a second signature (loadNonSeq, addSignature,
>> saveIncremental again), the first signature becomes invalid.
>>
>> I think the problem may be that the first page is updated (the signature
>> is invisible), but I don't understand it well enough.
>>
>> Jan
>>
>> -----Original Message-----
>> From: Maruan Sahyoun [mailto:[email protected]]
>> Sent: Monday, October 13, 2014 4:09 PM
>> To: [email protected]
>> Subject: Re: problem with pdf eof
>>
>> Hi Jan,
>>
>> there are samples in the examples package for various ways to sign a
>> document [1]. Signing a document needs incremental saving.
>>
>> OTOH choosing the right solution should not be based on whether
>> there is a license fee or not.
>>
>> Maruan Sahyoun
>>
>> [1]
>> http://svn.apache.org/viewvc/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/signature/
>>
>> On 13.10.2014 at 16:02, Vomlel Jan <[email protected]> wrote:
>>
>> Hi Maruan (and others),
>>
>> I would like to use pdfbox and bouncycastle for managing pdf
>> signatures: parsing, validation, timestamping (PAdES LTV).
>>
>> We used itext for it, but it is under a commercial licence.
>> Parsing signatures seems to be working (thanks to your advice). So I
>> will try to create a timestamp.
>>
>> Is it possible with pdfbox? I found the save method on PDDocument, but
>> I'm afraid that it can change the byte representation of the pdf, so that
>> signatures become invalid. Is that true? What is the right way to create a
>> signature or timestamp with pdfbox?
>>
>> Jan
>>
>> -----Original Message-----
>> From: Maruan Sahyoun [mailto:[email protected]]
>> Sent: Friday, October 10, 2014 10:44 AM
>> To: [email protected]
>> Subject: Re: problem with pdf eof
>>
>> Hi Jan,
>>
>> choosing the right technology is very important so I do understand
>> your concerns. I had to make such a decision about using PDFBox in the past
>> too.
>>
>> It can
>> If you have specific issues I can answer I’m happy to try to do so.
>> As a general statement PDFBox is used in production environments today (as an
>> example we ourselves are using it for a banking customer to process account
>> statements, an airline company to preprocess archiving documents and
>> various other customers).
>>
>> PDFBox is continuously enhancing the parsing as we try to deal with
>> real world PDF files which are not always in line with the PDF
>> specification. Currently the best approach is to use PDDocument.loadNonSeq
>> (which parses documents according to the Xref information) and in case of
>> an exception PDDocument.load (which parses sequentially). The Apache Tika
>> project, which uses PDFBox for parsing PDFs, is running the parsing and
>> text extraction against 50k PDFs being made available via
>> http://digitalcorpora.org
>>
>> What is the application you would like to be using PDFBox for? Text
>> extraction, image conversion …. - I might be able to give you more specific
>> information for your use case.
>>
>> BR
>>
>> Maruan
>>
>> On 10.10.2014 at 10:10, Vomlel Jan <[email protected]> wrote:
>>
>> Thank you Maruan, this function loads the document.
>>
>> I have read https://pdfbox.apache.org/ideas.html "Replace/Enhance
>> PDF parsing". I think correct parsing is very important, and I have some
>> doubts whether I can use pdfbox in production. Can you say something to
>> reassure me :-).
>>
>> Jan
>>
>> -----Original Message-----
>> From: Maruan Sahyoun [mailto:[email protected]]
>> Sent: Friday, October 10, 2014 9:25 AM
>> To: [email protected]
>> Subject: Re: problem with pdf eof
>>
>> Hi
>>
>> you can try PDDocument.loadNonSeq(InputStream is, null)
>>
>> BR
>>
>> Maruan
>>
>> On 10.10.2014 at 09:09, Vomlel Jan <[email protected]> wrote:
>>
>> Hello,
>> I use the PDFBox 1.8.7 PDDocument.load(InputStream is) method to parse
>> the attached PDF document.
>>
>> The method returns without an exception, but the document model is incomplete.
>>
>> The problem is in the characters after EOF (offset 22939):
>> startxref
>> 22449
>> %%EOF
>> @
>> 16 0 obj
>> <<
>> /Type /Catalog
>>
>> PDFBox creates an internal IOException and ignores it with this comment:
>> /*
>>  * PDF files may have random data after the EOF marker. Ignore errors if
>>  * last object processed is EOF.
>>  */
>>
>> Is this PDF construction valid?
>> Which parser in PDFBox is correct? I tried ConformingPDParser, but
>> another error occurred.
>>
>> Jan
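For anyone who wants to check whether a file looks like the one in the original report (bytes after `%%EOF`, or several `%%EOF` markers from incremental updates), here is a rough diagnostic sketch. It uses only the JDK, and `EofScan` is a hypothetical name; this is not how PDFBox parses a file, just a plain byte scan one might run before deciding which loader to try:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic helper: locates %%EOF markers in raw PDF bytes. */
public final class EofScan {

    private static final byte[] MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    private EofScan() {
    }

    /** Returns the byte offset of every %%EOF marker, in file order. */
    public static List<Integer> eofOffsets(byte[] data) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i + MARKER.length <= data.length; i++) {
            if (matchesAt(data, i)) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    /** True if any non-whitespace byte follows the last %%EOF marker. */
    public static boolean hasDataAfterLastEof(byte[] data) {
        List<Integer> offsets = eofOffsets(data);
        if (offsets.isEmpty()) {
            return false;
        }
        int start = offsets.get(offsets.size() - 1) + MARKER.length;
        for (int i = start; i < data.length; i++) {
            byte b = data[i];
            // PDF whitespace: NUL, tab, line feed, form feed, carriage return, space.
            boolean white = b == 0 || b == '\t' || b == '\n'
                    || b == '\f' || b == '\r' || b == ' ';
            if (!white) {
                return true;
            }
        }
        return false;
    }

    private static boolean matchesAt(byte[] data, int pos) {
        for (int j = 0; j < MARKER.length; j++) {
            if (data[pos + j] != MARKER[j]) {
                return false;
            }
        }
        return true;
    }
}
```

More than one offset from `eofOffsets` suggests incremental updates, and `hasDataAfterLastEof` returning true suggests trailing data like the `@` and `16 0 obj` shown above. Note that the scan is naive: a `%%EOF` byte sequence inside a stream would be counted too.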

