I hear dual advice here...
- don't use NonSeq for signatures
- but use NonSeq for multiple EOFs

Files with both multiple EOFs and signatures will have problems... unless you mean we should parse 2x?

On Thu, Oct 16, 2014 at 12:12 PM, Maruan Sahyoun <[email protected]> wrote:

> depends on the parser being used. NonSeq does follow the Xref information
> and handles multiple EOFs (incremental updates) when parsing.
>
> BR
> Maruan
>
> On 16.10.2014 at 17:01, Brzrk One <[email protected]> wrote:
>
> I've noticed that when there are multiple EOFs in the file, PDFBox parsing
> is less reliable.
>
>
> On Thu, Oct 16, 2014 at 9:44 AM, Vomlel Jan <[email protected]> wrote:
>
> When I use load instead of loadNonSeq, the signatures are valid in this
> case.
>
> But for some documents the load function does not read the complete
> document. That is why I used loadNonSeq. Some signatures are then missing.
>
> See:
> http://leteckaposta.cz/831516385
> h1.pdf - original file (signature and timestamp)
> h2.pdf - first signature added by pdfbox (timestamp is missing)
> h3.pdf - second signature added by pdfbox (timestamp and previous
> signature are missing)
>
> Jan
>
> -----Original Message-----
> From: Maruan Sahyoun [mailto:[email protected]]
> Sent: Thursday, October 16, 2014 2:37 PM
> To: [email protected]
> Subject: Re: problem with pdf eof
>
> when signing please make sure that you load the pdf using PDDocument.load
> instead of PDDocument.loadNonSeq.
>
>
> On 16.10.2014 at 11:57, Vomlel Jan <[email protected]> wrote:
>
>
> -----Original Message-----
> From: Maruan Sahyoun [mailto:[email protected]]
> Sent: Thursday, October 16, 2014 11:55 AM
> To: [email protected]
> Subject: Re: problem with pdf eof
>
> when you say invalid do you mean it's corrupted or e.g. you get a warning
> sign in Adobe Reader? Would you have a sample PDF?
>
> When you sign a document and sign it again the first signature points to
> a different document revision as you have changed the document's content
> afterwards. So invalid in that context could mean that the warning you
> might be getting is only reflecting that fact. Would need to see the
> document to understand what's going on.
>
> BR
> Maruan
>
> On 16.10.2014 at 11:48, Vomlel Jan <[email protected]> wrote:
>
> Hi Maruan and others,
>
> I created a signature and it seems OK.
> But when I create a second signature (loadNonSeq, addSignature,
> saveIncremental again), the first signature becomes invalid.
>
> I think the problem may be that the first page is updated (the signature
> is invisible), but I don't understand it well enough.
>
> Jan
>
> -----Original Message-----
> From: Maruan Sahyoun [mailto:[email protected]]
> Sent: Monday, October 13, 2014 4:09 PM
> To: [email protected]
> Subject: Re: problem with pdf eof
>
> Hi Jan,
>
> there are samples in the examples package for various ways to sign a
> document [1]. Signing a document needs incremental saving.
>
> OTOH, the choice of the right solution should not be based on whether
> there is a license fee or not.
>
> Maruan Sahyoun
>
> [1]
> http://svn.apache.org/viewvc/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/signature/
>
>
> On 13.10.2014 at 16:02, Vomlel Jan <[email protected]> wrote:
>
> Hi Maruan (and others),
>
> I would like to use pdfbox and bouncycastle for managing pdf signatures:
> parsing, validation, timestamping (PAdES LTV).
>
> We used itext for it, but it is under a commercial licence.
> Parsing signatures seems to be working (thanks to your advice). So I will
> try to create a timestamp.
>
> Is it possible with pdfbox? I found the save method on PDDocument, but I'm
> afraid that it can change the byte representation of the pdf, so that the
> signatures become invalid. Is that true? What is the right way to create a
> signature or timestamp with pdfbox?
>
> Jan
>
>
> -----Original Message-----
> From: Maruan Sahyoun [mailto:[email protected]]
> Sent: Friday, October 10, 2014 10:44 AM
> To: [email protected]
> Subject: Re: problem with pdf eof
>
> Hi Jan,
>
> choosing the right technology is very important so I do understand your
> concerns. I had to make such a decision about using PDFBox in the past
> too.
>
> It can
> If you have specific issues I can answer I'm happy to try to do so. As a
> general statement PDFBox is used in production environments today (as an
> example we ourselves are using it for a banking customer to process account
> statements, an airline company to preprocess archiving documents and
> various other customers).
>
> PDFBox is continuously enhancing the parsing as we try to deal with real
> world PDF files which are not always in line with the PDF specification.
> Currently the best approach is to use PDDocument.loadNonSeq (which parses
> documents according to the Xref information) and in case of an exception
> PDDocument.load (which parses sequentially). The Apache Tika project, which
> uses PDFBox for parsing PDFs, is running the parsing and text extraction
> against 50k PDFs being made available via http://digitalcorpora.org
>
> What is the application you would like to be using PDFBox for? Text
> extraction, image conversion ... - I might be able to give you more
> specific information for your use case.
>
> BR
> Maruan
>
> On 10.10.2014 at 10:10, Vomlel Jan <[email protected]> wrote:
>
> Thank you Maruan, this function loads the document.
>
> I have read https://pdfbox.apache.org/ideas.html "Replace/Enhance PDF
> parsing". I think correct parsing is very important, and I have some
> doubts whether I can use pdfbox in production. Can you say something to
> reassure me :-).
>
> Jan
>
> -----Original Message-----
> From: Maruan Sahyoun [mailto:[email protected]]
> Sent: Friday, October 10, 2014 9:25 AM
> To: [email protected]
> Subject: Re: problem with pdf eof
>
> Hi
>
> you can try PDDocument.loadNonSeq(InputStream is, null)
>
> BR
> Maruan
>
> On 10.10.2014 at 09:09, Vomlel Jan <[email protected]> wrote:
>
> Hello,
> I use the PDFBox 1.8.7 PDDocument.load(InputStream is) method to parse the
> PDF document in the attachment.
>
> The method returns without an exception, but the document model is
> incomplete.
>
> The problem is in the characters after EOF (offset 22939):
> startxref
> 22449
> %%EOF
> @
> 16 0 obj
> <<
> /Type /Catalog
>
> PDFBox creates an internal IOException and ignores it with the comment:
> /*
>  * PDF files may have random data after the EOF marker. Ignore errors if
>  * last object processed is EOF.
>  */
>
> Is this PDF construction valid?
> Which parser in PDFBox is correct? I tried ConformingPDParser, but another
> error occurred.
>
> Jan
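
For anyone who wants to check a file before choosing between load and loadNonSeq, here is a minimal sketch that counts the %%EOF markers in the raw bytes and reports how many non-whitespace bytes follow the last one. It uses only the Java standard library; the class EofScan and its methods are hypothetical names made up for this example and are not part of PDFBox. Treat the count as a heuristic only, since the byte sequence %%EOF could in principle also occur inside stream data.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a PDFBox class): scans raw PDF bytes for %%EOF markers. */
class EofScan {
    private static final byte[] MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /** Returns the byte offsets of every %%EOF marker, in file order. */
    static List<Integer> eofOffsets(byte[] data) {
        List<Integer> offsets = new ArrayList<>();
        int i = 0;
        while (i <= data.length - MARKER.length) {
            int j = 0;
            while (j < MARKER.length && data[i + j] == MARKER[j]) {
                j++;
            }
            if (j == MARKER.length) {
                offsets.add(i);
                i += MARKER.length; // continue after this marker
            } else {
                i++;
            }
        }
        return offsets;
    }

    /** Counts non-whitespace bytes after the last marker; -1 if there is no marker. */
    static int trailingBytes(byte[] data) {
        List<Integer> offsets = eofOffsets(data);
        if (offsets.isEmpty()) {
            return -1;
        }
        int start = offsets.get(offsets.size() - 1) + MARKER.length;
        int count = 0;
        for (int i = start; i < data.length; i++) {
            byte b = data[i];
            if (b != ' ' && b != '\t' && b != '\r' && b != '\n') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Scan the file given as first argument, or a small in-memory sample.
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : "startxref\n22449\n%%EOF\n@\n16 0 obj\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println("%%EOF markers: " + eofOffsets(data).size());
        System.out.println("non-whitespace bytes after last %%EOF: " + trailingBytes(data));
    }
}
```

More than one marker usually points to incremental updates (for example, added signatures). A non-zero trailing count points to data after the final %%EOF, like the stray "@" in the sample from the first mail of this thread.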

