I've noticed that when there are multiple EOFs in the file, PDFBox parsing is less reliable.
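
To check whether a given file has this shape before handing it to a parser, one option is to scan the raw bytes for `%%EOF` markers. Each incremental save (for example, each added signature) appends a new cross-reference section and a new `%%EOF`, so a signed PDF commonly contains more than one. Below is a minimal sketch, assuming the whole file fits in memory. `EofScan` and its methods are hypothetical names, not part of PDFBox, and a plain byte search can also match `%%EOF` inside stream data, so treat the counts as a hint only.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not a PDFBox class): scans raw PDF bytes for %%EOF markers. */
final class EofScan {
    private static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /** Returns the offset of the first marker at or after 'from', or -1 if there is none. */
    static int indexOfMarker(byte[] data, int from) {
        for (int i = Math.max(from, 0); i <= data.length - EOF_MARKER.length; i++) {
            boolean match = true;
            for (int j = 0; j < EOF_MARKER.length; j++) {
                if (data[i + j] != EOF_MARKER[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    /** Counts non-overlapping %%EOF markers in the data. */
    static int countEofMarkers(byte[] data) {
        int count = 0;
        int pos = indexOfMarker(data, 0);
        while (pos >= 0) {
            count++;
            pos = indexOfMarker(data, pos + EOF_MARKER.length);
        }
        return count;
    }

    /** PDF whitespace: NUL, tab, line feed, form feed, carriage return, space. */
    private static boolean isPdfWhitespace(byte b) {
        return b == 0 || b == '\t' || b == '\n' || b == '\f' || b == '\r' || b == ' ';
    }

    /** Number of non-whitespace bytes after the last marker, or -1 if no marker exists. */
    static int trailingBytesAfterLastEof(byte[] data) {
        int last = -1;
        int pos = indexOfMarker(data, 0);
        while (pos >= 0) {
            last = pos;
            pos = indexOfMarker(data, pos + EOF_MARKER.length);
        }
        if (last < 0) {
            return -1;
        }
        int trailing = 0;
        for (int i = last + EOF_MARKER.length; i < data.length; i++) {
            if (!isPdfWhitespace(data[i])) {
                trailing++;
            }
        }
        return trailing;
    }
}
```

The bytes could come from `java.nio.file.Files.readAllBytes(path)`. A count above 1 suggests incremental revisions, and a non-zero trailing count suggests data after the final marker.
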
On Thu, Oct 16, 2014 at 9:44 AM, Vomlel Jan <[email protected]> wrote:

> When I use load instead of loadNonSeq, the signatures are valid in this
> case.
>
> But for some documents the load function does not read the complete
> document. That is why I used loadNonSeq. Some signatures are then missing.
>
> See:
> http://leteckaposta.cz/831516385
> h1.pdf - original file (signature and timestamp)
> h2.pdf - first signature added by PDFBox (timestamp is missing)
> h3.pdf - second signature added by PDFBox (timestamp and previous
> signature are missing)
>
> Jan
>
> -----Original Message-----
> From: Maruan Sahyoun [mailto:[email protected]]
> Sent: Thursday, October 16, 2014 2:37 PM
> To: [email protected]
> Subject: Re: problem with pdf eof
>
> when signing please make sure that you load the pdf using PDDocument.load
> instead of PDDocument.loadNonSeq.
>
> On 16.10.2014 at 11:57, Vomlel Jan <[email protected]> wrote:
>
> > -----Original Message-----
> > From: Maruan Sahyoun [mailto:[email protected]]
> > Sent: Thursday, October 16, 2014 11:55 AM
> > To: [email protected]
> > Subject: Re: problem with pdf eof
> >
> > when you say invalid do you mean it's corrupted or e.g. you get a
> > warning sign in Adobe Reader? Would you have a sample PDF?
> >
> > When you sign a document and sign it again the first signature points
> > to a different document revision, as you have changed the document's
> > content afterwards. So invalid in that context could mean that the
> > warning you might be getting is only reflecting that fact. Would need
> > to see the document to understand what's going on.
> >
> > BR
> >
> > Maruan
> >
> > On 16.10.2014 at 11:48, Vomlel Jan <[email protected]> wrote:
> >
> >> Hi Maruan and others,
> >>
> >> I created a signature and it seems OK.
> >> But when I create a second signature (loadNonSeq, addSignature,
> >> saveIncremental again), the first signature becomes invalid.
> >> I think the problem may be that the first page is updated (the
> >> signature is invisible), but I don't understand it well enough.
> >>
> >> Jan
> >>
> >> -----Original Message-----
> >> From: Maruan Sahyoun [mailto:[email protected]]
> >> Sent: Monday, October 13, 2014 4:09 PM
> >> To: [email protected]
> >> Subject: Re: problem with pdf eof
> >>
> >> Hi Jan,
> >>
> >> there are samples in the examples package for various ways to sign a
> >> document [1]. Signing a document needs incremental saving.
> >>
> >> OTOH choosing the right solution should not be made on the basis of
> >> whether there is a license fee or not.
> >>
> >> Maruan Sahyoun
> >>
> >> [1] http://svn.apache.org/viewvc/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/signature/
> >>
> >> On 13.10.2014 at 16:02, Vomlel Jan <[email protected]> wrote:
> >>
> >>> Hi Maruan (and others),
> >>>
> >>> I would like to use PDFBox and Bouncy Castle for managing PDF
> >>> signatures: parsing, validation, timestamping (PAdES LTV).
> >>> We used iText for it, but it is under a commercial licence.
> >>> Parsing signatures seems to be working (thanks to your advice). So I
> >>> will try to create a timestamp.
> >>> Is it possible with PDFBox? I found the save method on PDDocument, but
> >>> I'm afraid that it can change the byte representation of the PDF, so
> >>> that the signatures become invalid. Is that true? What is the right
> >>> way to create a signature or timestamp with PDFBox?
> >>>
> >>> Jan
> >>>
> >>> -----Original Message-----
> >>> From: Maruan Sahyoun [mailto:[email protected]]
> >>> Sent: Friday, October 10, 2014 10:44 AM
> >>> To: [email protected]
> >>> Subject: Re: problem with pdf eof
> >>>
> >>> Hi Jan,
> >>>
> >>> choosing the right technology is very important so I do understand
> >>> your concerns. I had to make such a decision about using PDFBox in the
> >>> past too.
> >>> It can
> >>> If you have specific issues I can answer I'm happy to try to do so. As
> >>> a general statement PDFBox is used in production environments today
> >>> (as an example we ourselves are using it for a banking customer to
> >>> process account statements, an airline company to preprocess archiving
> >>> documents and various other customers).
> >>>
> >>> PDFBox is continuously enhancing the parsing as we try to deal with
> >>> real world PDF files which are not always in line with the PDF
> >>> specification. Currently the best approach is to use
> >>> PDDocument.loadNonSeq (which parses documents according to the Xref
> >>> information) and in case of an exception PDDocument.load (which parses
> >>> sequentially). The Apache Tika project, which uses PDFBox for parsing
> >>> PDFs, is running the parsing and text extraction against 50k PDFs
> >>> being made available via http://digitalcorpora.org
> >>>
> >>> What is the application you would like to be using PDFBox for? Text
> >>> extraction, image conversion ... - I might be able to give you more
> >>> specific information for your use case.
> >>>
> >>> BR
> >>>
> >>> Maruan
> >>>
> >>> On 10.10.2014 at 10:10, Vomlel Jan <[email protected]> wrote:
> >>>
> >>>> Thank you Maruan, this function loads the document.
> >>>>
> >>>> I have read https://pdfbox.apache.org/ideas.html "Replace/Enhance
> >>>> PDF parsing". I think correct parsing is very important, and I have
> >>>> some doubts whether I can use PDFBox in production. Can you say
> >>>> something to reassure me :-).
> >>>>
> >>>> Jan
> >>>>
> >>>> -----Original Message-----
> >>>> From: Maruan Sahyoun [mailto:[email protected]]
> >>>> Sent: Friday, October 10, 2014 9:25 AM
> >>>> To: [email protected]
> >>>> Subject: Re: problem with pdf eof
> >>>>
> >>>> Hi
> >>>>
> >>>> you can try PDDocument.loadNonSeq(InputStream is, null)
> >>>>
> >>>> BR
> >>>>
> >>>> Maruan
> >>>>
> >>>> On 10.10.2014 at 09:09, Vomlel Jan <[email protected]> wrote:
> >>>>
> >>>>> Hello,
> >>>>> I use the PDFBox 1.8.7 PDDocument.load(InputStream is) method to
> >>>>> parse the PDF document in the attachment.
> >>>>> The method returns without an exception, but the document model is
> >>>>> incomplete.
> >>>>>
> >>>>> The problem is in the characters after EOF (offset 22939):
> >>>>> startxref
> >>>>> 22449
> >>>>> %%EOF
> >>>>> @
> >>>>> 16 0 obj
> >>>>> <<
> >>>>> /Type /Catalog
> >>>>>
> >>>>> PDFBox creates an internal IOException and ignores it with the
> >>>>> comment:
> >>>>> /*
> >>>>>  * PDF files may have random data after the EOF marker. Ignore errors if
> >>>>>  * last object processed is EOF.
> >>>>>  */
> >>>>>
> >>>>> Is this PDF construction valid?
> >>>>> Which parser in PDFBox is correct? I tried ConformingPDFParser, but
> >>>>> another error occurred.
> >>>>>
> >>>>> Jan
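
The parsing advice quoted above (try the xref-based parser first, fall back to the sequential one on an exception) can be sketched as a small generic helper. This is only an illustration of the pattern. `FallbackLoad`, `Loader`, and `loadWithFallback` are hypothetical names, not PDFBox API. The sketch also only covers failures that surface as an `IOException`. It does not help when a parser returns normally with an incomplete document, which is the case reported at the start of this thread. For signing, the thread recommends `PDDocument.load` rather than this fallback.

```java
import java.io.IOException;

/** Hypothetical helper (not a PDFBox class): try one loader, fall back to another. */
final class FallbackLoad {

    /** A loading step that may fail with an IOException. */
    interface Loader<T> {
        T load() throws IOException;
    }

    /**
     * Runs the primary loader; if it throws an IOException, runs the fallback.
     * If both fail, the fallback's exception is thrown with the primary
     * failure attached as a suppressed exception, so neither cause is lost.
     *
     * Assumption: with PDFBox 1.8.x the two loaders might be
     *   () -> PDDocument.loadNonSeq(file, null)   // xref-based
     *   () -> PDDocument.load(file)               // sequential
     * for a java.io.File named 'file'.
     */
    static <T> T loadWithFallback(Loader<T> primary, Loader<T> fallback) throws IOException {
        try {
            return primary.load();
        } catch (IOException primaryFailure) {
            try {
                return fallback.load();
            } catch (IOException fallbackFailure) {
                fallbackFailure.addSuppressed(primaryFailure);
                throw fallbackFailure;
            }
        }
    }
}
```

Each loader should open its own input. An `InputStream` that the first parser has already partly consumed cannot simply be handed to the second, so a `File` or an in-memory byte array is the safer source for this pattern.
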

