Re: PDFParser Conflict Resolution

Maruan Sahyoun Sat, 22 Feb 2014 08:24:06 -0800

Hi,

the PDFParser works sequentially through the file from top to bottom and 
collects all objects. Conflicts are resolved by assuming that if an object with 
the same number appears later in the file, the later one is the correct one.
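
To illustrate the "later object wins" rule, here is a minimal sketch in plain 
Java. It is not the actual PDFBox code: LastWinsSketch, ParsedObject and 
resolve are hypothetical names used only for this example, and an object is 
reduced to its number plus a body string.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LastWinsSketch {
    /** Hypothetical stand-in for a parsed indirect object. */
    static class ParsedObject {
        final int objectNumber;
        final String body;

        ParsedObject(int objectNumber, String body) {
            this.objectNumber = objectNumber;
            this.body = body;
        }
    }

    /**
     * Walks the objects in file order. A later object with the same
     * number silently replaces the earlier one in the pool.
     */
    static Map<Integer, String> resolve(List<ParsedObject> inFileOrder) {
        Map<Integer, String> pool = new LinkedHashMap<>();
        for (ParsedObject obj : inFileOrder) {
            pool.put(obj.objectNumber, obj.body);
        }
        return pool;
    }

    public static void main(String[] args) {
        List<ParsedObject> objects = Arrays.asList(
                new ParsedObject(5, "form with fields"),
                new ParsedObject(7, "page"),
                new ParsedObject(5, "later copy of object 5"));
        // Object 5 ends up with the body of its last occurrence in the file.
        System.out.println(resolve(objects).get(5));
    }
}
```

If the later copy happens to be the wrong one, the earlier content is lost, 
which is the kind of replacement described in the quoted message below.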


NonSequentialPDFParser works through the file by following the xref information 
(table or stream). This is in line with the PDF specification.
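
By contrast, an xref-driven lookup only reads the object the xref entry points 
to, wherever it sits in the file. The following is a simplified sketch, not 
the PDFBox implementation: XrefLookupSketch and readObject are hypothetical 
names, the file is modelled as a String, and offsets are character positions 
rather than the byte offsets a real xref holds.

```java
import java.util.HashMap;
import java.util.Map;

class XrefLookupSketch {
    /**
     * Returns the object text starting at the offset the xref map gives
     * for objectNumber, or null if the xref has no entry or the object
     * is not terminated by "endobj".
     */
    static String readObject(String file, Map<Integer, Integer> xref, int objectNumber) {
        Integer offset = xref.get(objectNumber);
        if (offset == null) {
            return null; // object is not listed in the xref
        }
        int end = file.indexOf("endobj", offset);
        if (end < 0) {
            return null; // truncated object
        }
        return file.substring(offset, end + "endobj".length());
    }

    public static void main(String[] args) {
        String first = "5 0 obj (form with fields) endobj\n";
        String second = "5 0 obj (stray copy) endobj\n";
        String file = first + second;

        // Assumed xref for this example: object 5 lives at offset 0.
        Map<Integer, Integer> xref = new HashMap<>();
        xref.put(5, 0);

        // The stray copy later in the file is never looked at.
        System.out.println(readObject(file, xref, 5));
    }
}
```

Here the position of an object in the file does not matter; only the xref 
entry decides which copy is used.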

So patching as you’ve done might resolve your issue but might also introduce 
issues with other files. The best way would be to find out why 
NonSequentialPDFParser has issues parsing your file. If you think it’s a bug, 
please open an issue in JIRA [https://issues.apache.org/jira/browse/PDFBOX] and 
attach the PDF file together with some sample code.

BR
Maruan Sahyoun

On 21.02.2014 at 23:47, Cary L. Schofield <[email protected]> wrote:

> I have a signed document that is getting parsed incorrectly.
> 
> Using PDFParser the document form is missing all fields and I can't get to 
> the signature fields.
> Using NonSequentialPDFParser I can get to the signature fields but the signed 
> data appears to have been corrupted.
> 
> I was able to determine that the form was being replaced or corrupted during 
> conflict resolution.
> 
> I solved the problem by patching PDFParser.ConflictObj to ignore an object in 
> the conflict list when the existing object (from the object pool) is a direct 
> object.
> 
> I know I should do the research, but was hoping someone would already know if 
> the patch is reasonable or likely to cause more/other problems.
> 
> Thanks
>
