Hi-

I have a question regarding the limitation on entering getLength() for a second 
time.

I understand that it is possible to create a malicious pdf which essentially 
goes into an infinite loop by having it parse nested streams that refer to each 
other.  I do not believe this to be the case with these files (they are from 
well-known corporate book publishers).

Obviously, pdfbox prohibits this nesting behavior by passing Boolean flags 
around and setting the inGetLength flag when it first enters then clearing it 
upon exit.
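
To make sure we are talking about the same thing, here is a rough sketch of 
that kind of flag guard.  This is not the actual pdfbox source: the class 
FlagGuardSketch, its maps and the exception message are made up for 
illustration, and only the inGetLength flag name comes from pdfbox.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a parser; NOT the real pdfbox classes.
class FlagGuardSketch {
    // Object number -> length stored directly on that object.
    private final Map<Integer, Integer> direct = new HashMap<>();
    // Object number -> object number that must be resolved to get its length.
    private final Map<Integer, Integer> indirect = new HashMap<>();
    // Set while a getLength() call is in progress.
    private boolean inGetLength = false;

    void putDirect(int objNum, int length) { direct.put(objNum, length); }
    void putIndirect(int objNum, int target) { indirect.put(objNum, target); }

    int getLength(int objNum) throws IOException {
        if (inGetLength) {
            // Any second entry is refused, even a harmless one-level nesting.
            throw new IOException("getLength() re-entered for object " + objNum);
        }
        inGetLength = true;
        try {
            Integer target = indirect.get(objNum);
            if (target != null) {
                return getLength(target); // nested call: hits the guard above
            }
            Integer length = direct.get(objNum);
            if (length == null) {
                throw new IOException("no length for object " + objNum);
            }
            return length;
        } finally {
            inGetLength = false; // always cleared on exit
        }
    }
}
```

With a guard like this, an object whose length has to be resolved through one 
other object is rejected in the same way as a true cycle.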

I have several pdfs which open fine in Acrobat and Google Chrome (which is 
based on the pdfium engine), yet when I try to open them using pdfbox they 
throw the "Object must be defined and must not be compressed object" error.

By observation, it seems to me that pdfium gets around this issue by keeping a 
counter of recursion depth (they use 64 max) and allowing short-depth nesting 
in this way, but throwing an exception if the nesting gets too deep.
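
As a rough sketch of the depth-counter idea (again with made-up names: 
DepthGuardSketch, its maps and the messages are hypothetical and are neither 
pdfium nor pdfbox code; only the limit of 64 is taken from what I saw in 
pdfium):

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a parser; NOT real pdfium or pdfbox code.
class DepthGuardSketch {
    static final int MAX_DEPTH = 64; // limit observed in pdfium

    // Object number -> length stored directly on that object.
    private final Map<Integer, Integer> direct = new HashMap<>();
    // Object number -> object number that must be resolved to get its length.
    private final Map<Integer, Integer> indirect = new HashMap<>();
    // Number of getLength() calls currently on the stack.
    private int depth = 0;

    void putDirect(int objNum, int length) { direct.put(objNum, length); }
    void putIndirect(int objNum, int target) { indirect.put(objNum, target); }

    int getLength(int objNum) throws IOException {
        if (depth >= MAX_DEPTH) {
            // A cycle (or absurdly deep nesting) ends up here and is rejected.
            throw new IOException("nesting deeper than " + MAX_DEPTH);
        }
        depth++;
        try {
            Integer target = indirect.get(objNum);
            if (target != null) {
                return getLength(target); // shallow nesting is allowed
            }
            Integer length = direct.get(objNum);
            if (length == null) {
                throw new IOException("no length for object " + objNum);
            }
            return length;
        } finally {
            depth--; // restore the counter on every exit path
        }
    }
}
```

In this sketch a short chain of references resolves normally, while two 
objects that refer to each other still fail once the counter reaches the limit 
instead of looping forever.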

I have forked pdfbox up on Github and made those minor changes.

This seems to allow me to open the few files in question.  I'd like for you to 
take a look at the change and comment on it if you would.

https://github.com/santoch/pdfbox/pull/1

Please let me know what you think-
Thanks-
Steve