Hi, I have a question regarding the limitation on entering getLength() for a second time while parsing a PDF.
I understand that it is possible to create a malicious PDF which essentially goes into an infinite loop by having the parser follow nested streams that refer to each other. I do not believe this to be the case with these files (they are from well-known corporate book publishers). PDFBox prohibits this nesting by passing a boolean flag around: it sets the inGetLength member variable when it first enters getLength() and clears it upon exit.

I have several PDFs which open fine in Acrobat and Google Chrome (which is based on the PDFium engine), yet when I try to open them using PDFBox they throw the "Object must be defined and must not be compressed object" error. By observation, it seems to me that PDFium gets around this issue by keeping a counter of recursion depth (they use a maximum of 64). This allows short-depth nesting, but an exception is still thrown if the nesting gets too deep, which prevents malicious PDFs from looping indefinitely.

I have forked PDFBox on GitHub and made a few minor changes that replace the boolean inGetLength flag with an integer counter and a constant maximum depth. This allows PDFBox to continue processing a compressed stream provided the depth does not exceed the maximum. For all of the PDFs that were failing before, simply allowing a depth of 2 instead of 1 seemed to be enough for PDFBox to process the files without throwing the exception.

If you would be so kind as to take a look at the change and comment on it, I would be most appreciative. I am hoping that this tweak is OK. The intent is to continue to prevent malicious looping in PDFs, but still allow shallow nesting to get through.

https://github.com/santoch/pdfbox/pull/1

Please let me know what you think.

Thanks,
Steve
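
P.S. To illustrate the idea of replacing the boolean flag with a depth counter, here is a minimal, self-contained sketch. It is not the PDFBox code and not the code in the pull request: the class LengthResolver, its methods, and the two maps standing in for parsed objects are all hypothetical, and the limit of 2 is just the value that happened to work for my files.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a parser; names are illustrative, not PDFBox API.
class LengthResolver {
    private final int maxDepth;          // assumed limit, e.g. 2 (PDFium appears to use 64)
    private int lengthDepth = 0;         // replaces a boolean "inGetLength" flag

    // Object number -> length stored directly in that object.
    private final Map<Integer, Integer> directLengths = new HashMap<>();
    // Object number -> object number that must be resolved to obtain the length.
    private final Map<Integer, Integer> indirectLengths = new HashMap<>();

    LengthResolver(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    void addDirect(int objectNumber, int length) {
        directLengths.put(objectNumber, length);
    }

    void addIndirect(int objectNumber, int targetObjectNumber) {
        indirectLengths.put(objectNumber, targetObjectNumber);
    }

    int resolveLength(int objectNumber) throws IOException {
        // Refuse to nest deeper than the limit; this is what stops cycles.
        if (lengthDepth >= maxDepth) {
            throw new IOException("Length resolution nested deeper than " + maxDepth);
        }
        lengthDepth++;
        try {
            Integer direct = directLengths.get(objectNumber);
            if (direct != null) {
                return direct;
            }
            Integer target = indirectLengths.get(objectNumber);
            if (target == null) {
                throw new IOException("No length found for object " + objectNumber);
            }
            // Re-entrant call: allowed as long as the depth limit is respected.
            return resolveLength(target);
        } finally {
            // Always restore the counter, even when an exception propagates.
            lengthDepth--;
        }
    }
}
```

With a limit of 1 this behaves like the boolean flag (any re-entry fails); with a limit of 2, one level of indirection succeeds, while two objects that refer to each other still end in an exception rather than an endless loop.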

