After much digging here.... I found out that yes, in fact, PDFBox handles embedded documents just fine (yay!).
As an example (not certain it's correct, but it seems to work!), see the patch on https://issues.apache.org/jira/browse/PDFBOX-1297 where I fixed ExtractText to also extract text from embedded PDFs.

Mike McCandless

http://blog.mikemccandless.com

On Wed, Apr 25, 2012 at 6:43 PM, Michael McCandless <[email protected]> wrote:
> Does anyone know whether PDFBox is able to work with PDF Packages
> (where multiple PDFs are bound into one)?
>
> Here's Adobe's description of this feature:
>
>     http://help.adobe.com/en_US/Acrobat/8.0/Professional/help.html?content=WSE034CA46-D08F-4fff-AA3C-FF04510DAEF0.html
>
> I have an example of such a PDF (can't share unfortunately)... which
> ExtractText throws an IOException on ... but I'm not sure if 1) this
> particular PDF is corrupt, or, 2) PDFBox doesn't understand PDF
> Packages.
>
> Any help appreciated!
>
> Thanks,
>
> Mike McCandless
>
> http://blog.mikemccandless.com

