After much digging, I found that yes, PDFBox does in fact handle
embedded documents just fine (yay!).

As an example (not certain it's correct, but it seems to work!), see the
patch on https://issues.apache.org/jira/browse/PDFBOX-1297, where I
fixed ExtractText to also extract text from embedded PDFs.

Mike McCandless

http://blog.mikemccandless.com

On Wed, Apr 25, 2012 at 6:43 PM, Michael McCandless
<[email protected]> wrote:
> Does anyone know whether PDFBox is able to work with PDF Packages
> (where multiple PDFs are bound into one)?
>
> Here's Adobe's description of this feature:
>
> http://help.adobe.com/en_US/Acrobat/8.0/Professional/help.html?content=WSE034CA46-D08F-4fff-AA3C-FF04510DAEF0.html
>
> I have an example of such a PDF (can't share, unfortunately), which
> ExtractText throws an IOException on, but I'm not sure whether 1) this
> particular PDF is corrupt, or 2) PDFBox doesn't understand PDF
> Packages.
>
> Any help appreciated!
>
> Thanks,
>
> Mike McCandless
>
> http://blog.mikemccandless.com
