On 1/16/2012 4:24 AM, Nick Burch wrote:
On Fri, 13 Jan 2012, P. Hill wrote:
Anyone know about the (future?) ability of Tika to parse PDF Portfolio Files? http://help.adobe.com/en_US/Acrobat/9.0/Standard/WSA2872EA8-9756-4a8c-9F20-8E93D59D91CE.html

My hunch is that this'll need some PDFBox support too, to let us at the original files, and to let us know what parts are a portfolio.

As a first step, I'd suggest you ask on the PDFBox list about their support for Portfolio files.

Nick

Nick,

I finally got a moment to ask about PDF Portfolio files and the folks over at PDFBox directed me to:
http://pdfbox.apache.org/userguide/file_references.html

I'm passing that along for the Tika developers. It seems there may be some issues with combining all the content in a portfolio, not unlike e-mails with attachments or other compound documents (see http://wiki.apache.org/tika/MetadataDiscussion).

I can report that my company has seen at least one end user using Portfolio files, but they don't seem to be very common.

-Paul
