Re: Extract PDF inline images

2015-07-07 Thread Andrea Asta
Hi Tim, thanks for your response, but I can't find a complete solution. I've created a class using the same FileEmbeddedDocumentExtractor from TikaCLI, and now I'm trying to write a sample main program for a PDF containing some images. This is my code, but I can't get any images stored and the

RE: Extract PDF inline images

2015-07-07 Thread Allison, Timothy B.
document. Stay tuned to TIKA-1674 for follow up on that.
Best,
Tim
From: Andrea Asta [mailto:asta.and...@gmail.com]
Sent: Tuesday, July 07, 2015 6:22 AM
To: user@tika.apache.org
Subject: Re: Extract PDF inline images
Hi Tim, thanks for your response, but I can't

RE: Extract PDF inline images

2015-07-06 Thread Allison, Timothy B.
Hi Andrea, The RecursiveParserWrapper, as you found, is only for extracted content and metadata. It was designed to cache metadata and content from embedded documents so that you can easily keep those two things together for each embedded document. To extract the raw bytes from embedded