Nick,

You're right - it works perfectly. From reading the Tika code, it appears that
if an object pool contains a DocumentEntry called "Package", it is safe to
assume it is an OOXML document embedded as-is, i.e. as a PKZip blob.
So to access it you just need to:

                if (name.equals("Package")) {
                    // "Package" holds the embedded OOXML file as a raw ZIP blob
                    DocumentEntry doc = (DocumentEntry) entry;

                    // try-with-resources closes the stream even if the copy fails
                    try (InputStream stream = new DocumentInputStream(doc)) {
                        flushStreamToFile(stream, "/tmp/ooxml-file", doc.getSize());
                    }
                }
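If you want to double-check that the bytes written out really are an OOXML package, a small check with plain java.util.zip (no POI needed) is enough: a PKZip blob starts with the local-file-header signature "PK\003\004", and every OOXML package contains a "[Content_Types].xml" part. This is a minimal sketch under those assumptions; the class and method names are hypothetical, and the in-memory sample blob stands in for the real "Package" bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: sanity-checks an extracted "Package" blob.
public class OoxmlPackageCheck {

    // Every ZIP local file header begins with "PK\003\004".
    private static final byte[] ZIP_MAGIC = {'P', 'K', 3, 4};

    /** True if the data starts with the PKZip local-file-header signature. */
    static boolean looksLikeZip(byte[] data) {
        if (data.length < ZIP_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < ZIP_MAGIC.length; i++) {
            if (data[i] != ZIP_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Lists the part names stored in a ZIP stream. */
    static List<String> listParts(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry e;
            while ((e = zip.getNextEntry()) != null) {
                names.add(e.getName());
            }
        }
        return names;
    }

    /** An OOXML package is a ZIP that contains "[Content_Types].xml". */
    static boolean isOoxmlPackage(byte[] data) throws IOException {
        return looksLikeZip(data)
                && listParts(new ByteArrayInputStream(data)).contains("[Content_Types].xml");
    }

    /** Builds a tiny stand-in blob in memory (hypothetical sample data). */
    static byte[] sampleBlob() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buf)) {
            zip.putNextEntry(new ZipEntry("[Content_Types].xml"));
            zip.write("<Types/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(isOoxmlPackage(sampleBlob())); // prints true
    }
}
```

In practice you would read the bytes from "/tmp/ooxml-file" (or straight from the DocumentInputStream) instead of calling sampleBlob().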

Thanks for your help!

- Chris

Chris Bamford
Senior Developer
m: +44 7860 405292
p: +44 207 847 8700
w: www.mimecast.com
Address: www.mimecast.com/About-us/Contact-us/

On 18 Jan 2015, at 18:36, Nick Burch <[email protected]> wrote:

> On Fri, 16 Jan 2015, Chris Bamford wrote:
>> A colleague who works with Windows has examined the file and determined that
>> the embedded file lies in the entry marked "CompObj". Does POI have an API
>> for getting hold of it?
> 
> Your best example is probably in AbstractPOIFSExtractor from Apache Tika -
> it has the exact code you need to read a CompObj entry from a given
> directory within an OLE2 filesystem.
> 
> Nick