Nick,
You're right - it works perfectly. From reading the Tika code, it appears that
if an object pool contains a DocumentEntry called "Package", it is safe to
assume that it is an embedded OOXML document, stored as is, i.e. as a PKZip
blob. So to access it you just need to:
    if (name.equals("Package")) {
        InputStream stream = new DocumentInputStream((DocumentEntry) entry);
        try {
            flushStreamToFile(stream, "/tmp/ooxml-file",
                    ((DocumentEntry) entry).getSize());
        } finally {
            stream.close();
        }
    }
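
For anyone finding this in the archive: flushStreamToFile above is a local
helper, not (as far as I know) a POI method. Below is a minimal sketch of what
such a helper might look like, using only the JDK. The class name
PackageStreamUtil, the return value, the size check and the looksLikeZip
method are all assumptions made for illustration. The only thing taken from
the message above is that the "Package" stream is expected to be a PKZip blob,
which is why the sketch checks for the ZIP local file header signature.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class; none of these names come from POI or Tika.
final class PackageStreamUtil {
    private PackageStreamUtil() {}

    // Copies everything from 'in' to the file at 'path' and returns the
    // number of bytes written. Throws IOException if that number differs
    // from 'expectedSize' (e.g. the value of DocumentEntry.getSize()).
    static long flushStreamToFile(InputStream in, String path, long expectedSize)
            throws IOException {
        long written = 0;
        byte[] buffer = new byte[8192];
        try (OutputStream out = Files.newOutputStream(Paths.get(path))) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                written += n;
            }
        }
        if (written != expectedSize) {
            throw new IOException("Expected " + expectedSize
                    + " bytes but wrote " + written);
        }
        return written;
    }

    // Returns true if the file starts with the ZIP local file header
    // signature "PK\003\004", which a non-empty OOXML package should have.
    static boolean looksLikeZip(String path) throws IOException {
        byte[] header = new byte[4];
        int total = 0;
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            while (total < header.length) {
                int n = in.read(header, total, header.length - total);
                if (n == -1) {
                    break; // file shorter than 4 bytes
                }
                total += n;
            }
        }
        return total == 4
                && header[0] == 0x50 && header[1] == 0x4B
                && header[2] == 0x03 && header[3] == 0x04;
    }
}
```

With something like this in place, calling looksLikeZip("/tmp/ooxml-file")
after the copy is a cheap sanity check that the extracted "Package" entry
really is a zip-based document before handing it to an OOXML parser.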
Thanks for your help!
- Chris
Chris Bamford
Senior Developer
On 18 Jan 2015, at 18:36, Nick Burch <[email protected]> wrote:
> On Fri, 16 Jan 2015, Chris Bamford wrote:
>> A colleague who works with Windows has examined the file and determined that
>> the embedded file lies in the marked "CompObj" entry. Does POI have an API
>> for getting hold of it?
>
> Your best example is probably in AbstractPOIFSExtractor from Apache Tika -
> that has the exact code you need to read a CompObj entry from a given
> directory within an OLE2 filesystem
>
> Nick
>