Hi Andreas,

I appreciate the offer. After some more digging I have found that the assumption made by this code snippet (from Ole10Native.createFromEmbeddedOleObject) is not 100% reliable:

    try {
        directory.getEntry("\u0001Ole10ItemName");
        plain = true;
    } catch (FileNotFoundException ex) {
        plain = false;
    }

What I have found is that some documents that do not contain this entry (i.e. plain=false) are extractable if you set plain=true. So I have made the following (very similar) method to replace the call:

    private Ole10Native resilientCreateFromEmbeddedOleObject(DirectoryNode directory)
            throws IOException, Ole10NativeException {
        final String OLE10_NATIVE = "\u0001Ole10Native";
        Ole10Native ole10 = null;
        boolean plain = false;
        boolean retry = false;

        // Same detection as the original method: the item-name entry implies 'plain'.
        try {
            directory.getEntry("\u0001Ole10ItemName");
            plain = true;
        } catch (FileNotFoundException ex) {
            plain = false;
        }

        DocumentEntry nativeEntry = (DocumentEntry)directory.getEntry(OLE10_NATIVE);
        byte[] data = new byte[nativeEntry.getSize()];
        directory.createDocumentInputStream(nativeEntry).read(data);

        // Have 2 goes at this - 'plain' can lie!
        try {
            ole10 = new Ole10Native(data, 0, plain);
        } catch (Ole10NativeException e) {
            retry = true;
        }
        if (retry) {
            ole10 = new Ole10Native(data, 0, !plain);
        }
        return ole10;
    }

This gives a higher success rate. I will let you know what else I find :-)

Kind regards,

- Chris

On 16 Jul 2014, at 23:00, Andreas Beeker <andreas.bee...@gmx.de> wrote:

Hi Chris,

> On 16.07.2014 15:24, Chris Bamford wrote:
> Looking in the source of Ole10Native at the offending line I see:
>
>     if (totalSize < ofs) {
>         throw new Ole10NativeException("Invalid Ole10Native");
>     }
>
> Can anyone shed any light on what this means and why it happens?

The MS docs [1] are quite limited on that stream, so the code is just plain guessing :|

There are Ole10Native streams without an actual data part, i.e. (some) equation editor objects come without the data part, but somehow encode their data within the filename. But the OLE objects I have looked at so far had in common a label, a filename and a command, or at least 3 length-prefixed byte arrays.
So this line checks whether there was an error with the length prefixes.

If you can share your file, please open a bug entry or alternatively send it to my private email. I would then try to figure out how the bin object could be handled. Currently I don't have much time and my priority is to finish that xml signature stuff, so that may take some time ... sorry

Andi.

[1] http://msdn.microsoft.com/en-us/library/dd942447.aspx

Chris Bamford
Senior Developer
w: www.mimecast.com
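
For anyone who wants to try the "have two goes" idea from Chris's method without POI on the classpath, here is a minimal, library-independent sketch. Everything in it is hypothetical and for illustration only: FallbackParseSketch, Parser, ParseException and parseWithFallback are made-up names, with ParseException standing in for Ole10NativeException and Parser standing in for the Ole10Native constructor. Unlike the method above, it keeps the first failure attached to the second one, so that no diagnostic is lost if both attempts fail.

```java
/**
 * Sketch of "try the detected flag first, then the opposite one".
 * All names here are hypothetical stand-ins, not part of Apache POI.
 */
final class FallbackParseSketch {

    /** Hypothetical stand-in for Ole10NativeException. */
    static class ParseException extends Exception {
        ParseException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for "new Ole10Native(data, 0, plain)". */
    interface Parser<T> {
        T parse(byte[] data, boolean plain) throws ParseException;
    }

    /**
     * Parses with the guessed flag; if that fails, retries with the
     * opposite flag. If both attempts fail, the second exception is
     * thrown with the first one attached as a suppressed exception.
     */
    static <T> T parseWithFallback(byte[] data, boolean plainGuess, Parser<T> parser)
            throws ParseException {
        try {
            return parser.parse(data, plainGuess);
        } catch (ParseException first) {
            try {
                return parser.parse(data, !plainGuess);
            } catch (ParseException second) {
                second.addSuppressed(first);
                throw second;
            }
        }
    }

    public static void main(String[] args) throws ParseException {
        // Toy parser for the demo: it only accepts plain == false.
        Parser<String> toy = (data, plain) -> {
            if (plain) {
                throw new ParseException("Invalid data for plain=true");
            }
            return "parsed " + data.length + " bytes with plain=false";
        };
        // The guess (true) is wrong here, so the fallback attempt is used.
        System.out.println(parseWithFallback(new byte[4], true, toy));
    }
}
```

In the POI case the parser argument would simply wrap the constructor call, and plainGuess would come from the "\u0001Ole10ItemName" lookup.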