Re: Fwd: XLSX wrapped in an OLE2 CompObj/Package - should WorkbookFactory handle it?

2020-10-13 Thread Nick Burch
On Tue, 13 Oct 2020, Tim Allison wrote:
> Ha, y, this file exercises those bits of code:
> https://github.com/apache/tika/blob/main/tika-parser-modules/tika-parser-microsoft-module/src/test/resources/test-documents/testPPT_oleWorkbook.ppt
> Nick, does this match the features of the SO question?
Yup,

Re: Fwd: XLSX wrapped in an OLE2 CompObj/Package - should WorkbookFactory handle it?

2020-10-13 Thread Tim Allison
Ha, y, this file exercises those bits of code:
https://github.com/apache/tika/blob/main/tika-parser-modules/tika-parser-microsoft-module/src/test/resources/test-documents/testPPT_oleWorkbook.ppt
Nick, does this match the features of the SO question?
On Tue, Oct 13, 2020 at 10:58 AM Tim Allison w

Re: Fwd: XLSX wrapped in an OLE2 CompObj/Package - should WorkbookFactory handle it?

2020-10-13 Thread Tim Allison
Based on https://github.com/apache/tika/blob/main/tika-parser-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java#L518 and https://github.com/apache/tika/blob/main/tika-parser-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/pa

Re: Fwd: XLSX wrapped in an OLE2 CompObj/Package - should WorkbookFactory handle it?

2020-10-13 Thread Tim Allison
Thank you, Nick! IIUC the XLSX raw bytes are in the Package entry of an OLE2 wrapper. What is the key for the OLE2 wrapper in the PPT? Sorry for missing this... Have you put your hands on an example that you could share privately? Happy to look through our regression corpus if I know what exact
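The layout described in this message (an OLE2 container whose Package entry holds the raw XLSX bytes) can be illustrated with a signature check. The following is a minimal sketch using only the JDK, not the POI or Tika implementation: the class name `ContainerSniffer` and its methods are hypothetical, and it assumes the caller has already obtained the first bytes of the outer file and of the Package stream by some other means, for example an OLE2 reader such as POI's `POIFSFileSystem`, which is not shown here.

```java
// Hypothetical helper, for illustration only; not part of POI or Tika.
final class ContainerSniffer {
    // OLE2 compound document header signature.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header signature ("PK\003\004"); OOXML files are ZIP archives.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    enum Kind { OLE2, ZIP, UNKNOWN }

    /** Classifies a buffer by its leading signature bytes. */
    public static Kind sniff(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) {
            return Kind.OLE2;
        }
        if (startsWith(head, ZIP_MAGIC)) {
            return Kind.ZIP;
        }
        return Kind.UNKNOWN;
    }

    /** True if data begins with prefix; null or too-short input yields false. */
    public static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Assumption: outerHead holds the first bytes of the file and
     * packageHead the first bytes of its "Package" stream.
     * Returns true when an OLE2 wrapper appears to contain a ZIP payload.
     */
    public static boolean looksLikeWrappedZip(byte[] outerHead, byte[] packageHead) {
        return sniff(outerHead) == Kind.OLE2 && sniff(packageHead) == Kind.ZIP;
    }
}
```

A ZIP signature only shows that the payload is a ZIP archive; deciding whether it is XLSX rather than another OOXML type would still require inspecting the package contents.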

Re: Fwd: XLSX wrapped in an OLE2 CompObj/Package - should WorkbookFactory handle it?

2020-10-10 Thread Nick Burch
On Fri, 9 Oct 2020, Tim Allison wrote:
> Do you think we should follow up on the Tika side? Do we know if we can handle this?
I thought we did, but checking POIFSContainerDetector I can't actually see that case covered.
I think we (Tika) can handle it in a similar way to CompObj
Over on

Fwd: XLSX wrapped in an OLE2 CompObj/Package - should WorkbookFactory handle it?

2020-10-09 Thread Tim Allison
Nick,
Do you think we should follow up on the Tika side? Do we know if we can handle this?
---------- Forwarded message ---------
From: Nick Burch
Date: Fri, Oct 9, 2020 at 4:43 PM
Subject: XLSX wrapped in an OLE2 CompObj/Package - should WorkbookFactory handle it?
To:
Hi All
Over on Stack