Hi - Well, it’s early POI stuff. Maybe a patch is possible for the narrow use case the Tika user has.
I assume that all you need is the first block or two to confirm this looks like an OLE document.

Regards,
Dave

> On Apr 16, 2019, at 12:29 PM, Tim Allison <[email protected]> wrote:
>
> Thank you, Dave! The reading examples use POIFSReader, which I had hoped
> was truly streaming, but it creates a POIFS, which requires a read/skip of
> the entire stream IIUC, and then iterates...Or, am I missing something?
>
> I didn’t try POIFSReader by specifying a subdoc to process, but it looks
> like it opens a POIFS first no matter how you register a listener.
>
> On Tue, Apr 16, 2019 at 3:20 PM Dave Fisher <[email protected]> wrote:
>
>> Hi Tim,
>>
>> Maybe the answer is using HPSF -
>>
>> https://poi.apache.org/components/hpsf/how-to.html
>>
>> Regards,
>> Dave
>>
>>> On Apr 16, 2019, at 11:47 AM, Tim Allison <[email protected]> wrote:
>>>
>>> All,
>>> In Tika, when we do file type detection of OLE files
>>> (POIFSContainerDetector), we spool the file to disk, open a POIFS and
>>> make a decision based on document/directory names. A user on
>>> TIKA-2849 does not want to copy the full file from a slow network
>>> drive for detection. When I tried using a BoundedInputStream with
>>> POIFS, not surprisingly, I got EOF exceptions.
>>> Question: is there any way to do detection in a streaming mode for
>>> OLE files? Or, is this the best we can do? Thank you!
>>>
>>> Best,
>>>
>>> Tim

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
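For what it's worth, confirming only that a stream is an OLE2 container (not which document type it holds) needs nothing beyond the 8-byte signature at the start of the header block. Here is a minimal sketch in plain Java with no POI classes involved; the names `OleHeaderSniffer` and `looksLikeOle2` are hypothetical and not part of POI or Tika. It does not solve the directory-name decision that POIFSContainerDetector makes, since, as far as I understand the format, the directory sectors can sit anywhere in the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, not a POI or Tika class.
final class OleHeaderSniffer {

    // OLE2 compound file signature: the first 8 bytes of the header block.
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    private OleHeaderSniffer() {
    }

    /**
     * Reads at most 8 bytes from the stream and compares them to the
     * OLE2 signature. The caller is responsible for mark/reset or for
     * re-opening the stream if the bytes are needed again.
     */
    static boolean looksLikeOle2(InputStream in) throws IOException {
        byte[] head = new byte[OLE2_SIGNATURE.length];
        int off = 0;
        while (off < head.length) {
            int n = in.read(head, off, head.length - off);
            if (n < 0) {
                return false; // stream ended before 8 bytes were available
            }
            off += n;
        }
        return Arrays.equals(head, OLE2_SIGNATURE);
    }
}
```

Calling `OleHeaderSniffer.looksLikeOle2(stream)` on the slow network stream would pull only the first few bytes across, which covers the "does this look like OLE at all" part of the question.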
