Thank you, Dave! The reading examples use POIFSReader, which I had hoped was truly streaming. But IIUC, it creates a POIFS, which requires a read/skip of the entire stream, and only then iterates. Or am I missing something?
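
To illustrate why a bounded prefix of the stream is generally not enough, here is a rough sketch that reads only the 512-byte OLE2 header and computes where the first directory sector starts. It assumes the standard compound file header layout (signature at offset 0, sector shift at offset 30, first directory sector at offset 48). The class and method names are made up for illustration; this is not POI or Tika API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper for illustration only; not part of POI or Tika. */
final class Ole2HeaderPeek {
    // Signature bytes D0 CF 11 E0 A1 B1 1A E1, read as a little-endian long.
    private static final long OLE2_MAGIC = 0xE11AB1A1E011CFD0L;
    private static final int HEADER_SIZE = 512;

    /**
     * Reads at most 512 bytes and returns the byte offset of the first
     * directory sector, or -1 if the stream does not look like OLE2.
     */
    static long firstDirectorySectorOffset(InputStream in) throws IOException {
        byte[] header = new byte[HEADER_SIZE];
        int filled = 0;
        while (filled < HEADER_SIZE) {
            int n = in.read(header, filled, HEADER_SIZE - filled);
            if (n < 0) {
                return -1; // stream ended before a full header was read
            }
            filled += n;
        }
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getLong(0) != OLE2_MAGIC) {
            return -1;
        }
        int sectorShift = buf.getShort(30) & 0xFFFF;   // 9 => 512-byte sectors, 12 => 4096
        if (sectorShift != 9 && sectorShift != 12) {
            return -1;
        }
        long firstDirSector = buf.getInt(48) & 0xFFFFFFFFL;
        // Sector 0 starts one sector-size after the start of the file.
        return (firstDirSector + 1) << sectorShift;
    }
}
```

The returned offset can be anywhere in the file, including close to the end, so reading the document/directory names from a non-seekable stream may still mean skipping most of it.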
I didn’t try POIFSReader by specifying a subdoc to process, but it looks like it opens a POIFS first no matter how you register a listener.

On Tue, Apr 16, 2019 at 3:20 PM Dave Fisher <[email protected]> wrote:
> Hi Tim,
>
> Maybe the answer is using HPSF -
>
> https://poi.apache.org/components/hpsf/how-to.html
>
> Regards,
> Dave
>
> > On Apr 16, 2019, at 11:47 AM, Tim Allison <[email protected]> wrote:
> >
> > All,
> > In Tika, when we do file type detection of OLE files
> > (POIFSContainerDetector), we spool the file to disk, open a POIFS and
> > make a decision based on document/directory names. A user on
> > TIKA-2849 does not want to copy the full file from a slow network
> > drive for detection. When I tried using a BoundedInputStream with
> > POIFS, not surprisingly, I got EOF exceptions.
> > Question: is there any way to do detection in a streaming mode for
> > OLE files? Or, is this the best we can do? Thank you!
> >
> > Best,
> >
> > Tim
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
