Re: Re-using a TikaStream

2021-03-01 Thread Nick Burch
On Mon, 1 Mar 2021, Tim Allison wrote: > detectors should return the stream reset to the beginning. I agree - needs to be ready for the parser to then process. > Parsers, IIRC, should return the stream fully(?) read but not closed. Not always - if the parser wanted a File then it may not have

RE: Re-using a TikaStream

2021-03-01 Thread Peter Kronenberg
That’s not what I’m seeing. The AudioParser returns the stream at the beginning. Maybe it’s because there was nothing to parse. It just returns metadata. But the MP4Parser returns the stream fully consumed, even though, again, it only returns metadata. Since right now, I’m dealing with

RE: Re-using a TikaStream

2021-03-01 Thread Nick Burch
On Fri, 26 Feb 2021, Peter Kronenberg wrote: For most audio files, using the AudioParser, the buffer is still at the beginning. Even though there is no text extraction, I would think that Tika still needs to read through the stream. The MP3Parser consumes the stream, but the MP4Parser does

Re: Re-using a TikaStream

2021-03-01 Thread Tim Allison
detectors should return the stream reset to the beginning. Parsers, IIRC, should return the stream fully(?) read but not closed. On Mon, Mar 1, 2021 at 10:29 AM Tim Allison wrote: > Reusing streams after parsing hasn't been something I've done before... > > This is not expected behavior.
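
Since the thread shows that the stream position after parsing differs between parsers, one way to avoid relying on it is to mark the stream before handing it over and reset it afterwards. The following is a minimal sketch using only the JDK. It does not call Tika: the class `StreamReuseSketch`, the `StreamConsumer` interface, and the methods `runThenRewind` and `drain` are hypothetical names for illustration, and the consumer stands in for a parser call. It assumes the consumer reads no more than `readLimit` bytes and leaves the stream open, which matches the "not closed" expectation above.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of Tika.
class StreamReuseSketch {

    // Stand-in for any code that reads from the stream (e.g. a parser call).
    interface StreamConsumer {
        void accept(InputStream in) throws IOException;
    }

    // Marks the stream, runs the consumer, then rewinds to the mark.
    // Assumes the consumer reads at most readLimit bytes and does not close
    // or re-mark the stream; otherwise reset() may throw an IOException.
    static void runThenRewind(InputStream in, int readLimit, StreamConsumer consumer)
            throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream does not support mark/reset");
        }
        in.mark(readLimit);
        consumer.accept(in);
        in.reset();
    }

    // Reads the stream to the end and returns the number of bytes read.
    static int drain(InputStream in) throws IOException {
        byte[] buf = new byte[4096];
        int total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "example bytes".getBytes(StandardCharsets.UTF_8);
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));

        // First pass: a consumer that reads everything, like the MP4Parser case.
        runThenRewind(in, data.length + 1, StreamReuseSketch::drain);

        // Second pass: the stream is back at the mark, so all bytes are available again.
        System.out.println("bytes available on second pass: " + drain(in));
    }
}
```

The same helper behaves identically whether the consumer reads nothing or reads to the end, which is the point: the caller no longer depends on where a given parser leaves the stream. For large inputs the read limit implies buffering that many bytes in memory, so this approach suits small files or cases where only a bounded prefix is read.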

Re: Re-using a TikaStream

2021-03-01 Thread Tim Allison
Reusing streams after parsing hasn't been something I've done before... This is not expected behavior. Parsers should all behave the same. On Mon, Mar 1, 2021 at 10:24 AM Peter Kronenberg wrote: > After more testing, it seems that it has nothing to do with > TikaInputStream. I just passed in

RE: Re-using a TikaStream

2021-03-01 Thread Peter Kronenberg
After more testing, it seems that it has nothing to do with TikaInputStream. I just passed in a BufferedInputStream to the parsers. I see that the first thing the AutoDetectParser does is to convert it to a TikaInputStream. So maybe TIS is being leveraged at a lower level, but there no