RE: Re-using a TikaStream

2021-02-26 Thread Peter Kronenberg
So is this guaranteed, expected behavior? With a BufferedInputStream – I expect this try (BufferedInputStream stream = new BufferedInputStream(new FileInputStream(file))) { System.out.printf("before - bytes available: %s", stream.available()); parser.parse(stream, handler, metadata, par

RE: Re-using a TikaStream

2021-02-26 Thread Peter Kronenberg
I think I figured this out. It seems to depend on what parser is used. Not sure if this just has to do with inconsistent implementations, or there is some reason behind it. For most audio files, using the AudioParser, the buffer is still at the beginning. Even though there is no text extract

Re: Re-using a TikaStream

2021-02-26 Thread Tim Allison
The stream.available() call comes from ProxyInputStream. We don't modify that in TikaInputStream...maybe we should. TikaInputStream wraps an incoming InputStream in a BufferedInputStream if it doesn't already support mark (i.e. markSupported() returns false). So, as long as you're happy with the performance and potential limitations
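
For readers following along, the mark/reset behavior described here can be sketched with plain java.io, without Tika on the classpath. This is only an illustration under stated assumptions: `consumeAll` is a hypothetical stand-in for a parser that reads the stream to the end, and the read limit passed to `mark` is assumed to be large enough to cover everything the parser reads. It does not show what any particular Tika parser actually does.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class MarkResetSketch {

    // Hypothetical stand-in for a parser: reads the stream until EOF.
    static void consumeAll(InputStream in) throws IOException {
        byte[] buf = new byte[1024];
        while (in.read(buf) != -1) {
            // discard the bytes; a real parser would process them
        }
    }

    // Marks the start, lets the "parser" consume the stream, then rewinds.
    static int availableAfterReset(byte[] data) throws IOException {
        // Wrap in a BufferedInputStream so that mark/reset is supported.
        try (InputStream stream = new BufferedInputStream(new ByteArrayInputStream(data))) {
            // Assumption: the read limit covers everything the parser will read.
            stream.mark(data.length + 1);
            consumeAll(stream);
            stream.reset(); // back to the marked position (the start)
            return stream.available();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello tika".getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes available after reset: " + availableAfterReset(data));
    }
}
```

The caveat is the read limit: if the consumer reads more than the limit given to `mark`, `reset()` may throw an IOException, which is one of the potential limitations of relying on a buffered stream for large files.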

RE: Re-using a TikaStream

2021-02-26 Thread Peter Kronenberg
But as I said, this doesn’t seem to work with all parsers. So let’s say I pass in an MP4 file which uses the MP4Parser and then I want to re-use the stream afterward. How can I guarantee consistent behavior, no matter which parser gets used? From: Tim Allison Sent: Friday, February 26, 2021
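
One parser-independent way to get consistent behavior is to stop sharing a single stream and instead hand each consumer its own fresh stream over the same bytes. The sketch below uses only the JDK and is not taken from the thread or from Tika's API: `ReusableSource` is a hypothetical helper name, and it assumes the content fits comfortably in memory (for large media files a temporary file would be the more realistic backing store).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: buffers the content once and opens independent streams on demand.
class ReusableSource {
    private final byte[] bytes;

    // Reads the incoming stream fully (InputStream.readAllBytes requires Java 9+).
    ReusableSource(InputStream in) throws IOException {
        this.bytes = in.readAllBytes();
    }

    // Each call returns a new stream positioned at the start of the content.
    InputStream open() {
        return new ByteArrayInputStream(bytes);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "example content".getBytes(StandardCharsets.UTF_8);
        ReusableSource source = new ReusableSource(new ByteArrayInputStream(data));

        // The first consumer (e.g. a parser) may leave its stream in any state.
        try (InputStream first = source.open()) {
            first.readAllBytes();
        }

        // The second consumer still starts at the beginning.
        try (InputStream second = source.open()) {
            System.out.println(new String(second.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

With this approach it no longer matters where a given parser leaves the stream position, at the cost of holding a full copy of the content.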