On Mon, 1 Mar 2021, Peter Kronenberg wrote:
But the issue is that different parsers return the stream in different states. Sometimes the stream is all used up (although not closed). And other times, the stream has been re-set to the beginning where it can be re-used. Is this expected behavior?

If the Parser actually wants a File and triggers the stream to be spooled to disk, then, provided you supplied a TikaInputStream, the stream will still be positioned at the start of the file. That's because it was read once, reset for use again, and then never touched: only the backing file was used.

If the Parser really wanted a stream, it will likely read most or all of it, so the stream should end up positioned at the end (or close to it). Depending on how you constructed the stream, which stream class it is, and so on, it may or may not be rewindable / resettable for a subsequent read.
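
To illustrate that last point, here is a rough sketch using only plain java.io classes, not the Tika API. The class and helper names (StreamRewindSketch, drain, consumeAndRewind, nonMarkable) are made up for this example, and drain only stands in for whatever a stream-consuming parser does. It assumes the caller knows an upper bound on the bytes read, so it can pass a large enough read limit to mark().

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only; not part of Tika.
final class StreamRewindSketch {

    // Reads the stream to the end, as a stream-consuming parser might.
    // Returns the number of bytes read. IOException is wrapped to keep the sketch short.
    static int drain(InputStream in) {
        try {
            byte[] buf = new byte[4096];
            int total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Marks the stream if it supports that, drains it, then tries to rewind.
    // Returns true if the stream is back at the marked position, false if it
    // does not support mark/reset and is therefore left at the end.
    static boolean consumeAndRewind(InputStream in, int readLimit) {
        if (!in.markSupported()) {
            drain(in);
            return false;
        }
        in.mark(readLimit);
        drain(in);
        try {
            in.reset();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    // A bare InputStream that does not support mark/reset, standing in for
    // e.g. a network stream.
    static InputStream nonMarkable(byte[] data) {
        InputStream src = new ByteArrayInputStream(data);
        return new InputStream() {
            @Override
            public int read() throws IOException {
                return src.read();
            }
        };
    }

    public static void main(String[] args) {
        byte[] data = "hello tika".getBytes(StandardCharsets.UTF_8);

        // Wrapped in a BufferedInputStream: can be rewound within the read limit.
        InputStream buffered = new BufferedInputStream(nonMarkable(data));
        System.out.println("buffered rewound: " + consumeAndRewind(buffered, data.length + 1));

        // Bare stream: used up after one pass, nothing left to read.
        InputStream raw = nonMarkable(data);
        System.out.println("raw rewound: " + consumeAndRewind(raw, data.length + 1));
    }
}
```

So if you need to read the same stream again yourself, checking markSupported() (or wrapping the stream in something that buffers) before handing it over is the safer bet than relying on where a given parser happens to leave it.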

Nick
