I sent this question late on Friday, so I am sending it again.  Can you provide 
a little more information on how to use the InputStreamFactory?

From: Peter Kronenberg <peter.kronenb...@torch.ai>
Sent: Friday, February 19, 2021 5:10 PM
To: user@tika.apache.org; lfcnas...@gmail.com
Subject: RE: Re-using a TikaStream

There appear to be two InputStreamFactory classes, one in tika-server-core and 
one in tika-io.  The one in tika-server-core is the only one with a concrete 
class.  I’m not quite sure I see how to use this.
Normally, I create a TikaInputStream with TikaInputStream.get(InputStream).  
How do I create it from an InputStreamFactory?
TikaInputStream.getInputStreamFactory() only returns a factory if the 
TikaInputStream was created from a factory.
Is there a good example of how this is used?

From: Peter Kronenberg <peter.kronenb...@torch.ai>
Sent: Friday, February 19, 2021 4:57 PM
To: user@tika.apache.org; lfcnas...@gmail.com
Subject: RE: Re-using a TikaStream

Thanks.  I thought that TikaInputStream already automatically saved to disk to 
allow re-reading.

From: Luís Filipe Nassif <lfcnas...@gmail.com>
Sent: Friday, February 19, 2021 3:44 PM
To: user@tika.apache.org
Subject: Re: Re-using a TikaStream

You could call TikaInputStream.getPath() at the beginning of your parser; it 
will spool the stream to a file if it is not already file based. After 
consuming the original InputStream, create a new one from the temp file that 
was created.
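
The spool-then-reopen pattern described above can be sketched with plain 
java.nio, without Tika on the classpath. This is only an illustration under 
assumptions: the class and method names (SpoolAndReread, spool, countBytes) are 
hypothetical, and the comments mark where TikaInputStream.getPath() and 
TikaInputStream.get(Path) would take over in a real parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class SpoolAndReread {

    // Copy a one-shot stream to a temp file so it can be read more than once.
    // With Tika, the message above describes TikaInputStream.getPath() as
    // doing this spooling for streams that are not already file based.
    static Path spool(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("spool-", ".bin");
        // createTempFile already created the file, so allow overwriting it.
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    // Stand-in for "parsing": drains the stream and returns the byte count.
    static int countBytes(InputStream in) throws IOException {
        int n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        InputStream original =
                new ByteArrayInputStream("hello tika".getBytes(StandardCharsets.UTF_8));
        Path spooled = spool(original);
        try {
            // Each pass opens a fresh stream on the spooled file.
            // With Tika, TikaInputStream.get(Path) would be used here instead.
            for (int pass = 1; pass <= 2; pass++) {
                try (InputStream in = Files.newInputStream(spooled)) {
                    System.out.println("pass " + pass + ": " + countBytes(in) + " bytes");
                }
            }
        } finally {
            Files.deleteIfExists(spooled);
        }
    }
}
```

In the Tika case the temp file is presumably tied to the lifetime of the 
original TikaInputStream, so the second stream should be opened before the 
first one is closed.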

If you are using 2.0.0-ALPHA, there is:

https://github.com/apache/tika/blob/main/tika-core/src/main/java/org/apache/tika/io/InputStreamFactory.java

Use with the new methods from TikaInputStream:
public static TikaInputStream get(InputStreamFactory factory)
public InputStreamFactory getInputStreamFactory()
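
The idea behind a stream factory can be sketched without Tika as follows. This 
is an assumption-laden illustration, not Tika's code: StreamFactory is a local 
stand-in that assumes Tika's InputStreamFactory has a single getInputStream() 
method (check the linked source), and ReopenableStreams, readTwice and 
countBytes are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ReopenableStreams {

    // Hypothetical stand-in for the assumed shape of Tika's InputStreamFactory:
    // every call hands back a new stream positioned at the start of the data.
    interface StreamFactory {
        InputStream getInputStream() throws IOException;
    }

    // Stand-in for "parsing": drains the stream and returns the byte count.
    static int countBytes(InputStream in) throws IOException {
        int n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    // Reads the source twice, asking the factory for a fresh stream each time,
    // so no reset() or temp file is needed.
    static int[] readTwice(StreamFactory factory) throws IOException {
        int[] sizes = new int[2];
        for (int i = 0; i < 2; i++) {
            try (InputStream in = factory.getInputStream()) {
                sizes[i] = countBytes(in);
            }
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello tika".getBytes(StandardCharsets.UTF_8);
        // The factory knows how to reopen the source; here it wraps a byte array.
        StreamFactory factory = () -> new ByteArrayInputStream(data);
        int[] sizes = readTwice(factory);
        System.out.println("first read: " + sizes[0] + ", second read: " + sizes[1]);
    }
}
```

With Tika 2.x, a factory of this kind would be passed to 
TikaInputStream.get(InputStreamFactory), and a parser could later call 
getInputStreamFactory() on the stream to obtain a second, independent stream.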

Hope this helps,
Luis

On Fri, Feb 19, 2021 at 16:09, Peter Kronenberg 
<peter.kronenb...@torch.ai> wrote:
If I finish parsing a TikaStream, can I re-use the stream (before it is 
closed)?  I know you said that there is some magic behind the scenes where it 
spools it to a file.  Can I just call reset() to start from the beginning?

Peter


Peter Kronenberg  |  Senior AI Analytic ENGINEER
C: 703.887.5623
4303 W. 119th St., Leawood, KS 66209
WWW.TORCH.AI