Something like:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import org.apache.tika.io.InputStreamFactory;

    // Re-opens the same file every time a fresh stream is requested
    class MyInputStreamFactory implements InputStreamFactory {
        private final File file;

        public MyInputStreamFactory(File file) {
            this.file = file;
        }

        @Override
        public InputStream getInputStream() throws IOException {
            return new FileInputStream(file);
        }
    }

In your client code:

    Parser parser = new AutoDetectParser();
    try (TikaInputStream tis = TikaInputStream.get(new MyInputStreamFactory(file))) {
        parser.parse(tis, new ToTextContentHandler(), new Metadata(), new ParseContext());
    }

When you need to reuse the stream (inside your parser):

    public void parse(InputStream stream, ContentHandler handler, Metadata metadata,
            ParseContext context) throws IOException, SAXException, TikaException {
        // (...)
        // Returns the same instance if the caller already passed in a TikaInputStream
        TikaInputStream tis = TikaInputStream.get(stream);
        if (tis.hasInputStreamFactory()) {
            // Each call to getInputStream() yields a brand-new stream over the source
            try (InputStream is = tis.getInputStreamFactory().getInputStream()) {
                // consume the new stream
            }
        } else {
            throw new IOException("not a reusable inputStream");
        }
    }

Of course, this is mainly useful if you are not processing local files, e.g. if you are reading them from the cloud or from sockets.

Regards,
Luis

On Mon, Feb 22, 2021 at 19:18, Peter Kronenberg <peter.kronenb...@torch.ai> wrote:

> I sent this question late on Friday. Sending it again. Can you provide a
> little more information on how to use the InputStreamFactory?
>
> *From:* Peter Kronenberg <peter.kronenb...@torch.ai>
> *Sent:* Friday, February 19, 2021 5:10 PM
> *To:* user@tika.apache.org; lfcnas...@gmail.com
> *Subject:* RE: Re-using a TikaStream
>
> There appear to be 2 InputStreamFactory classes: in tika-server-core and
> tika-io. The one in tika-server-core is the only one that is a concrete class.
>
> I’m not quite sure I see how to use this.
>
> Normally, I create a TikaInputStream with
> TikaInputStream.get(InputStream). How do I create it from an
> InputStreamFactory?
>
> TikaInputStream.getInputStreamFactory() only returns a factory if the
> TikaInputStream was created from a factory.
>
> Is there a good example of how this is used?
>
> *From:* Peter Kronenberg <peter.kronenb...@torch.ai>
> *Sent:* Friday, February 19, 2021 4:57 PM
> *To:* user@tika.apache.org; lfcnas...@gmail.com
> *Subject:* RE: Re-using a TikaStream
>
> Thanks. I thought that TikaInputStream already automatically saved to
> disk to allow re-reading.
>
> *From:* Luís Filipe Nassif <lfcnas...@gmail.com>
> *Sent:* Friday, February 19, 2021 3:44 PM
> *To:* user@tika.apache.org
> *Subject:* Re: Re-using a TikaStream
>
> You could call TikaInputStream.getPath() at the beginning of your parser;
> it will spool the stream to a temporary file if it is not already
> file-based. After consuming the original InputStream, create a new one
> from the temporary file that was created.
>
> If you are using 2.0.0-ALPHA, there is:
>
> https://github.com/apache/tika/blob/main/tika-core/src/main/java/org/apache/tika/io/InputStreamFactory.java
>
> Use it with the new methods from TikaInputStream:
>
>     public static TikaInputStream get(InputStreamFactory factory)
>     public InputStreamFactory getInputStreamFactory()
>
> Hope this helps,
> Luis
>
> On Fri, Feb 19, 2021 at 16:09, Peter Kronenberg <peter.kronenb...@torch.ai> wrote:
>
> If I finish parsing a TikaStream, can I re-use the stream (before it is
> closed)? I know you said that there is some magic behind the scenes where
> it spools it to a file. Can I just call reset() to start from the
> beginning?
>
> Peter
>
> *Peter Kronenberg* | *Senior AI Analytic ENGINEER*
> *C: 703.887.5623*
> 4303 W. 119th St., Leawood, KS 66209
> WWW.TORCH.AI <http://www.torch.ai/>
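Luis's first suggestion in the thread, calling TikaInputStream.getPath() before anything is read and then re-opening the spooled file for a second pass, could look roughly like the sketch below. It assumes tika-core is on the classpath (written against the 2.x API discussed in the thread) and Java 9+ for readAllBytes(); the class SpoolAndReread and its methods are hypothetical names, and the two string reads stand in for real parsing passes. Inside an actual Parser.parse() you would not close the stream you were handed, since it belongs to the caller.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.tika.io.TikaInputStream;

// Hypothetical demo class: two passes over one stream via TikaInputStream.getPath()
public class SpoolAndReread {

    // Reads the whole stream into a UTF-8 string (stand-in for a real parsing pass)
    static String drain(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    public static String[] readTwice(InputStream stream) throws IOException {
        try (TikaInputStream tis = TikaInputStream.get(stream)) {
            // Call before reading anything: spools non-file input to a temp file
            Path path = tis.getPath();

            // First pass: consume the TikaInputStream itself
            String first = drain(tis);

            // Second pass: open a fresh stream on the same file. The temp file
            // is cleaned up when tis is closed, so do this inside the try block.
            String second;
            try (InputStream again = Files.newInputStream(path)) {
                second = drain(again);
            }
            return new String[] {first, second};
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream source =
                new ByteArrayInputStream("hello tika".getBytes(StandardCharsets.UTF_8));
        String[] passes = readTwice(source);
        System.out.println(passes[0] + " | " + passes[1]);
    }
}
```

Calling getPath() first matters: once reading has started, Tika may refuse to spool the stream, so the sketch spools up front and only then starts the first pass.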