Something like:

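import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.tika.io.InputStreamFactory;

class MyInputStreamFactory implements InputStreamFactory {

    private final File file;

    public MyInputStreamFactory(File file) {
        this.file = file;
    }

    // Returns a fresh stream on every call, so the content can be read again.
    @Override
    public InputStream getInputStream() throws IOException {
        return new FileInputStream(file);
    }
}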

In your client code:

Parser parser = new AutoDetectParser();
// parse() consumes the stream but does not close it, so close it here.
try (TikaInputStream tis = TikaInputStream.get(new MyInputStreamFactory(file))) {
    parser.parse(tis, new ToTextContentHandler(), new Metadata(), new ParseContext());
}

When you need to reuse the stream (inside your parser):

public void parse(InputStream stream, ContentHandler handler, Metadata metadata,
        ParseContext context) throws IOException, SAXException, TikaException {
    // (...)
    TikaInputStream tis = TikaInputStream.get(stream);
    if (tis.hasInputStreamFactory()) {
        // Ask the factory for a fresh stream instead of rewinding the consumed one.
        try (InputStream is = tis.getInputStreamFactory().getInputStream()) {
            // consume the new stream
        }
    } else {
        throw new IOException("not a reusable inputStream");
    }
}

Of course, this is mainly useful when you are not processing local files,
e.g. when reading data from the cloud or from sockets.
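
To illustrate the non-file case, here is a minimal sketch that uses only the
JDK. StreamFactory is a local stand-in for Tika's InputStreamFactory (so the
sketch compiles without Tika), and the in-memory byte array is a hypothetical
placeholder for data fetched from the cloud or a socket. The names forBytes
and readAll are made up for this example:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ReusableSourceDemo {

    // Local stand-in for org.apache.tika.io.InputStreamFactory (assumption:
    // same single-method shape), so this compiles without Tika.
    interface StreamFactory {
        InputStream getInputStream() throws IOException;
    }

    // Hypothetical non-file source: the byte array plays the role of a
    // remote object that can be fetched again on demand.
    static StreamFactory forBytes(byte[] payload) {
        return () -> new ByteArrayInputStream(payload);
    }

    // Opens a fresh stream from the factory, reads it fully, and closes it.
    static String readAll(StreamFactory factory) {
        try (InputStream in = factory.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StreamFactory factory = forBytes("hello tika".getBytes(StandardCharsets.UTF_8));
        String first = readAll(factory);   // first pass over the content
        String second = readAll(factory);  // second pass, using a new stream
        System.out.println(first.equals(second));
    }
}
```

The point is that each call to getInputStream() hands back an independent
stream, so nothing has to be reset or buffered between passes.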

Regards,
Luis


On Mon, Feb 22, 2021 at 19:18, Peter Kronenberg <
peter.kronenb...@torch.ai> wrote:

> I sent this question late on Friday.  Sending it again.  Can you provide a
> little more information on how to use the InputStreamFactory?
>
>
>
> *From:* Peter Kronenberg <peter.kronenb...@torch.ai>
> *Sent:* Friday, February 19, 2021 5:10 PM
> *To:* user@tika.apache.org; lfcnas...@gmail.com
> *Subject:* RE: Re-using a TikaStream
>
>
>
> There appear to be 2 InputStreamFactory classes: in tika-server-core and
> tika-io.  The one in server.core is the only one with a concrete class.
>
> I’m not quite sure I see how to use this.
>
> Normally, I create a TikaInputStream with
> TikaInputStream.get(InputStream).  How do I create it from an
> InputStreamFactory?
>
> TikaInputStream.getInputStreamFactory() only returns a factory if the
> TikaInputStream was created from a factory.
>
> Is there a good example of how this is used?
>
>
>
> *From:* Peter Kronenberg <peter.kronenb...@torch.ai>
> *Sent:* Friday, February 19, 2021 4:57 PM
> *To:* user@tika.apache.org; lfcnas...@gmail.com
> *Subject:* RE: Re-using a TikaStream
>
>
>
> Thanks.  I thought that TikaInputStream already automatically saved to
> disk to allow re-reading.
>
>
>
> *From:* Luís Filipe Nassif <lfcnas...@gmail.com>
> *Sent:* Friday, February 19, 2021 3:44 PM
> *To:* user@tika.apache.org
> *Subject:* Re: Re-using a TikaStream
>
>
>
> You could call TikaInputStream.getPath() at the beginning of your parser;
> it will spool the stream to a temp file if it is not already file-based.
> After consuming the original inputStream, create a new one from that temp
> file.
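
The spooling idea can be sketched with the JDK alone. This is not Tika's
implementation, only an illustration of the pattern: copy a one-shot stream to
a temp file once, then open as many new streams from that file as needed. With
Tika, the path would come from TikaInputStream.getPath() instead of the
hypothetical spool() helper below:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class SpoolDemo {

    // Hypothetical helper: copies a one-shot stream into a new temp file.
    static Path spool(InputStream in) {
        try {
            Path tmp = Files.createTempFile("spool-", ".tmp");
            // createTempFile already created the file, so allow overwriting it.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Opens a new stream on the spooled file and reads it fully.
    static String read(Path path) {
        try (InputStream in = Files.newInputStream(path)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream oneShot =
                new ByteArrayInputStream("spooled content".getBytes(StandardCharsets.UTF_8));
        Path tmp = spool(oneShot);
        try {
            System.out.println(read(tmp)); // first pass
            System.out.println(read(tmp)); // second pass from the same file
        } finally {
            Files.deleteIfExists(tmp);     // clean up the temp file
        }
    }
}
```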
>
>
>
> If you are using 2.0.0-ALPHA, there is:
>
>
>
>
> https://github.com/apache/tika/blob/main/tika-core/src/main/java/org/apache/tika/io/InputStreamFactory.java
>
>
>
> Use with the new methods from TikaInputStream:
>
> public static TikaInputStream get(InputStreamFactory factory)
>
> public InputStreamFactory getInputStreamFactory()
>
>
>
> Hope this helps,
>
> Luis
>
>
>
> On Fri, Feb 19, 2021 at 16:09, Peter Kronenberg <
> peter.kronenb...@torch.ai> wrote:
>
> If I finish parsing a TikaStream, can I re-use the stream (before it is
> closed)?  I know you said that there is some magic behind the scenes where
> it spools it to a file.  Can I just call reset() to start from the
> beginning?
>
>
>
> Peter
