Hi,
On Tue, May 15, 2012 at 8:14 PM, Steve Deal <[email protected]> wrote:
> WWYD - What Would You Do?
I'd use the TikaInputStream [1] mechanism we added exactly for this purpose!
To do this, you'll need to adjust the PDF parser to do something like this:
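    TemporaryResources tmp = new TemporaryResources();
    try {
        // Wraps the stream, or reuses it if it already is a TikaInputStream
        TikaInputStream tis = TikaInputStream.get(stream, tmp);
        doSomethingWith(tis.getFile());
    } finally {
        // Deletes any temporary file created by getFile()
        tmp.dispose();
    }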
Then, when calling the parser, simply pass it a TikaInputStream
instance created from a local file you have:
    InputStream stream = TikaInputStream.get(file);
The Tika facade methods will automatically use TikaInputStream
whenever possible.
The nice thing about this mechanism is that if the original stream
was not constructed with TikaInputStream.get(File), the
TikaInputStream.getFile() method will automatically spool the input
stream to a temporary file and return that instead. (The
TemporaryResources instance is used to make sure that such temporary
files are properly cleaned up once no longer needed.)
If spooling a stream to a temporary file is more expensive than
parsing the document in sequential mode, you can also use the
TikaInputStream.hasFile() method to check whether a file is already
available, and switch the parsing mode to non-sequential only in such
cases.
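To make the spool-on-demand idea and the hasFile() check concrete, here is a rough sketch using only the JDK. It is not Tika code: the SpoolingInput class and the chooseMode() helper are hypothetical names invented for illustration, and the real TikaInputStream does considerably more. It only shows the shape of the mechanism: hasFile() reports whether a file is already available, getFile() spools to a temporary file on first use, and close() removes only the file the wrapper created itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical stand-in that mimics the idea behind TikaInputStream.
public class SpoolingInput implements AutoCloseable {
    private final InputStream stream; // null when created from a file
    private Path file;                // null until a file is known or spooled
    private boolean spooled;          // true only if we created the temp file

    public SpoolingInput(InputStream stream) {
        this.stream = stream;
    }

    public SpoolingInput(Path file) {
        this.stream = null;
        this.file = file;
    }

    // True if a file is already available, so getFile() costs nothing extra.
    public boolean hasFile() {
        return file != null;
    }

    // Returns the backing file, spooling the stream to a temp file if needed.
    public Path getFile() throws IOException {
        if (file == null) {
            Path tmp = Files.createTempFile("spool-", ".tmp");
            try {
                Files.copy(stream, tmp, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException e) {
                Files.deleteIfExists(tmp); // do not leak a partial temp file
                throw e;
            }
            file = tmp;
            spooled = true;
        }
        return file;
    }

    // Deletes the temp file (never a caller-supplied file) and closes the stream.
    @Override
    public void close() throws IOException {
        if (spooled) {
            Files.deleteIfExists(file);
        }
        if (stream != null) {
            stream.close();
        }
    }

    // Picks the parsing mode without forcing a spool, as suggested above.
    public static String chooseMode(SpoolingInput in) {
        return in.hasFile() ? "non-sequential" : "sequential";
    }
}
```

With this sketch, an input created from a plain stream reports "sequential" until something calls getFile(), while an input created from a Path reports "non-sequential" right away. In a real parser you would make the same decision with TikaInputStream.hasFile().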
[1] http://tika.apache.org/1.0/api/org/apache/tika/io/TikaInputStream.html
BR,
Jukka Zitting