Plan C: if you’re willing to store a mirror set of directories with the text versions of the files, just run tika-app.jar on your “input” directory, then run your SolrJ loader on the “text/export” directory:
java -jar tika-app.jar <input> <output>

And, if you’re feeling jsonic:

java -jar tika-app.jar -J -t -i <input> -o <output>

This method of running Tika will be robust to OOM, permanent hangs and OS-destroying-your-process-out-of-self-preservation incidents.

From: Steven White [mailto:[email protected]]
Sent: Thursday, February 11, 2016 10:18 AM
To: [email protected]
Subject: Re: Using tika-app-1.11.jar

Thank you Nick and everyone who has helped me with my questions. I now understand Tika much better than I did last week when I first looked at it.

Steve

On Thu, Feb 11, 2016 at 8:18 AM, Nick Burch <[email protected]> wrote:

On Wed, 10 Feb 2016, Steven White wrote:
I'm including tika-app-1.11.jar with my application and see that Tika includes "slf4j".

The Tika App single jar is intended for standalone use. It's not generally recommended to be included as part of a wider application, as it tends to include everything and the kitchen sink, to allow for easy standalone use.

Generally, you should just tell Maven / Groovy / Ivy that you want to depend on Tika Core + Tika Parsers, then your build tool will fetch + bundle all the dependencies for you. That lets you have proper control over conflicting versions of jars etc.

Nick
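
Going back to Plan C at the top of this thread: if you would rather drive both steps from Java than from a shell, a rough sketch might look like the following. It only uses the two command lines quoted above and the JDK; the class name TikaBatchSketch and the methods buildCommand, runTika and listExtracted are made up for illustration, and the hand-off to your SolrJ loader is left as a print statement because the loader itself is not shown in this thread. Tika runs in its own JVM here, so a crash or OOM during parsing does not take the calling process down with it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Tika or SolrJ.
class TikaBatchSketch {

    // Builds one of the two tika-app command lines quoted in the thread.
    // json=false: java -jar tika-app.jar <input> <output>
    // json=true:  java -jar tika-app.jar -J -t -i <input> -o <output>
    static List<String> buildCommand(String tikaJar, String inputDir,
                                     String outputDir, boolean json) {
        List<String> cmd = new ArrayList<>(Arrays.asList("java", "-jar", tikaJar));
        if (json) {
            cmd.addAll(Arrays.asList("-J", "-t", "-i", inputDir, "-o", outputDir));
        } else {
            cmd.add(inputDir);
            cmd.add(outputDir);
        }
        return cmd;
    }

    // Runs tika-app as a separate process and returns its exit code.
    static int runTika(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }

    // Lists every regular file under the export directory, in sorted order.
    static List<Path> listExtracted(Path outputDir) throws IOException {
        try (Stream<Path> paths = Files.walk(outputDir)) {
            return paths.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: TikaBatchSketch <tika-app.jar> <input> <output>");
            return;
        }
        int exit = runTika(buildCommand(args[0], args[1], args[2], false));
        System.out.println("tika-app exit code: " + exit);
        for (Path extracted : listExtracted(Paths.get(args[2]))) {
            // Assumption: this is where the SolrJ loader would pick the file up.
            System.out.println("would index: " + extracted);
        }
    }
}
```

Keeping the command construction separate from the process launch makes it easy to check the arguments without having tika-app.jar on hand.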
