Plan C: if you’re willing to store a mirror set of directories with the text versions of the files, just run tika-app.jar on your “input” directory and run your SolrJ loader on the “text/export” directory:
java -jar tika-app.jar <input> <output>

And, if you’re feeling jsonic:

java -jar tika-app.jar -J -t -i <input> -o <output>

This method of running Tika will be robust to OOM, permanent hangs and OS-destroying-your-process-out-of-self-preservation incidents.

From: Steven White [mailto:swhite4...@gmail.com]
Sent: Thursday, February 11, 2016 10:18 AM
To: user@tika.apache.org
Subject: Re: Using tika-app-1.11.jar

Thank you Nick and everyone who has helped me with my questions. I now understand Tika much better than I did last week when I first looked at it.

Steve

On Thu, Feb 11, 2016 at 8:18 AM, Nick Burch <apa...@gagravarr.org> wrote:

On Wed, 10 Feb 2016, Steven White wrote:
> I'm including tika-app-1.11.jar with my application and see that Tika includes "slf4j".

The Tika App single jar is intended for standalone use. It's not generally recommended to be included as part of a wider application, as it tends to include everything and the kitchen sink, to allow for easy standalone use.

Generally, you should just tell Maven / Groovy / Ivy that you want to depend on Tika Core + Tika Parsers, then your build tool will fetch + bundle all the dependencies for you. That lets you have proper control over conflicting versions of jars etc.

Nick
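
For the loader half of Plan C at the top of this message, here is a rough sketch of walking the "text/export" directory once tika-app has finished. It is only an illustration under a few assumptions: the class name ExportedTextLoader and the collect method are made up for this example, the exported files are assumed to be UTF-8 and named after the original file plus a ".txt" suffix, and the actual SolrJ indexing call is left as a comment because it depends on your schema and client setup.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Tika or SolrJ.
public class ExportedTextLoader {

    // Assumption: the export step appends this suffix to each original file name.
    static final String SUFFIX = ".txt";

    /**
     * Walks exportDir recursively and returns a map from the original
     * relative path (suffix removed, '/' separators) to the extracted text.
     */
    static Map<String, String> collect(Path exportDir) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(exportDir)) {
            files = walk.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(SUFFIX))
                        .sorted()
                        .collect(Collectors.toList());
        }
        Map<String, String> docs = new LinkedHashMap<>();
        for (Path p : files) {
            String rel = exportDir.relativize(p).toString().replace('\\', '/');
            String id = rel.substring(0, rel.length() - SUFFIX.length());
            // Assumption: exported text is UTF-8 encoded.
            String text = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
            docs.put(id, text);
        }
        return docs;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ExportedTextLoader <output>");
            return;
        }
        Map<String, String> docs = collect(Paths.get(args[0]));
        for (Map.Entry<String, String> e : docs.entrySet()) {
            // This is where a SolrJ loader would build a document from
            // e.getKey() and e.getValue() and send it to Solr.
            System.out.println(e.getKey() + " (" + e.getValue().length() + " chars)");
        }
    }
}
```

Run it against the same <output> directory you gave tika-app. Since the extraction already happened in a separate process, a bad file can at worst leave a missing or empty text file here rather than take down the loader.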