Another option is to load and invoke Tika in its own classloader to keep
its jars isolated from the rest of the application. We did this for a
while, until we switched to Gradle and implemented the "careful
exclusion" approach that Ken mentioned. Downside was the need to use
reflection to
When we used Tika as a library with Hadoop map-reduce workflows, we had to run
it in a separate thread with a timeout, and leave the thread as a zombie
if/when it hung.
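
A minimal sketch of that thread-plus-timeout pattern, using only the JDK. The class name `TimeLimitedParse`, the method `parseWithTimeout`, and the `Callable` standing in for the real parse call are hypothetical and are not part of Tika. It is an illustration of the idea under those assumptions, not the code we actually ran.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimeLimitedParse {

    // Runs parseTask on a separate daemon thread and waits at most timeoutMillis.
    // Returns the task's result, or null if it did not finish in time.
    // parseTask is a stand-in (assumption) for whatever actually invokes the parser.
    static String parseWithTimeout(Callable<String> parseTask, long timeoutMillis)
            throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-worker");
            t.setDaemon(true); // a hung worker must not keep the JVM alive
            return t;
        });
        Future<String> future = executor.submit(parseTask);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupt the worker. A parser stuck in a tight loop may ignore
            // the interrupt, in which case the thread lingers as a zombie.
            future.cancel(true);
            return null;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Both tasks are placeholders for real parse calls.
        String ok = parseWithTimeout(() -> "extracted text", 1000);
        String hung = parseWithTimeout(() -> {
            Thread.sleep(60_000); // simulates a parser that hangs
            return "never returned";
        }, 200);
        System.out.println("fast task: " + ok);
        System.out.println("hung task: " + hung);
    }
}
```

Note that the timeout only frees the caller. A thread that never responds to interruption keeps its memory and CPU until the JVM exits, which is why this was a workaround rather than a fix.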
As for jar hell (a very real problem), you can either do careful exclusions
in your dependency specification (painful, and
Not totally on topic, but I think it is related to this thread. I'm currently
exploring using Tika as a library in Apache Spark. This approach suffers
from the same problems as using Tika as a library mentioned above. Has anyone
used Tika as a library in a Spark job? Or would it still make sense to use
something
We have been using Tika as a Java library for a few years now, parsing
millions of different files each day. We're now switching to Tika
server because bugs in different Tika components (dependencies) caused issues
such as JVM exits, memory problems and so on. Also, Tika and its different
Hi Robert,
In the sense of a microservice architecture, it makes absolute sense to
use Tika as a server/microservice component. As Tim Allison explained,
this helps you separate your business requirements into isolated
components (running in their own JVM).
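
As a rough sketch of what calling Tika as a separate service can look like from Java, the snippet below sends a document to a running tika-server over HTTP using the JDK's `java.net.http` client (Java 11+). It assumes a server reachable at `http://localhost:9998` and uses the `/tika` endpoint with `Accept: text/plain`. The class and method names (`TikaServerClient`, `buildExtractRequest`, `extractText`) are made up for illustration, and the error handling is deliberately minimal.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class TikaServerClient {

    // Builds a PUT request that uploads the raw document bytes to <serverBase>/tika
    // and asks for plain text back. serverBase is an assumption, for example
    // "http://localhost:9998" for a locally started tika-server.
    static HttpRequest buildExtractRequest(String serverBase, byte[] document) {
        return HttpRequest.newBuilder(URI.create(serverBase + "/tika"))
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
    }

    // Sends the request and returns the extracted text. If the parser crashes or
    // runs out of memory, that happens in the server's JVM, and the caller only
    // sees an I/O error or a non-200 status.
    static String extractText(String serverBase, byte[] document)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                buildExtractRequest(serverBase, document),
                HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("tika-server returned HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

With a server running, `TikaServerClient.extractText("http://localhost:9998", bytes)` would return the extracted text for the uploaded bytes. The application itself then needs no Tika jars on its classpath.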
If you don't need to link the Tika
Hi Robert,
Thank you for the note. You can call Tika programmatically if you'd like
with Java. Some examples are available here:
https://tika.apache.org/1.24.1/examples.html
One of the best reasons to use Tika via tika-server is that you isolate
potentially catastrophic problems in another
Hi,
I am using Tika to extract text from Word Docs and PDFs locally. It's
great. Thank you Apache and Tika developers!
Could someone help me understand why Tika offers a client-server option
instead of just a code library? I am sure there was/is a good reason, so I
am curious if anyone knows or