Re: Why does Tika offer a client-server option?

2020-11-26 Thread Adam Rauch
Another option is to load and invoke Tika in its own classloader to keep its jars isolated from the rest of the application. We did this for a while, until we switched to Gradle and implemented the "careful exclusion" approach that Ken mentioned. Downside was the need to use reflection to
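
A minimal sketch of the classloader-isolation idea described above, using only JDK classes. The names `IsolatedInvoker`, `newIsolatedLoader`, and `invokeOnNewInstance` are hypothetical, not from the thread or from Tika. It assumes the isolated class has a public no-argument constructor, and that setting the thread context class loader during the call is wanted, since some libraries look up plugins through it.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical helper: loads a class from its own set of jars and calls it reflectively.
final class IsolatedInvoker {

    // Builds a loader that sees only the given jars plus the JDK's own classes.
    // The parent of the system class loader is the platform (or extension) loader,
    // so nothing from the application class path leaks in.
    static URLClassLoader newIsolatedLoader(URL[] jarUrls) {
        ClassLoader jdkOnlyParent = ClassLoader.getSystemClassLoader().getParent();
        return new URLClassLoader(jarUrls, jdkOnlyParent);
    }

    // Instantiates className inside the isolated loader and invokes one method on it.
    // The application cannot reference the isolated type directly, hence the reflection.
    static Object invokeOnNewInstance(ClassLoader loader, String className,
                                      String methodName, Class<?>[] paramTypes, Object[] args) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            Class<?> type = Class.forName(className, true, loader);
            Object instance = type.getDeclaredConstructor().newInstance();
            Method method = type.getMethod(methodName, paramTypes);
            return method.invoke(instance, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Reflective call failed: " + className, e);
        } finally {
            // Always restore the caller's context class loader.
            current.setContextClassLoader(previous);
        }
    }
}
```

For Tika, one would presumably pass the URLs of the Tika jars and a class such as `org.apache.tika.Tika`. Arguments and return values have to be types both loaders share (JDK types such as `String` or `File`), which is part of why this approach leans on reflection.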

Re: Why does Tika offer a client-server option?

2020-11-25 Thread Ken Krugler
When we used Tika as a library with Hadoop map-reduce workflows, we had to run it in a separate thread with a timeout, and leave the thread as a zombie if/when it hung. As for jar hell (a very real problem), you can either do careful exclusions in your dependency specification (painful, and
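
A rough sketch of the "separate thread with a timeout" pattern mentioned above, using only `java.util.concurrent`. The `TimedParse` class and the `parseTask` parameter are hypothetical stand-ins. In real use the task would wrap a Tika parse call, and returning `null` on timeout is just one possible convention.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs a parse task on its own thread and stops waiting after a timeout.
final class TimedParse {

    // Returns the task's result, or null if it did not finish within timeoutMillis.
    static String parseWithTimeout(Callable<String> parseTask, long timeoutMillis) {
        // Daemon thread: a worker that never returns will not keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread worker = new Thread(runnable, "parse-worker");
            worker.setDaemon(true);
            return worker;
        });
        Future<String> future = executor.submit(parseTask);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker; code stuck in a tight loop may ignore the
            // interrupt and live on as the "zombie" thread described above.
            future.cancel(true);
            return null;
        } catch (ExecutionException e) {
            throw new IllegalStateException("Parse task failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

The caller gets control back after the timeout, but a hung worker can still hold memory and CPU. This sketch does not address that limitation.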

Re: Why does Tika offer a client-server option?

2020-11-25 Thread Tucker B
Not totally on topic, but I think it is related to this thread. I'm currently exploring using Tika as a library in Apache Spark. This approach suffers from the same problems as using Tika as a library mentioned above. Has anyone used Tika as a library in a Spark job? Or would it still make sense to use something

Re: Why does Tika offer a client-server option?

2020-11-24 Thread Slava G
We have been using Tika as a Java library for a few years now, parsing millions of different files each day. We're now switching to Tika server because bugs in different Tika components (dependencies) caused issues such as the JVM exiting, memory problems, and so on. Also, Tika and its different

Re: Why does Tika offer a client-server option?

2020-11-23 Thread Ralph Soika
Hi Robert, in the sense of a microservice architecture it makes absolute sense to use Tika as a server/microservice component. As Tim Allison explained, this helps you separate your business requirements into isolated components (running in their own JVM). If you don't need to link the Tika

Re: Why does Tika offer a client-server option?

2020-11-23 Thread Tim Allison
Hi Robert, Thank you for the note. You can call Tika programmatically if you'd like with Java. Some examples are available here: https://tika.apache.org/1.24.1/examples.html One of the best reasons to use tika via tika-server is that you isolate potential catastrophic problems in another
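
For comparison, here is a sketch of what calling a separately running tika-server over HTTP could look like from Java 11+, using the JDK's `java.net.http` client. The `TikaServerClient` class and its method names are hypothetical. The `/tika` endpoint with `Accept: text/plain` and port 9998 are assumptions about a tika-server started with default settings.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical client: sends a document to a tika-server process and reads back plain text.
final class TikaServerClient {

    // Builds a PUT request to <serverBase>/tika asking for a plain-text response.
    static HttpRequest buildExtractRequest(URI serverBase, byte[] document, Duration timeout) {
        return HttpRequest.newBuilder(serverBase.resolve("/tika"))
                .timeout(timeout) // bounds the wait if the server hangs on a bad file
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
    }

    // Sends the request and returns the extracted text; non-200 responses become IOExceptions.
    static String extractText(HttpClient client, HttpRequest request)
            throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("tika-server returned HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

A caller would build the request against something like `URI.create("http://localhost:9998")` and pass it to `extractText` with `HttpClient.newHttpClient()`. If a parse crashes or exhausts memory, the damage stays in the server's JVM and the client only sees a failed or timed-out request.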

Why does Tika offer a client-server option?

2020-11-23 Thread Robert Raines
Hi, I am using Tika to extract text from Word Docs and PDFs locally. It's great. Thank you Apache and Tika developers! Could someone help me understand why Tika offers a client-server option instead of just a code library? I am sure there was/is a good reason, so I am curious if anyone knows or