Tika 2.x's pipes and async processing should help with this. Either way,
your system should expect to OOM or crash at some point if you're
processing enough files.
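
The general pattern is to run parsing in a process you can afford to
lose and restart it when it dies. Below is a minimal, hypothetical Java
sketch of that restart loop. It is not Tika's actual watchdog code, and
the names RestartingSupervisor and runWithRestarts are made up for
illustration.

```java
import java.util.function.IntSupplier;

/**
 * Hypothetical sketch of the "expect crashes, restart the worker" pattern.
 * This is NOT Tika's own watchdog; it only illustrates the idea.
 */
public class RestartingSupervisor {

    /**
     * Runs a child until it exits cleanly (exit code 0) or the restart
     * budget is used up. Returns the number of restarts performed.
     *
     * @param runChild    starts the child, blocks until it exits, returns its exit code
     * @param maxRestarts how many times a crashed child may be restarted
     */
    public static int runWithRestarts(IntSupplier runChild, int maxRestarts) {
        int restarts = 0;
        while (true) {
            int exitCode = runChild.getAsInt();
            if (exitCode == 0) {
                return restarts;             // clean shutdown: stop supervising
            }
            if (restarts >= maxRestarts) {
                throw new IllegalStateException(
                        "child kept crashing; last exit code " + exitCode);
            }
            restarts++;                      // non-zero exit (e.g. after an OOM): run the child again
        }
    }

    public static void main(String[] args) {
        // Fake child for demonstration: fails twice (exit code 1), then exits cleanly.
        int[] calls = {0};
        int restarts = runWithRestarts(() -> ++calls[0] < 3 ? 1 : 0, 5);
        System.out.println("restarts: " + restarts);
    }
}
```

In real use, runChild would start the worker with ProcessBuilder and
return Process.waitFor(), wrapping the checked IOException and
InterruptedException.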

Right, --spawnChild is not the default in 1.x, but it will be in 2.x.
And yes, you should be using it. To set Xmx for the forked process,
prefix the option with -J; for example, -JXmx2g sets Xmx to 2g in the
forked process.
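
For reference, a small Java sketch that assembles that command line,
assuming the 1.x tika-server jar is launched directly with `java -jar`
(as in the -spawnChild example quoted below). The jar name
tika-server.jar is a placeholder, and the TikaServerCommand class is
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: build the command line for a Tika 1.x server running with
 * -spawnChild, where JVM options meant for the forked child carry a -J
 * prefix (-JXmx2g -> -Xmx2g in the child). The jar name is a placeholder.
 */
public class TikaServerCommand {

    public static List<String> build(String serverJar, String childHeap) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(serverJar);
        cmd.add("-spawnChild");          // parent acts as a watchdog for a forked child
        cmd.add("-JXmx" + childHeap);    // -J-prefixed option is meant for the child JVM
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("tika-server.jar", "2g");
        System.out.println(String.join(" ", cmd));
        // To actually launch it (requires the jar to exist):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```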

I don't have enough experience to recommend for or against bumping Xmx
close to your container's memory limit. For Java programs that do a lot
of work off heap, that would be a bad idea, because you need to leave
resources for the OS, but I don't think we do much off heap.
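
If you do size the heap from the container limit, one option is to
compute it at deploy time and leave some headroom. A hypothetical
helper, assuming you want the child heap to be a fixed fraction of the
limit (0.75 below is an arbitrary example, not a tested recommendation):

```java
/**
 * Hypothetical helper: derive a -JXmx flag for the forked child from the
 * container's memory limit, leaving (1 - heapFraction) of it as headroom
 * for non-heap JVM memory, the watchdog parent and the OS.
 */
public class HeapSizing {

    public static String childXmxFlag(long containerLimitBytes, double heapFraction) {
        if (heapFraction <= 0 || heapFraction >= 1) {
            throw new IllegalArgumentException("heapFraction must be in (0, 1)");
        }
        // Convert the chosen share of the limit to whole MiB.
        long heapMiB = (long) (containerLimitBytes * heapFraction) / (1024 * 1024);
        return "-JXmx" + heapMiB + "m";
    }

    public static void main(String[] args) {
        long fourGiB = 4L * 1024 * 1024 * 1024;          // e.g. the 4GB K8s limit in this thread
        System.out.println(childXmxFlag(fourGiB, 0.75)); // -JXmx3072m
    }
}
```

Since the limit is only known at deploy time, this would run in whatever
script or entrypoint assembles the command. On JDK 10+ (and 8u191+) the
JVM can also size its heap from the container limit via
-XX:MaxRAMPercentage; passing it to the child as
-JXX:MaxRAMPercentage=75 is an untested assumption.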

Which file types are causing OOMs?  The MP4Parser is notorious, and
we're looking to swap it out in 2.x for a different parser.

Yep, TIKA-3353 is the monitoring that Nick was mentioning.
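
For a basic liveness check, something lighter than TIKA-3353 may be
enough: tika-server answers GET /tika with a short greeting. A minimal
sketch using Java 11's java.net.http client, assuming the default port
9998 and treating any 200 response within the timeout as alive (the
TikaLivenessProbe class is hypothetical):

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/**
 * Minimal liveness-probe sketch: GET tika-server's /tika endpoint and
 * treat a 200 within the timeout as alive. Not a substitute for the
 * fuller monitoring added in TIKA-3353.
 */
public class TikaLivenessProbe {

    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))
            .build();

    public static boolean isAlive(URI tikaEndpoint, Duration timeout) {
        HttpRequest request = HttpRequest.newBuilder(tikaEndpoint)
                .timeout(timeout)
                .GET()
                .build();
        try {
            HttpResponse<String> response =
                    CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200;
        } catch (IOException e) {
            return false;                        // refused, reset, or timed out
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt flag
            return false;
        }
    }

    public static void main(String[] args) {
        // 9998 is tika-server's default port; adjust for your deployment.
        URI uri = URI.create("http://localhost:9998/tika");
        boolean alive = isAlive(uri, Duration.ofSeconds(5));
        System.out.println(alive ? "alive" : "not responding");
        System.exit(alive ? 0 : 1);              // non-zero exit fails an exec-style probe
    }
}
```

In Kubernetes you can get roughly the same check without any code via an
httpGet liveness probe on path /tika, port 9998.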

On Fri, May 28, 2021 at 9:08 AM Cristian Zamfir wrote:
>
> Thanks for your answer Nick!
>
> I am running apache/tika:latest-full, which uses 1.25. It looks like I need
> at least version 1.26 for https://issues.apache.org/jira/browse/TIKA-3353,
> but I am not sure whether that is overkill for implementing basic liveness
> health checks.
>
> It's clear that --spawnChild and ForkParser are two must-haves that, AFAIU,
> are not the default in apache/tika:latest-full.
>
> My guess is that I also need to set the JVM heap size close to the memory
> resource limit for the container, but that's not ideal because the heap size
> would be statically configured while the memory resource limits are dynamic.
> Or maybe this is not necessary if I use -spawnChild?
>
> I am looking forward to your answers, thanks a lot!
>
> Cristi
>
>
> On Fri, May 28, 2021 at 2:55 PM Nick Burch wrote:
>>
>> On Thu, 27 May 2021, Cristian Zamfir wrote:
>> > I am running some stress tests of the latest Tika server Docker image (not
>> > modified in any way, just pulled from the registry), and after a
>> > few hours I see OOMs in the logs. The container has a 4GB memory limit set
>> > in K8s. I am wondering if you have any best practices for avoiding this.
>>
>> Hopefully one of our Tika+Docker experts will be along in a minute to help
>> advise!
>>
>> For now, the general advice is documented at:
>> https://cwiki.apache.org/confluence/display/TIKA/The+Robustness+of+Apache+Tika
>>
>> Also, which version of Tika are you on? There have been some recent
>> contributions around monitoring the server that you might want to upgrade
>> for, e.g. TIKA-3353.
>>
>> Nick
