dropped cc...

> I noticed that Tika prints in the logs OOM (null), but seems to recover by
> itself even when not using -spawnChild. Is this the expected behavior?

When not in -spawnChild mode, Tika is catching OOM exceptions (when it can), but it isn't "recovering"... the JVM may be in an inconsistent state, and it is safest to restart the JVM. It would probably be good practice when in -spawnChild mode to use -XX:+ExitOnOutOfMemoryError, or on the tika commandline -JXX:+ExitOnOutOfMemoryError.

I highly encourage you to use -spawnChild mode, or the new pipes modules in 2.x if those will work for you at some point...those are still beta. OOMs are one thing, but infinite loops are another.

> 1. Do you have a recommendation for a stress test that would allow me to
> easily test OOM behavior?

The MockParser is built for exactly this: https://cwiki.apache.org/confluence/display/TIKA/MockParser

Let us know if you have any questions about it. The key elements for you are <fakeload/>, <throw/>, <oom/> and probably <system_exit/>.

That's for synthetic load testing. If you want files in the wild, we have 2TB of files from the wild: https://corpora.tika.apache.org/base/docs/

> 2. For implementing a health check that detects when Tika is stuck, I could
> periodically send a simple request and check that the reply is correct, do
> you recommend a better approach?

We have a rudimentary /status endpoint, which will give you number of restarts, number of files processed, milliseconds since last parse. You have to turn it on via the commandline: -status.

On Wed, Jun 2, 2021 at 6:50 AM Cristian Zamfir <[email protected]> wrote:
>
> Hi!
>
> I noticed that Tika prints in the logs OOM (null), but seems to recover by
> itself even when not using -spawnChild. Is this the expected behavior? I am
> trying to figure out when logs containing "OOM" are critical and would
> require a container restart.
>
> I also wanted to bring up two of my questions below, I am looking forward to
> your feedback:
> 1. Do you have a recommendation for a stress test that would allow me to
> easily test OOM behavior?
> 2. For implementing a health check that detects when Tika is stuck, I could
> periodically send a simple request and check that the reply is correct, do
> you recommend a better approach?
>
> Thanks,
> Cristi
>
> On Sat, May 29, 2021 at 2:58 PM Cristian Zamfir <[email protected]> wrote:
>>
>>
>> > On 28 May 2021, at 19:03, Tim Allison <[email protected]> wrote:
>> >
>> > Tika 2.x should help with this in pipes and async. Your system should
>> > expect to go oom or crash at some point if you're processing enough
>> > files.
>>
>> I believe that this is what is happening in my case, it’s not due to a
>> single file, it happens under high load when processing many files at once.
>>
>> >
>> > Right --spawnChild is not default in 1.x, but it will be in 2.x. And,
>> > yes, you should be using it. To set the Xmx in the forked process add
>> > -J, as in -JXmx2g would set the Xmx for the forked process.
>>
>>
>> Did both now and I think this provides good recovery from OOM.
>>
>>
>> >
>> > I don't have experience to recommend bumping Xmx to close to your
>> > container's max memory. In java programs that do a bunch of work off
>> > heap, this would be a bad idea because you need to leave resources for
>> > your system os, but I don't think we do much off heap.
>>
>> What’s your take on a configuration in which the container is capped at 4GB
>> and the spawned child has a heap limit of 3GB? Sounds like a pretty safe
>> margin to me.
>>
>> >
>> > Which file types are causing OOMs? The MP4Parser is notorious, and
>> > we're looking to swap it out in 2.x for a different parser.
>>
>> Good to hear. I don’t know how to identify the root cause because there are
>> many files sent at once.
>> However, it would be great to learn if there is a quick way to trigger a
>> high load and test resiliency to OOM, do you have a recommendation?
>>
>>
>> >
>> > Yep, TIKA-3353 is the monitoring that Nick was mentioning.
>>
>> I am actually more interested in health checks, to detect when the system is
>> stuck without automatically restarting. A built-in health check would
>> certainly be a nice feature.
>>
>> Besides OOM, one other possible cause is if /tmp gets full - for instance I
>> see here
>> https://github.com/tongwang/tika-server-docker/blob/master/bin/healthcheck
>> that /tmp is cleaned up periodically and the health check fails if it is too
>> full.
>>
>> Are there any other situations that could indicate that the container is
>> stuck and needs a restart and if yes, is there a way to detect the condition?
>>
>> Thanks,
>> Cristi
>>
>> >
>> > On Fri, May 28, 2021 at 9:08 AM Cristian Zamfir <[email protected]>
>> > wrote:
>> >>
>> >> Thanks for your answer Nick!
>> >>
>> >> I am running apache/tika:latest-full which is using 1.25. Looks like I
>> >> need at least version 1.26 for
>> >> https://issues.apache.org/jira/browse/TIKA-3353,
>> >> but I am not sure if this is not overkill for implementing basic
>> >> liveness health checks.
>> >>
>> >> It's clear that -spawnChild and ForkParser are two must-haves that AFAIU
>> >> are not default in apache/tika:latest-full
>> >>
>> >> My guess is that I also need to set the jvm heap size close to the memory
>> >> resource limit for the container, but that's not ideal because the heap
>> >> size would be statically configured while the memory resource limits are
>> >> dynamic. Or maybe this is not necessary if I use -spawnChild?
>> >>
>> >> I am looking forward to your answers, thanks a lot!
>> >>
>> >> Cristi
>> >>
>> >>
>> >> On Fri, May 28, 2021 at 2:55 PM Nick Burch <[email protected]> wrote:
>> >>>
>> >>> On Thu, 27 May 2021, Cristian Zamfir wrote:
>> >>>> I am running some stress tests of the latest tika server docker (not
>> >>>> modified in any way, just pulled from the registry) and seeing that
>> >>>> after a
>> >>>> few hours I see OOM in the logs. The container has a limit of 4GB set in
>> >>>> K8S. I am wondering if you have any best practices on how to avoid this.
>> >>>
>> >>> Hopefully one of our Tika+Docker experts will be along in a minute to
>> >>> help
>> >>> advise!
>> >>>
>> >>> For now, the general advice is documented at:
>> >>> https://cwiki.apache.org/confluence/display/TIKA/The+Robustness+of+Apache+Tika
>> >>>
>> >>> Also, which version of Tika are you on? There have been some
>> >>> contributions
>> >>> recently around monitoring the server, which you might want to upgrade
>> >>> for, eg TIKA-3353
>> >>>
>> >>> Nick
>>
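
For the health check idea raised in this thread (periodically send a simple request and check that the reply is correct), a minimal sketch might look like the following. It is only an illustration, not something shipped with Tika. It assumes the server is reachable at http://localhost:9998 and that a PUT of a plain-text body to /tika returns the extracted text. The names PROBE_TEXT, reply_is_healthy and probe are hypothetical, and the 10 second timeout is an arbitrary choice. The timeout is what lets the probe notice a server that is stuck (for example in an infinite loop) rather than one that has crashed.

```python
import http.client
import urllib.error
import urllib.request

# Hypothetical marker text; any short string the probe can recognise will do.
PROBE_TEXT = "tika health probe"


def reply_is_healthy(reply: str, expected: str = PROBE_TEXT) -> bool:
    """Return True if the extracted text contains the marker that was sent."""
    return expected in reply


def probe(base_url: str = "http://localhost:9998", timeout_s: float = 10.0) -> bool:
    """PUT a tiny plain-text document to /tika and check the extracted text.

    Any timeout, connection error or non-2xx response counts as unhealthy.
    The base URL and timeout are assumptions; adjust them to your deployment.
    """
    request = urllib.request.Request(
        base_url + "/tika",
        data=PROBE_TEXT.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain", "Accept": "text/plain"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers refused connections, HTTP errors and socket timeouts.
        return False
    return reply_is_healthy(body)
```

A container liveness check could call probe() and exit non-zero when it returns False. Whether a single failed probe should trigger a restart, or only several in a row, is a policy decision this sketch leaves open.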
