Hello!

I have been using the Tika Docker image pretty much out of the box so far,
and I am puzzled by an OOM issue that has been going on for a while now:
despite fairly conservative memory limits on the JVM, for both heap and
total max memory, the containers still crash with OOM. These are the
settings I am using inside containers capped at 6 GB of memory, running
tika-server with the tika watchdog config:


      <forkedJvmArgs>
        <arg>-Xmx3g</arg>
        <arg>-Dlog4j.configurationFile=log4j2.xml</arg>
        <arg>-XX:+UseContainerSupport</arg>
        <arg>-XX:+UnlockExperimentalVMOptions</arg>
        <arg>-XX:MaxRAMPercentage=30</arg>
      </forkedJvmArgs>
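
For reference, the 6 GB cap is applied on the container itself, along these
lines (the image tag, paths, and --config argument are illustrative rather
than my exact command):

      docker run -d --memory=6g --memory-swap=6g -p 9998:9998 \
        -v "$(pwd)/tika-config.xml:/tika-config.xml" \
        apache/tika:latest-full --config /tika-config.xml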



With these settings, the JVM usually manages to terminate forked processes
that hit the memory cap, and the watchdog restarts them:

[pool-2-thread-1] 21:38:26,395
org.apache.tika.server.core.TikaServerWatchDog forked process exited
with exit value 137


However, from time to time the JVM does not seem to manage it: the OS OOM
killer kicks in and the container is killed. My only explanation so far is
that the JVM is too slow to kill the forked process and memory usage blows
up very quickly. You can see below that the total-vm values are close to
6 GB at OOM time. This does not make sense to me: the JVM should kill these
processes well before they reach, e.g., the 5613608kB value; in fact, with
MaxRAMPercentage=30 of the 6 GB container limit, the forked process should
not exceed 1.8 GB.
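
For what it's worth, a quick way to check what heap cap a JVM actually
resolves under these flags would be something like the following, run inside
the container so the cgroup limit applies (memory flags copied from the
forkedJvmArgs above, log4j arg omitted):

      java -Xmx3g -XX:+UseContainerSupport -XX:+UnlockExperimentalVMOptions \
           -XX:MaxRAMPercentage=30 -XX:+PrintFlagsFinal -version | grep -i maxheapsize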

Another puzzling fact is that anon-rss + file-rss do not come close to the
total-vm size, so I am guessing this is not actually due to heap. Could it
be caused by some native code?
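
If it is native allocations, I assume JVM Native Memory Tracking on the
forked process would show it; e.g., adding

      <arg>-XX:NativeMemoryTracking=summary</arg>

to the forkedJvmArgs and then running, against the forked java PID
(placeholder below):

      jcmd <forked-java-pid> VM.native_memory summary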


dmesg -T | grep "Killed process"

[Fri Oct 20 21:14:13 2023] Memory cgroup out of memory: Killed process
109549 (java) total-vm:5632740kB, anon-rss:1036696kB,
file-rss:24668kB, shmem-rss:0kB, UID:35002 pgtables:2532kB
oom_score_adj:-997
[Fri Oct 20 21:14:27 2023] Memory cgroup out of memory: Killed process
109713 (java) total-vm:5613608kB, anon-rss:1029280kB,
file-rss:24380kB, shmem-rss:0kB, UID:35002 pgtables:2456kB
oom_score_adj:-997
[Fri Oct 20 21:14:34 2023] Memory cgroup out of memory: Killed process
109839 (java) total-vm:5607392kB, anon-rss:976664kB, file-rss:24116kB,
shmem-rss:0kB, UID:35002 pgtables:2336kB oom_score_adj:-997
[Fri Oct 20 21:14:52 2023] Memory cgroup out of memory: Killed process
109970 (java) total-vm:5598332kB, anon-rss:954312kB, file-rss:24592kB,
shmem-rss:0kB, UID:35002 pgtables:2272kB oom_score_adj:-997
[Fri Oct 20 21:15:19 2023] Memory cgroup out of memory: Killed process
110089 (java) total-vm:5615776kB, anon-rss:946484kB, file-rss:24672kB,
shmem-rss:0kB, UID:35002 pgtables:2280kB oom_score_adj:-997
[Fri Oct 20 21:15:29 2023] Memory cgroup out of memory: Killed process
110269 (java) total-vm:5602004kB, anon-rss:948548kB, file-rss:24412kB,
shmem-rss:0kB, UID:35002 pgtables:2280kB oom_score_adj:-997
[Fri Oct 20 21:15:42 2023] Memory cgroup out of memory: Killed process
110367 (java) total-vm:5607104kB, anon-rss:942636kB, file-rss:24524kB,
shmem-rss:0kB, UID:35002 pgtables:2284kB oom_score_adj:-997
[Fri Oct 20 21:16:07 2023] Memory cgroup out of memory: Killed process
110464 (java) total-vm:5593792kB, anon-rss:940524kB, file-rss:24712kB,
shmem-rss:0kB, UID:35002 pgtables:2216kB oom_score_adj:-997
[Fri Oct 20 21:16:17 2023] Memory cgroup out of memory: Killed process
110684 (java) total-vm:5627620kB, anon-rss:910000kB, file-rss:24340kB,
shmem-rss:0kB, UID:35002 pgtables:2224kB oom_score_adj:-997
[Fri Oct 20 21:16:25 2023] Memory cgroup out of memory: Killed process
110798 (java) total-vm:5616588kB, anon-rss:889436kB, file-rss:24500kB,
shmem-rss:0kB, UID:35002 pgtables:2216kB oom_score_adj:-997
[Fri Oct 20 21:16:31 2023] Memory cgroup out of memory: Killed process
110939 (java) total-vm:5619708kB, anon-rss:839724kB, file-rss:23796kB,
shmem-rss:0kB, UID:35002 pgtables:2100kB oom_score_adj:-997
[Fri Oct 20 21:16:43 2023] Memory cgroup out of memory: Killed process
111042 (java) total-vm:5601976kB, anon-rss:807116kB, file-rss:24420kB,
shmem-rss:0kB, UID:35002 pgtables:2000kB oom_score_adj:-997
[Fri Oct 20 21:17:03 2023] Memory cgroup out of memory: Killed process
111165 (java) total-vm:5599008kB, anon-rss:792704kB, file-rss:24724kB,
shmem-rss:0kB, UID:35002 pgtables:1944kB oom_score_adj:-997
[Fri Oct 20 21:17:09 2023] Memory cgroup out of memory: Killed process
111317 (java) total-vm:5612224kB, anon-rss:767304kB, file-rss:24400kB,
shmem-rss:0kB, UID:35002 pgtables:1984kB oom_score_adj:-997
[Fri Oct 20 21:17:16 2023] Memory cgroup out of memory: Killed process
111427 (java) total-vm:5613572kB, anon-rss:739720kB, file-rss:24196kB,
shmem-rss:0kB, UID:35002 pgtables:1892kB oom_score_adj:-997
[Fri Oct 20 21:17:28 2023] Memory cgroup out of memory: Killed process
111525 (java) total-vm:5603008kB, anon-rss:737940kB, file-rss:24796kB,
shmem-rss:0kB, UID:35002 pgtables:1860kB oom_score_adj:-997
[Fri Oct 20 21:17:36 2023] Memory cgroup out of memory: Killed process
111620 (java) total-vm:5602048kB, anon-rss:728384kB, file-rss:24480kB,
shmem-rss:0kB, UID:35002 pgtables:1828kB oom_score_adj:-997
[Fri Oct 20 21:17:43 2023] Memory cgroup out of memory: Killed process
111711 (java) total-vm:5601984kB, anon-rss:710832kB, file-rss:24648kB,
shmem-rss:0kB, UID:35002 pgtables:1804kB oom_score_adj:-997
[Fri Oct 20 21:17:55 2023] Memory cgroup out of memory: Killed process
111776 (java) total-vm:5594816kB, anon-rss:709584kB, file-rss:24444kB,
shmem-rss:0kB, UID:35002 pgtables:1824kB oom_score_adj:-997



I guess my question is whether I am missing something that explains this,
and whether I could configure tika-server to prevent it.


Going forward, however, I realize that I need to set up the following three
things, and I have a question about each:

   1. concurrency control to avoid overwhelming tika-server (it seems I can
   only control concurrency on the sender side, since tika-server does not
   provide a way to limit the number of concurrent requests; see the rough
   sketch after this list). Is that correct?
   2. request isolation, so that a single file cannot bring down an entire
   instance -> is the only recommended solution to use tika pipes?
   3. timeouts and memory limits per request, so that a single request
   cannot go haywire and use too much CPU and/or memory -> is there a way to
   configure this already that I may have missed?
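
For (1), this is roughly the kind of sender-side throttling I have in mind,
as a minimal sketch (assuming plain PUT requests to the /tika endpoint on
the default port; the concurrency limit of 4 and the timeouts are just
illustrative values):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class ThrottledTikaClient {

    // Cap on in-flight requests, enforced on the sender side since
    // tika-server itself does not limit concurrent requests.
    // The value 4 is illustrative; tune it per deployment.
    private static final Semaphore PERMITS = new Semaphore(4);

    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    // PUT one file to tika-server's /tika endpoint and return the plain text.
    static String extract(Path file) throws Exception {
        PERMITS.acquire();  // blocks when 4 requests are already in flight
        try {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:9998/tika"))
                    .timeout(Duration.ofSeconds(120))  // client-side per-request timeout
                    .header("Accept", "text/plain")
                    .PUT(HttpRequest.BodyPublishers.ofFile(file))
                    .build();
            HttpResponse<String> response =
                    CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
            return response.body();
        } finally {
            PERMITS.release();
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (String arg : args) {
            pool.submit(() -> {
                try {
                    System.out.println(arg + ": " + extract(Path.of(arg)).length() + " chars");
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
    }
}

The semaphore bounds the number of in-flight requests independently of the
thread pool size, so bursts queue up on the client instead of piling up on
tika-server.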


Thanks! I realize these are a lot of questions 🙂

Cristi
