Hi,

I am seeing cases with Tika 2.2.1 where, despite setting a watchdog to
limit the heap to 3GB, the memory used by the entire Tika container grows
past 6GB. That exceeds the container's resource memory limit, so it gets
OOM-killed. Here is one example:

total-vm:8109464kB, anon-rss:99780kB, file-rss:28204kB, shmem-rss:32kB,
UID:0 pgtables:700kB oom_score_adj:-997
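
For easier reading, here is a small Python sketch that converts the kB
fields of that line to MiB. It is my own throwaway helper, not anything from
Tika; the names parse_kb_fields and kb_to_mib are made up for illustration,
and it assumes the fields always look like "name:<digits>kB".

```python
import re

# The OOM-killer line quoted above, joined onto one line.
OOM_LINE = (
    "total-vm:8109464kB, anon-rss:99780kB, file-rss:28204kB, "
    "shmem-rss:32kB, UID:0 pgtables:700kB oom_score_adj:-997"
)


def parse_kb_fields(line):
    """Return {field_name: size_in_kB} for every 'name:<digits>kB' field.

    Fields without a kB suffix (UID, oom_score_adj) are skipped.
    """
    return {name: int(kb) for name, kb in re.findall(r"([\w-]+):(\d+)kB", line)}


def kb_to_mib(kb):
    """Convert the kernel's kB (1024-byte units) to MiB."""
    return kb / 1024


if __name__ == "__main__":
    for name, kb in parse_kb_fields(OOM_LINE).items():
        print(f"{name}: {kb_to_mib(kb):.1f} MiB")
```

Note that total-vm is the virtual address space of the killed process, while
the rss fields are what was actually resident in memory.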

Only some files seem to be causing this behavior.

The memory ramps up fairly quickly: within a few tens of seconds it can go
from 1GB to 6GB.

The next step is to check if this goes away with 2.8.0, but I wonder if any
of the following explanations make any sense:
1. The JVM is slow to observe the forked process exceeding its heap and
does not terminate it in time
2. It's not the heap that grows, but there is some stack overflow due to
very deep recursion.

Finally, are there any file types that are known to use a lot of memory
with Tika?

Thanks,
Cristi
