Tika memory leak?

Wayne W Mon, 16 Jan 2012 21:33:41 -0800

Hi,

We're running Solr on Tomcat with 1GB in production, and of late
we've been hitting a huge number of OutOfMemory errors. From what I
can tell, this is coming from the Tika extraction (tika-0.2.jar) of the
content. I've processed the Java dump file using a memory analyzer, and
it's pretty clear, at least as to the class involved. It looks like a
leak to me, as we don't parse any files larger than 20M, yet these
objects are taking up ~700M.


You can see screenshots here:
http://dl.dropbox.com/u/6550402/Screen%20shot%202012-01-14%20at%2018.36.27.png
http://dl.dropbox.com/u/6550402/Screen%20shot%202012-01-14%20at%2018.39.04.png


But to summarize:

Class                                           Objects  Used heap size  Retained heap size
org.apache.xmlbeans.impl.store.Xob$ElementXObj  838,993      80,533,728         604,606,040
org.apache.poi.openxml4j.opc.ZipPackage               2             112          87,009,848
char[]                                              587      32,216,960          38,216,950
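
As a rough sanity check on the figures above, the retained sizes can be
summed and converted to megabytes. This is only a sketch: the numbers are
copied from the summary, the names (retained, to_mb, poi_total, all_total)
are made up for illustration, and it assumes the sizes are in bytes.
Retained sizes of different classes may overlap, so the totals are an
estimate rather than an exact figure.

```python
# Retained heap sizes copied from the summary above (assumed to be bytes).
retained = {
    "org.apache.xmlbeans.impl.store.Xob$ElementXObj": 604_606_040,
    "org.apache.poi.openxml4j.opc.ZipPackage": 87_009_848,
    "char[]": 38_216_950,
}

MB = 1_000_000  # decimal megabytes; use 1024 * 1024 for MiB instead


def to_mb(n_bytes):
    """Convert a byte count to decimal megabytes."""
    return n_bytes / MB


# Total for the XMLBeans and POI classes only.
poi_total = sum(
    size for name, size in retained.items() if name.startswith("org.apache.")
)
# Total for all three rows of the summary.
all_total = sum(retained.values())

for name, size in retained.items():
    print(f"{name}: {to_mb(size):.1f} MB")
print(f"XMLBeans + POI: {to_mb(poi_total):.1f} MB")
print(f"All three rows: {to_mb(all_total):.1f} MB")
```

The XMLBeans and POI rows alone come to roughly 692 MB, which is in line
with the ~700M mentioned earlier.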


We're really desperate to find a solution to this - any ideas or help
is greatly appreciated.

I didn't realize we'd fallen so far behind on the version we have. I
need to see, however, whether the latest version will work with Solr (I
have a feeling it won't).

Wayne
