Hi Wayne,

Older versions of Tika have memory issues with parsing certain types of
Excel sheets. It would be best to upgrade your version of Tika to the
latest stable version.

Best,
Daan

On 17 January 2012 06:32, Wayne W <[email protected]> wrote:

> Hi,
>
> We're using Solr running on Tomcat with 1GB in production, and of late
> we've been having a huge number of OutOfMemory errors. From what I can
> tell, this is coming from the Tika extraction (tika-0.2.jar) of the
> content. I've processed the Java dump file using a memory analyzer and
> it's pretty clear, at least which class is involved. It seems like a
> leak to me, as we don't parse any files larger than 20M, and these
> objects are taking up ~700M.
>
> You can see screen shots here:
>
> http://dl.dropbox.com/u/6550402/Screen%20shot%202012-01-14%20at%2018.36.27.png
>
> http://dl.dropbox.com/u/6550402/Screen%20shot%202012-01-14%20at%2018.39.04.png
>
>
> But to summarize:
>
> Class                                            Objects   Used heap size   Retained heap size
> org.apache.xmlbeans.impl.store.Xob$ElementXObj   838,993       80,533,728          604,606,040
> org.apache.poi.openxml4j.opc.ZipPackage                2              112           87,009,848
> char[]                                               587       32,216,960           38,216,950
>
>
> We're really desperate to find a solution to this - any ideas or help
> is greatly appreciated.
>
> I didn't realize we'd fallen so far behind on the version we have.
> However, I need to see whether the latest version will work with Solr
> (I have a feeling it won't).
>
> Wayne
>
