Thanks, Daan.

On Tue, Jan 17, 2012 at 9:08 PM, Daan de Wit <[email protected]> wrote:
> Hi Wayne,
>
> Older versions of Tika have memory issues with parsing certain types of
> Excel sheets. It would be best to upgrade your version of Tika to the latest
> stable version.
>
> Best,
> Daan
>
> On 17 January 2012 06:32, Wayne W <[email protected]> wrote:
>>
>> Hi,
>>
>> We're using Solr running on Tomcat with 1GB in production, and of late
>> we've been having a huge number of OutOfMemory issues. From what I can
>> tell, this is coming from the Tika extraction (tika-0.2.jar) of the
>> content. I've processed the Java dump file using a memory analyzer and
>> it's pretty clear, at least as to the class involved. It seems like a leak
>> to me, as we don't parse any files larger than 20M, and these objects are
>> taking up ~700M.
>>
>> You can see screen shots here:
>>
>> http://dl.dropbox.com/u/6550402/Screen%20shot%202012-01-14%20at%2018.36.27.png
>>
>> http://dl.dropbox.com/u/6550402/Screen%20shot%202012-01-14%20at%2018.39.04.png
>>
>>
>> But to summarize (class, number of objects, used heap size, retained heap
>> size):
>>
>> org.apache.xmlbeans.impl.store.Xob$ElementXObj  838,993  80,533,728  604,606,040
>> org.apache.poi.openxml4j.opc.ZipPackage               2         112   87,009,848
>> char[]                                              587  32,216,960   38,216,950
>>
>>
>> We're really desperate to find a solution to this - any ideas or help
>> is greatly appreciated.
>>
>> I didn't realize we'd got so far behind on the version we have. I need
>> to see, however, if the latest version will work with Solr (I have a
>> feeling it won't).
>>
>> Wayne
>
>
