Thanks, Daan.

On Tue, Jan 17, 2012 at 9:08 PM, Daan de Wit <[email protected]> wrote:
> Hi Wayne,
>
> Older versions of Tika have memory issues with parsing certain types of
> Excel sheets. It would be best to upgrade your version of Tika to the latest
> stable version.
>
> Best,
> Daan
>
> On 17 January 2012 06:32, Wayne W <[email protected]> wrote:
>>
>> Hi,
>>
>> we're using Solr running on tomcat with 1GB in production, and of late
>> we've been having a huge number of OutOfMemory issues. It seems from
>> what I can tell this is coming from the tika extraction ( tika-0.2.jar)
>> of the content. I've processed the java dump file using a memory
>> analyzer and its pretty clean at least the class involved. It seems like
>> a leak to me, as we don't parse any files larger than 20M, and these
>> objects are taking up ~700M
>>
>> You can see screen shots here:
>>
>> http://dl.dropbox.com/u/6550402/Screen%20shot%202012-01-14%20at%2018.36.27.png
>>
>> http://dl.dropbox.com/u/6550402/Screen%20shot%202012-01-14%20at%2018.39.04.png
>>
>> But to summarize (class, number of objects, Used heap size, Retained Heap
>> Size):
>>
>> org.apache.xmlbeans.impl.store.Xob$ElementXObj  838,993  80,533,728  604,606,040
>> org.apache.poi.openxml4j.opc.ZipPackage  2  112  87,009,848
>> char[]  587  32,216,960  38,216,950
>>
>> We're really desperate to find a solution to this - any ideas or help
>> is greatly appreciated.
>>
>> I didn't realize we'd got so far behind on the version we have, I need
>> to see however if the latest version will work with Solr ( I have a
>> feeling won't).
>>
>> Wayne
