Hi Wayne,

Older versions of Tika have memory issues when parsing certain types of Excel spreadsheets. It would be best to upgrade Tika to the latest stable version.
Best,
Daan

On 17 January 2012 06:32, Wayne W <[email protected]> wrote:
> Hi,
>
> We're running Solr on Tomcat with 1GB in production, and lately we've
> been having a huge number of OutOfMemory errors. As far as I can tell,
> they come from the Tika extraction (tika-0.2.jar) of the content. I've
> processed the Java dump file with a memory analyzer, and it's pretty
> clear, at least about which classes are involved. It looks like a leak
> to me, as we don't parse any files larger than 20MB, yet these objects
> are taking up ~700MB.
>
> You can see screenshots here:
>
> http://dl.dropbox.com/u/6550402/Screen%20shot%202012-01-14%20at%2018.36.27.png
>
> http://dl.dropbox.com/u/6550402/Screen%20shot%202012-01-14%20at%2018.39.04.png
>
> But to summarize:
>
> Class                                           Objects  Used heap (bytes)  Retained heap (bytes)
> org.apache.xmlbeans.impl.store.Xob$ElementXObj  838,993         80,533,728            604,606,040
> org.apache.poi.openxml4j.opc.ZipPackage               2                112             87,009,848
> char[]                                              587         32,216,960             38,216,950
>
> We're really desperate to find a solution to this; any ideas or help
> would be greatly appreciated.
>
> I didn't realize we'd fallen so far behind on our version. However, I
> still need to check whether the latest version will work with Solr (I
> have a feeling it won't).
>
> Wayne
