Wayne,

Have you asked on the Tika mailing list? You may also want to watch https://issues.apache.org/jira/browse/SOLR-2901

Otis
----
Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html

----- Original Message -----
> From: Wayne W <waynemailingli...@gmail.com>
> To: solr-user@lucene.apache.org
> Cc:
> Sent: Saturday, January 14, 2012 2:53 AM
> Subject: Solr - Tika(?) memory leak
>
> Hi,
>
> we're using Solr running on tomcat with 1GB in production, and of late
> we've been having a huge number of OutOfMemory issues. It seems from
> what I can tell this is coming from the tika extraction of the
> content. I've processed the java dump file using a memory analyzer and
> its pretty clean at least the class involved. It seems like a leak to
> me, as we don't parse any files larger than 20M, and these objects are
> taking up ~700M
>
> I've attached 2 screen shots from the tool (not sure if you receive
> attachments).
>
> But to summarize (class, number of objects, Used heap size, Retained Heap
> Size):
>
> org.apache.xmlbeans.impl.store.Xob$ElementXObj   838,993   80,533,728   604,606,040
> org.apache.poi.openxml4j.opc.ZipPackage                2          112    87,009,848
> char[]                                               587   32,216,960    38,216,950
>
> We're really desperate to find a solution to this - any ideas or help
> is greatly appreciated.
> Wayne
>