Thanks for the response. I've been working on fixing some of the most evident issues, and I also added your garbage collector parameters. First of all, the Lucene field cache is being filled with entries that are marked as 'insanity'. Some of these were related to a custom field that we use for our ranking; we fixed our custom plugin classes so that we no longer see entries for those fields, but there seem to be other problems with the field cache. Mainly, the cache is filling up with this type of insanity entry:

'SUBREADER: Found caches for descendants of StandardDirectoryReader'
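In case it helps, this is roughly how that report can be reproduced from inside the Solr JVM with Lucene's own sanity checker (just a sketch against the Lucene 4.x FieldCache API; the class below is only for illustration, it isn't one of our plugins):

import org.apache.lucene.search.FieldCache;
import org.apache.lucene.util.FieldCacheSanityChecker;
import org.apache.lucene.util.FieldCacheSanityChecker.Insanity;

// Dumps the current contents of the default FieldCache and runs the same
// sanity check that produces the 'insanity' entries on the Solr stats page.
// Has to run inside the Solr JVM (e.g. from a custom request handler).
public class FieldCacheInsanityDump {

    public static void dump() {
        FieldCache.CacheEntry[] entries = FieldCache.DEFAULT.getCacheEntries();
        System.out.println("field cache entries: " + entries.length);
        for (FieldCache.CacheEntry entry : entries) {
            System.out.println("  field=" + entry.getFieldName()
                + " reader=" + entry.getReaderKey());
        }

        // SUBREADER insanity means the same field was populated for both a
        // top-level reader (StandardDirectoryReader) and one of its segment
        // (sub) readers, so the uninverted values are held in memory twice.
        for (Insanity insanity : FieldCacheSanityChecker.checkSanity(entries)) {
            System.out.println(insanity.getType() + ": " + insanity);
        }
    }
}

That prints the field name and reader key for each cached entry, which at least narrows down which fields are being uninverted at both the top-level and segment readers.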
All of these entries are related to standard Solr fields. Could it be that our current schemas and configs have some setting that is not compliant with this Lucene version? I'll keep investigating, but any additional information you can give me about these types of field cache insanity warnings would be really helpful.

On Thu, Sep 11, 2014 at 3:00 PM, Timothy Potter <thelabd...@gmail.com> wrote:

> Probably need to look at it running with a profiler to see what's up.
> Here's a few additional flags that might help the GC work better for
> you (which is not to say there isn't a leak somewhere):
>
> -XX:MaxTenuringThreshold=8 -XX:CMSInitiatingOccupancyFraction=40
>
> This should lead to a nice up-and-down GC profile over time.
>
> On Thu, Sep 11, 2014 at 10:52 AM, Luis Carlos Guerrero
> <lcguerreroc...@gmail.com> wrote:
> > hey guys,
> >
> > I'm running a solrcloud cluster consisting of five nodes. My largest
> > index contains 2.5 million documents and occupies about 6 gigabytes of
> > disk space. We recently switched to the latest solr version (4.10) from
> > version 4.4.1 which we ran successfully for about a year without any
> > major issues. From the get go we started having memory problems caused
> > by the CMS old heap usage being filled up incrementally. It starts out
> > with a very low memory consumption and after 12 hours or so it ends up
> > using up all available heap space. We thought it could be one of the
> > caches we had configured, so we reduced our main core filter cache max
> > size from 1024 to 512 elements. The only thing we accomplished was that
> > the cluster ran for a longer time than before.
> >
> > I generated several heapdumps and basically what is filling up the heap
> > is lucene's field cache. It gets bigger and bigger until it fills up all
> > available memory.
> >
> > My jvm memory settings are the following:
> >
> > -Xms15g -Xmx15g -XX:PermSize=512m -XX:MaxPermSize=512m -XX:NewSize=5g
> > -XX:MaxNewSize=5g
> > -XX:+UseParNewGC -XX:+ExplicitGCInvokesConcurrent -XX:+PrintGCDateStamps
> > -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC
> >
> > What's weird to me is that we didn't have this problem before, I'm
> > thinking this is some kind of memory leak issue present in the new
> > lucene. We ran our old cluster for several weeks at a time without
> > having to redeploy because of config changes or other reasons. Was there
> > some issue reported related to elevated memory consumption by the field
> > cache?
> >
> > any help would be greatly appreciated.
> >
> > regards,
> >
> > --
> > Luis Carlos Guerrero
> > about.me/luis.guerrero

--
Luis Carlos Guerrero
about.me/luis.guerrero