I checked, and these 'insanity' cache keys correspond to fields we use for both grouping and faceting. The same behavior is documented here: https://issues.apache.org/jira/browse/SOLR-4866, although I have a single shard for every replica, which the JIRA says is a setup that should not generate these issues.
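In case anyone wants to reproduce what I'm seeing, here is a minimal sketch of how the insanity entries can be dumped, assuming Lucene 4.x's FieldCacheSanityChecker API. Since FieldCache.DEFAULT is process-wide, this has to run inside the Solr JVM (e.g. from a custom request handler), and the output formatting is my own:

    import org.apache.lucene.search.FieldCache;
    import org.apache.lucene.util.FieldCacheSanityChecker;
    import org.apache.lucene.util.FieldCacheSanityChecker.Insanity;

    public final class FieldCacheInsanityDump {

      /** Print every FieldCache entry flagged as insane, with its field name. */
      public static void dump() {
        // FieldCache.DEFAULT is the process-wide cache that Solr populates.
        FieldCache.CacheEntry[] entries = FieldCache.DEFAULT.getCacheEntries();
        System.out.println("field cache entries: " + entries.length);

        // checkSanity flags fields that are cached more than once, e.g. once
        // on the top-level reader and again per segment (the SUBREADER case),
        // which is what faceting + grouping on the same field can cause.
        for (Insanity insanity : FieldCacheSanityChecker.checkSanity(entries)) {
          System.out.println(insanity.getType() + ": " + insanity.getMsg());
          for (FieldCache.CacheEntry e : insanity.getCacheEntries()) {
            System.out.println("  field=" + e.getFieldName()
                + " cacheType=" + e.getCacheType());
          }
        }
      }
    }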
What I don't get is why the cluster ran fine with Solr 4.4, although, double-checking, I was using LUCENE_40 as the match version. If I use that match version on my current 4.10 cluster, will it make a difference, or will I run into more issues than if I just roll back to 4.4 with the LUCENE_40 match version? The problem, in the end, is that the field cache grows without bound. I think it's because of the insanity entries, but I'm not really sure. It seems like a really big problem to leave unattended, or is faceting and grouping on the same field not that common a use case?

On Tue, Sep 16, 2014 at 11:06 AM, Luis Carlos Guerrero <lcguerreroc...@gmail.com> wrote:

> Thanks for the response. I've been working on solving some of the most evident issues, and I also added your garbage collector parameters. First of all, the Lucene field cache is being filled with entries that are marked as 'insanity'. Some of these were related to a custom field that we use for our ranking. We fixed our custom plugin classes so that we no longer see any entries for those fields, but it seems there are other problems with the field cache. Mainly, the cache is being filled with this type of insanity entry:
>
> 'SUBREADER: Found caches for descendants of StandardDirectoryReader'
>
> They are all related to standard Solr fields. Could it be that our current schemas and configs have some incorrect setting that is not compliant with this Lucene version? I'll keep investigating, but if there is any additional information you can give me about these types of field cache insanity warnings, it would be really helpful.
>
> On Thu, Sep 11, 2014 at 3:00 PM, Timothy Potter <thelabd...@gmail.com> wrote:
>
>> You probably need to look at it running with a profiler to see what's up. Here are a few additional flags that might help the GC work better for you (which is not to say there isn't a leak somewhere):
>>
>> -XX:MaxTenuringThreshold=8 -XX:CMSInitiatingOccupancyFraction=40
>>
>> This should lead to a nice up-and-down GC profile over time.
>>
>> On Thu, Sep 11, 2014 at 10:52 AM, Luis Carlos Guerrero <lcguerreroc...@gmail.com> wrote:
>>
>> > Hey guys,
>> >
>> > I'm running a SolrCloud cluster consisting of five nodes. My largest index contains 2.5 million documents and occupies about 6 GB of disk space. We recently switched to the latest Solr version (4.10) from version 4.4.1, which we had run successfully for about a year without any major issues. From the get-go we started having memory problems caused by the CMS old-generation heap usage filling up incrementally: it starts out with very low memory consumption, and after 12 hours or so it ends up using all available heap space. We thought it could be one of the caches we had configured, so we reduced our main core's filter cache max size from 1024 to 512 elements. The only thing we accomplished was that the cluster ran for a longer time than before.
>> >
>> > I generated several heap dumps, and basically what is filling up the heap is Lucene's field cache. It gets bigger and bigger until it fills up all available memory.
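A quick way to watch that growth without attaching a profiler is Solr's stats endpoint, which reports the field cache's entry and insanity counts. A sketch, assuming the stock /admin/mbeans handler and a placeholder core name ('collection1'); the exact stat names may vary by version:

    http://localhost:8983/solr/collection1/admin/mbeans?stats=true&cat=CACHE&wt=json

The fieldCache block in the response includes entries_count and insanity_count; if those climb in step with old-generation heap usage, the field cache is the likely culprit.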
>> >
>> > My JVM memory settings are the following:
>> >
>> > -Xms15g -Xmx15g -XX:PermSize=512m -XX:MaxPermSize=512m
>> > -XX:NewSize=5g -XX:MaxNewSize=5g -XX:+UseParNewGC
>> > -XX:+ExplicitGCInvokesConcurrent -XX:+PrintGCDateStamps
>> > -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError
>> > -XX:+UseConcMarkSweepGC
>> >
>> > What's weird to me is that we didn't have this problem before; I'm thinking this is some kind of memory leak present in the new Lucene. We ran our old cluster for several weeks at a time without having to redeploy because of config changes or other reasons. Was there some issue reported related to elevated memory consumption by the field cache?
>> >
>> > Any help would be greatly appreciated.
>> >
>> > Regards,
>> >
>> > --
>> > Luis Carlos Guerrero
>> > about.me/luis.guerrero
>
>
> --
> Luis Carlos Guerrero
> about.me/luis.guerrero

--
Luis Carlos Guerrero
about.me/luis.guerrero
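For what it's worth, one commonly suggested mitigation for field cache growth caused by faceting and grouping on the same fields is to enable docValues on those fields, which keeps the uninverted values in on-disk structures instead of on the heap. A minimal schema.xml sketch; the field name here is a placeholder, and a full reindex is required after the change:

    <field name="category" type="string" indexed="true" stored="false" docValues="true"/>

With docValues enabled, faceting and grouping on that field read the doc-values structures directly rather than populating FieldCache entries, so the top-level versus per-segment double caching flagged as 'SUBREADER' insanity should not occur for it.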