I checked, and these 'insanity' cached keys correspond to fields we use for
both grouping and faceting. The same behavior is documented here:
https://issues.apache.org/jira/browse/SOLR-4866, although I have single
shards for every replica, which according to the JIRA is a setup that should
not generate these issues.

What I don't get is why the cluster was running fine with Solr 4.4,
although on double checking I see I was using LUCENE_40 as the match version.
If I use this match version in my currently running 4.10 cluster, will it
make a difference, or will I experience more issues than if I just roll back
to 4.4 with the LUCENE_40 match version? The problem in the end is that the
field cache grows without bound. I'm thinking it's because of the insanity
entries, but I'm not really sure. It seems like a really big problem to leave
unattended, or is the use case of faceting and grouping on the same field
not that common?
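
In case it helps anyone reproduce this, here is a minimal sketch of how the
old generation occupancy could be logged from inside the JVM, using only the
standard java.lang.management API. The class name OldGenWatch and the
name-based pool matching are my own assumptions, not anything from Solr or
Lucene: CMS reports its old generation as "CMS Old Gen", but other collectors
use different names and some may not match at all. It also only sees the JVM
it runs in, so it would have to be called from code loaded by Solr (or
adapted to remote JMX) to say anything about a cluster node.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports how full the old generation of the current JVM is.
class OldGenWatch {

    private static final long MB = 1024L * 1024L;

    // Assumption: old-generation pools can be recognized by name
    // ("CMS Old Gen", "PS Old Gen", "G1 Old Gen", "Tenured Gen").
    static boolean isOldGenPool(String poolName) {
        return poolName.contains("Old Gen") || poolName.contains("Tenured");
    }

    // Returns one line per matching heap pool; the list may be empty
    // if the collector in use names its pools differently.
    static List<String> describeOldGenPools() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP || !isOldGenPool(pool.getName())) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            long usedMb = usage.getUsed() / MB;
            long max = usage.getMax(); // -1 when the maximum is undefined
            String occupancy = max > 0
                    ? String.format("%.1f%%", 100.0 * usage.getUsed() / max)
                    : "n/a";
            lines.add(String.format("%s: used=%d MB, occupancy=%s",
                    pool.getName(), usedMb, occupancy));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeOldGenPools()) {
            System.out.println(line);
        }
    }
}
```

Logging this every few minutes should show whether the occupancy keeps
climbing after each CMS cycle, which is what I would expect if the field
cache entries are never released.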

On Tue, Sep 16, 2014 at 11:06 AM, Luis Carlos Guerrero
<lcguerreroc...@gmail.com> wrote:

> Thanks for the response, I've been working on solving some of the most
> evident issues and I also added your garbage collector parameters. First of
> all the Lucene field cache is being filled with some entries which are
> marked as 'insanity'. Some of these were related to a custom field that we
> use for our ranking. We fixed our custom plugin classes so that we wouldn't
> see any entries related to those fields there, but it seems there are other
> related problems with the field cache. Mainly the cache is being filled
> with these types of insanity entries:
> 'SUBREADER: Found caches for descendants of StandardDirectoryReader'
> They are all related to standard solr fields. Could it be that our current
> schemas and configs have some incorrect setting that is not compliant with
> this lucene version? I'll keep investigating the subject but if there is
> any additional information you can give me about these types of field cache
> insanity warnings it would be really helpful.
>
> On Thu, Sep 11, 2014 at 3:00 PM, Timothy Potter
> wrote:
>
>> Probably need to look at it running with a profiler to see what's up.
>> Here's a few additional flags that might help the GC work better for
>> you (which is not to say there isn't a leak somewhere):
>> -XX:MaxTenuringThreshold=8 -XX:CMSInitiatingOccupancyFraction=40
>> This should lead to a nice up-and-down GC profile over time.
>>
>> On Thu, Sep 11, 2014 at 10:52 AM, Luis Carlos Guerrero
>> <lcguerreroc...@gmail.com> wrote:
>>
>> > hey guys,
>> >
>> > I'm running a solrcloud cluster consisting of five nodes. My largest index
>> > contains 2.5 million documents and occupies about 6 gigabytes of disk
>> > space. We recently switched to the latest solr version (4.10) from version
>> > 4.4.1 which we ran successfully for about a year without any major issues.
>> > From the get go we started having memory problems caused by the CMS old
>> > heap usage being filled up incrementally. It starts out with a very low
>> > memory consumption and after 12 hours or so it ends up using up all
>> > available heap space. We thought it could be one of the caches we had
>> > configured, so we reduced our main core filter cache max size from 1024 to
>> > 512 elements. The only thing we accomplished was that the cluster ran for a
>> > longer time than before.
>> >
>> > I generated several heapdumps and basically what is filling up the heap is
>> > lucene's field cache. It gets bigger and bigger until it fills up all
>> > available memory.
>> >
>> > My jvm memory settings are the following:
>> >
>> > -Xms15g -Xmx15g -XX:PermSize=512m -XX:MaxPermSize=512m -XX:NewSize=5g
>> > -XX:MaxNewSize=5g
>> > -XX:+UseParNewGC -XX:+ExplicitGCInvokesConcurrent -XX:+PrintGCDateStamps
>> > -XX:+PrintGCDetails -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC
>> >
>> > What's weird to me is that we didn't have this problem before. I'm thinking
>> > this is some kind of memory leak issue present in the new lucene. We ran
>> > our old cluster for several weeks at a time without having to redeploy
>> > because of config changes or other reasons. Was there some issue reported
>> > related to elevated memory consumption by the field cache?
>> >
>> > any help would be greatly appreciated.
>> >
>> > regards,
>> >
