Taking Chris's information into account, I was able to isolate this to a test case. I found this ticket, which seems to indicate a fundamental problem at the Solr/Lucene boundary:
https://issues.apache.org/jira/browse/SOLR-1111

Here's how to reproduce my results:

1. Create an index with a field like this:

   <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
   <dynamicField name="foo_*" type="sint" indexed="true" stored="false" omitNorms="true" />

2. Populate the index with test data; the more, the better.

3. Run a set of queries that loop through the foo_* fields; 1-1000 or so ought to fill up any heap.

Attached is what my heap looks like after running this test. Can anybody familiar with the issue above (SOLR-1111) tell me if that's what is going on here, or whether I need to submit a new bug?

Thanks,
-Jeff

On Thu, Sep 30, 2010 at 12:36 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:

> : There are 14,696,502 documents, we are doing a lot of funky stuff but I'm
> : not sure which is most likely to cause an impact. We're sorting on a dynamic
> : field; there are about 1000 different variants of this field that look like
> : "priority_sort_for_<client_id>", which is an integer field. I've heard that
> : sorting can have a big impact on memory consumption, could that be it?
>
> Sorting on a field requires that an array of the corresponding type be
> constructed for that field; the size of the array is the size of maxDoc
> (ie: the number of documents in your index, including deleted documents).
>
> If you are using TrieInts, and have an index with no deletions, sorting
> ~14.7Mil docs on 1000 diff int fields will take up about ~55GB.
>
> That's a minimum just for the sorting of those int fields (SortableIntField,
> which keeps a string version of the field value, will be significantly
> bigger) and doesn't take into consideration any other data structures used
> for searching.
>
> I'm not a GC expert, but based on my limited understanding your graph
> actually seems fine to me ..
> particularly the part where it says you've configured
> a max heap of ~122GB of RAM, and it's never spent any
> time doing ConcurrentMarkSweep. My uneducated
> understanding of those two numbers is that you've told the JVM it can use
> an ungodly amount of RAM, so it is. It's done some basic cleanup of
> young gen (ParNew), but because the heap size has never gone above 50GB,
> it hasn't found any reason to actually start a CMS GC to look for dead
> objects in Old Gen that it can clean up.
>
> (Can someone who understands GC and JVM tuning better than me please
> sanity check me on that?)
>
> -Hoss
>
> --
> http://lucenerevolution.org/ ... October 7-8, Boston
> http://bit.ly/stump-hoss ... Stump The Chump!
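[Editorial note: Hoss's ~55GB estimate above can be sanity-checked with a quick back-of-the-envelope calculation. This is a sketch, not anything from the thread itself; it assumes one int[maxDoc] array per sorted field at 4 bytes per entry (the Java int size), with maxDoc = 14,696,502 and 1000 distinct sort fields as stated in the quoted message.]

```python
# Back-of-the-envelope FieldCache memory estimate for sorting on int fields.
# Assumption: each sorted field requires an int[maxDoc] array, 4 bytes/entry.

max_doc = 14_696_502      # docs in the index, including deleted docs
num_sort_fields = 1000    # distinct priority_sort_for_<client_id> fields
bytes_per_int = 4         # size of a Java int

per_field_bytes = max_doc * bytes_per_int
total_bytes = per_field_bytes * num_sort_fields

print(f"per-field array: {per_field_bytes / 2**20:.1f} MiB")
print(f"all {num_sort_fields} fields: {total_bytes / 2**30:.1f} GiB")
```

That works out to roughly 56 MiB per field and about 54.7 GiB across all 1000 fields, matching the quoted ~55GB minimum. Since SortableIntField also caches a string form of each value, the real footprint would be considerably larger.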