: There are 14,696,502 documents, we are doing a lot of funky stuff but I'm
: not sure which is most likely to cause an impact. We're sorting on a dynamic
: field there are about 1000 different variants of this field that look like
: "priority_sort_for_<client_id>", which is an integer field. I've heard that
: sorting can have a big impact on memory consumption, could that be it?

Sorting on a field requires that an array of the corresponding type be constructed for that field. The size of the array is maxDoc (ie: the number of documents in your index, including deleted documents).

If you are using TrieInts, and have an index with no deletions, sorting ~14.7Mil docs on 1000 diff int fields will take up ~55GB (14.7Mil docs x 1000 fields x 4 bytes per int).

That's a minimum just for the sorting of those int fields (SortableIntField, which keeps a string version of the field value, will be significantly bigger) and doesn't take into consideration any other data structures used for searching.

I'm not a GC expert, but based on my limited understanding your graph actually seems fine to me .. particularly the part where it says you've configured a Max heap of ~122GB of RAM, and it's never spent any time doing ConcurrentMarkSweep.

My uneducated understanding of those two numbers is that you've told the JVM it can use an ungodly amount of RAM, so it is. It's done some basic cleanup of young gen (ParNew), but because the heap size has never gone above 50GB, it hasn't found any reason to actually start a CMS GC to look for dead objects in Old Gen that it can clean up.

(Can someone who understands GC and JVM tuning better than me please sanity check me on that?)

-Hoss
--
http://lucenerevolution.org/ ... October 7-8, Boston
http://bit.ly/stump-hoss ... Stump The Chump!
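
PS: for anyone who wants to redo the back-of-the-envelope math for their own index, here is a rough sketch in Python. It assumes one 4-byte int per document per sorted field and nothing else (no object overhead, no deleted docs); the function name `sort_cache_bytes` is made up for illustration and is not part of Solr or Lucene.

```python
def sort_cache_bytes(max_doc, num_fields, bytes_per_value=4):
    """Lower bound on memory for per-field sort arrays.

    Assumes one array of length max_doc per sorted field,
    with bytes_per_value bytes per entry (4 for a Java int).
    """
    return max_doc * num_fields * bytes_per_value


# Numbers from the question: 14,696,502 docs, ~1000 int sort fields.
total = sort_cache_bytes(14_696_502, 1000)
gib = total / 2**30  # bytes -> GiB

print(f"{total:,} bytes ~= {gib:.1f} GiB")
```

That works out to 58,786,008,000 bytes, or roughly 55GB counted in units of 2^30 bytes, and only if every one of the 1000 fields has actually been sorted on since the searcher was opened.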