You can also sort on a field by using a function query instead of the "sort=field+desc" parameter. This will not eat up memory, but it will be slower - a classic speed vs. space trade-off.
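Roughly, the two approaches look like this (just a sketch - the exact syntax depends on your Solr version, and "priority_sort_for_123" is only a stand-in for one of the dynamic fields discussed below):

  field sort (fast, but builds a big per-field array in the FieldCache):
    q=*:*&sort=priority_sort_for_123+desc

  function-query sort (values are computed while scoring, then you sort on score - slower, but no big array):
    q={!func}priority_sort_for_123&sort=score+desc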
You'll have to benchmark and decide which you want, and maybe some fields need the fast sort and some can get away with the slow one.

http://www.lucidimagination.com/search/?q=function+query

On Thu, Sep 30, 2010 at 11:47 AM, Jeff Moss <jm...@heavyobjects.com> wrote:
> I think you've probably nailed it Chris, thanks for that, I think I can
> get by with a different approach than this.
>
> Do you know if I will get the same memory consumption using the
> RandomFieldType vs the TrieInt?
>
> -Jeff
>
> On Thu, Sep 30, 2010 at 12:36 PM, Chris Hostetter
> <hossman_luc...@fucit.org> wrote:
>
>> : There are 14,696,502 documents, we are doing a lot of funky stuff but I'm
>> : not sure which is most likely to cause an impact. We're sorting on a
>> : dynamic field; there are about 1000 different variants of this field that
>> : look like "priority_sort_for_<client_id>", which is an integer field. I've
>> : heard that sorting can have a big impact on memory consumption, could that
>> : be it?
>>
>> Sorting on a field requires that an array of the corresponding type be
>> constructed for that field - the size of the array is the size of maxDoc
>> (ie: the number of documents in your index, including deleted documents).
>>
>> If you are using TrieInts, and have an index with no deletions, sorting
>> ~14.7 million docs on 1000 different int fields will take up about ~55GB
>> (roughly 14.7M docs x 4 bytes per int x 1000 fields).
>>
>> That's a minimum just for the sorting of those int fields (SortableIntField,
>> which keeps a string version of the field value, will be significantly
>> bigger) and doesn't take into consideration any other data structures used
>> for searching.
>>
>> I'm not a GC expert, but based on my limited understanding your graph
>> actually seems fine to me .. particularly the part where it says
>> you've configured a max heap of ~122GB of RAM, and it has never spent
>> any time doing ConcurrentMarkSweep. My uneducated understanding of those
>> two numbers is that you've told the JVM it can use an ungodly amount of
>> RAM, so it is. It's done some basic cleanup of young gen (ParNew), but
>> because the heap size has never gone above 50GB, it hasn't found any
>> reason to actually start a CMS GC to look for dead objects in Old Gen
>> that it can clean up.
>>
>> (Can someone who understands GC and JVM tuning better than me please
>> sanity check me on that?)
>>
>> -Hoss
>>
>> --
>> http://lucenerevolution.org/ ... October 7-8, Boston
>> http://bit.ly/stump-hoss ... Stump The Chump!
>>

--
Lance Norskog
goks...@gmail.com