On Mon, 2013-04-15 at 10:25 +0200, John Nielsen wrote:
> The FieldCache is the big culprit. We do a huge amount of faceting so
> it seems right.

Yes, you wrote that earlier. The mystery is that the math does not
check out with the description you have given us.

> Unfortunately I am super swamped at work so I have precious little
> time to work on this, which is what explains my silence.

No problem, we've all been there.

[Band aid: More memory]

> The extra memory helped a lot, but it still OOMs with about 180
> clients using it.

You stated earlier that you had a "solr cluster" and that your total(?)
index size was 35GB, with each "register" being between "15k" and
"30k". I am using the quotes to signify that it is unclear what you
mean. Is your cluster multiple machines (I'm guessing no), multiple
Solr instances, cores, shards, or maybe just a single instance prepared
for later distribution? Is a register a core, a shard or simply a
logical part (one client's data) of the index?

If each client has their own core or shard, that would mean that each
client uses more than 25GB/180 ~= 142MB of heap to access 35GB/180 ~=
200MB of index. That sounds quite high and you would need a very heavy
facet to reach it. If you could grep "UnInverted" from the Solr log
file and paste the entries here, that would help to clarify things.

Another explanation for the large amount of memory presents itself if
you use a single index: If each of your clients facets on at least one
field specific to that client ("client123_persons" or something like
that), then your memory usage goes through the roof.

Assuming an index with 10M documents, each with 5 references to a
modest 10K unique values in a facet field, the simplified formula

  #documents*log2(#references) + #references*log2(#unique_values) bits

tells us that this takes at least 110MB with field cache based
faceting. 180 clients @ 110MB ~= 20GB. As that is a theoretical low, we
can at least double that. This fits neatly with your new heap of 64GB.
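If you want to plug in your own numbers, here is a minimal sketch of
that calculation (the figures are my assumed example from above, not
measurements of your actual index):

  // Back-of-the-envelope field cache estimate for one facet field.
  public class FacetMemEstimate {
      static double log2(double x) { return Math.log(x) / Math.log(2); }

      public static void main(String[] args) {
          long documents = 10_000_000L;    // #documents in the index
          long references = documents * 5; // 5 facet references per doc
          long uniqueValues = 10_000L;     // #unique_values in the field

          double bits = documents * log2(references)
                      + references * log2(uniqueValues);
          System.out.printf("~%.0f MB per client-specific facet field%n",
                            bits / 8 / (1024 * 1024)); // prints ~110 MB
      }
  }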
If my guessing is correct, you can solve your memory problems very
easily by sharing _all_ the facet fields between your clients. This
should bring your memory usage down to a few GB. You are probably
already restricting each client's searches to their own data by
filtering, so this should not influence the returned facet values and
counts, as compared to separate fields (see the sketch at the end of
this mail). This is very similar to the thread "Facets with 5000 facet
fields" BTW.

> Today I finally managed to set up a test core so I can begin to play
> around with docValues.

If you are using a single index with the
individual-facet-fields-for-each-client approach, DocValues will also
have scaling issues, as the total amount of values (of which the
majority will be null) will be

  #clients * #documents * #facet_fields

This means that adding a new client will be progressively more
expensive. On the other hand, if you use a lot of small shards,
DocValues should work for you.

Regards,
Toke Eskildsen
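PS: A minimal SolrJ sketch of the shared-field approach, assuming a 4.x
client. The field names "client_id" and "persons" are made up for
illustration; substitute whatever your schema actually uses:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class SharedFacetQuery {
      public static void main(String[] args) throws Exception {
          SolrServer server =
              new HttpSolrServer("http://localhost:8983/solr/collection1");

          SolrQuery query = new SolrQuery("*:*");
          query.addFilterQuery("client_id:123"); // isolate the client's docs
          query.setFacet(true);
          query.addFacetField("persons"); // one shared field for everybody,
                                          // instead of client123_persons
          QueryResponse response = server.query(query);
          System.out.println(response.getFacetField("persons").getValues());
      }
  }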