I managed to get this done. The facet queries now facet on a shared multivalue field instead of the dynamic per-client field names.
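For reference, a client's facet request now looks roughly like this (the field and value names below are placeholders, not our real schema):

    q=*:*&fq=client_id:42&facet=true&facet.field=variant_options&facet.field=item_group

where before each client faceted on its own dynamically named fields, e.g. facet.field=client42_variant_options.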
Unfortunately it doesn't seem to have made much of a difference, if any at all.

Some more information that might help:

The JVM memory seems to be eaten up slowly. I don't think that there is one single query that causes the problem. My test case (dumping 180 clients on top of Solr) takes hours before it causes an OOM, often a full day. The memory usage wobbles up and down, so the GC is at least partially doing its job, but it still works its way up to 100% eventually. When that happens it either OOMs, or it stops the world and brings the memory consumption down to 10-15 GB.

I did try to facet on all products across all clients (about 1.4 million docs) and I could not make it OOM on a server with a 4 GB JVM. This was on a dedicated test server with my test being the only traffic. I am beginning to think that this may be related to traffic volume and not just to the type of query that I do.

I tried to redo the memory requirement calculation you gave me above, based on the change that got rid of the dynamic fields:

documents = ~1.400.000
references = 11.200.000 (we facet on two multivalue fields with 4 values each on average, so 1.400.000 * 2 * 4 = 11.200.000)
unique values = 1.132.344 (the total number of variant options across all clients; this is what we facet on)

1.400.000 * log2(11.200.000) + 1.400.000 * log2(1.132.344) = ~14MB per field (we have 4 fields)?

I must be calculating this wrong. (See the quick sketch at the very bottom of this mail.)

On Mon, Apr 15, 2013 at 2:10 PM, John Nielsen <j...@mcb.dk> wrote:

> I did a search. I have no occurrence of "UnInverted" in the solr logs.
>
> > Another explanation for the large amount of memory presents itself if
> > you use a single index: If each of your clients facets on at least one
> > field specific to the client ("client123_persons" or something like
> > that), then your memory usage goes through the roof.
>
> This is exactly how we facet right now! I will definitely rewrite the
> relevant parts of our product to test this out before moving further down
> the docValues path.
>
> I will let you know as soon as I know one way or the other.
>
>
> On Mon, Apr 15, 2013 at 1:38 PM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
>
>> On Mon, 2013-04-15 at 10:25 +0200, John Nielsen wrote:
>>
>> > The FieldCache is the big culprit. We do a huge amount of faceting, so
>> > it seems right.
>>
>> Yes, you wrote that earlier. The mystery is that the math does not check
>> out with the description you have given us.
>>
>> > Unfortunately I am super swamped at work, so I have precious little
>> > time to work on this, which is what explains my silence.
>>
>> No problem, we've all been there.
>>
>> [Band aid: More memory]
>>
>> > The extra memory helped a lot, but it still OOMs with about 180 clients
>> > using it.
>>
>> You stated earlier that you have a "solr cluster" and your total(?) index
>> size was 35GB, with each "register" being between "15k" and "30k". I am
>> using the quotes to signify that it is unclear what you mean. Is your
>> cluster multiple machines (I'm guessing no), multiple Solrs, cores,
>> shards or maybe just a single instance prepared for later distribution?
>> Is a register a core, a shard or simply a logical part (one client's data)
>> of the index?
>>
>> If each client has their own core or shard, that would mean that each
>> client uses more than 25GB/180 ~= 142MB of heap to access 35GB/180
>> ~= 200MB of index. That sounds quite high and you would need a very
>> heavy facet to reach that.
>>
>> If you could grep "UnInverted" from the Solr log file and paste the
>> entries here, that would help to clarify things.
>>
>> Another explanation for the large amount of memory presents itself if
>> you use a single index: If each of your clients facets on at least one
>> field specific to the client ("client123_persons" or something like
>> that), then your memory usage goes through the roof.
>>
>> Assuming an index with 10M documents, each with 5 references to a modest
>> 10K unique values in a facet field, the simplified formula
>> #documents*log2(#references) + #references*log2(#unique_values) bits
>> tells us that this takes at least 110MB with field cache based faceting.
>>
>> 180 clients @ 110MB ~= 20GB. As that is a theoretical low, we can at
>> least double that. This fits neatly with your new heap of 64GB.
>>
>> If my guessing is correct, you can solve your memory problems very easily
>> by sharing _all_ the facet fields between your clients. This should bring
>> your memory usage down to a few GB.
>>
>> You are probably already restricting each client's searches to their own
>> data by filtering, so this should not influence the returned facet values
>> and counts, as compared to separate fields.
>>
>> This is very similar to the thread "Facets with 5000 facet fields" BTW.
>>
>> > Today I finally managed to set up a test core so I can begin to play
>> > around with docValues.
>>
>> If you are using a single index with the individual-facet-fields-per-client
>> approach, DocValues will also have scaling issues, as the number of values
>> (of which the majority will be null) will be
>> #clients * #documents * #facet_fields.
>> This means that adding a new client will be progressively more expensive.
>>
>> On the other hand, if you use a lot of small shards, DocValues should
>> work for you.
>>
>> Regards,
>> Toke Eskildsen
>>

-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer

*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk
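PS: Here is the back-of-the-envelope calculation above as a quick Python sketch, plugging my numbers straight into the formula quoted from Toke's mail; the figures are the ones from this mail, nothing else is assumed. Note that the second term of the formula uses #references rather than #documents, which is probably where my ~14 MB figure went wrong; with #references it comes out closer to 31 MB per field.

    import math

    docs = 1400000          # documents in the index
    refs = docs * 2 * 4     # two multivalue facet fields, ~4 values each per doc
    uniq = 1132344          # unique variant options across all clients

    # #documents*log2(#references) + #references*log2(#unique_values), in bits
    bits = docs * math.log2(refs) + refs * math.log2(uniq)
    print("%.1f MB per field" % (bits / 8 / 1024 ** 2))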