bq: The type of queries that are run can return anything from 1 million to 9.5 million documents, and typically run for anything from 20 to 45 minutes.
Uhhh, are you literally setting the &rows parameter to over 9.5M and
getting that many docs back all at once? Or is that just numFound, and
you're _really_ returning just a relatively few docs? Because if
you're returning 9.5M rows, that's really an anti-pattern for Solr.
There are other ways to do some of this (cursorMark, streaming
aggregation, the /export handler). But before we go there I want to be
sure I'm understanding the use-case. Because I agree with Toke: the
performance numbers you give are waaaay outside what I would expect,
so clearly I don't get something about your setup.

Best,
Erick

On Tue, Jun 30, 2015 at 3:43 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> On Tue, 2015-06-30 at 16:39 +1000, Caroline Hind wrote:
>> We have very recently upgraded from Solr 4.1 to 5.2.1, and at the same
>> time increased the physical RAM from 24GB to 96GB. We run multiple
>> cores on this one server, approximately 20 in total, but primarily we
>> have one that is huge in comparison to all of the others. This very
>> large core consists of nearly 62 million documents, and the index is
>> around 45GB in size. (Is that index unreasonably large? Should it be
>> sharded?)
>
> The size itself sounds fine, but your performance numbers below are
> worrying. As always, it is hard to give advice on setups:
> https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
>> I'm really unfamiliar with how we should be configuring our JVM.
>> Currently we have it set to a maximum of 48GB; up until yesterday it
>> was set to 24GB, and we've been seeing the dreaded OOME messages from
>> time to time.
>
> There is a shift in pointer size when one passes the 32GB mark for JVM
> memory: your 48GB allocation gives you about the same amount of usable
> heap as a 32GB allocation would:
> https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/
> Consider running two Solrs on the same machine instead. Maybe one for
> the large collection and one for the rest?
>
> Anyway, OOMs with ~32GB of heap for 62M documents indicate that you are
> doing heavy sorting, grouping or faceting on fields that do not have
> DocValues enabled. Could you describe what you do in that regard?
>
>> The type of queries that are run can return anything from
>> 1 million to 9.5 million documents, and typically run for anything from
>> 20 to 45 minutes.
>
> Such response times are a thousand times higher than what most people
> are seeing. There might be a perfectly fine reason for those response
> times, but I suggest we sanity-check them: could you show us a typical
> query and tell us how many concurrent queries you normally serve?
>
> - Toke Eskildsen, State and University Library, Denmark
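To make the deep-paging alternatives Erick mentions concrete, here is a rough sketch of cursorMark pagination against a hypothetical core named `bigcore` with uniqueKey field `id` (both names are placeholders, and this assumes a local Solr on the default port and `jq` for JSON parsing):

```shell
#!/bin/sh
# Sketch: walk a huge result set in stable 1000-doc pages with
# cursorMark, instead of one enormous rows=9500000 request.
# The sort MUST include the uniqueKey field for cursors to work.
CURSOR='*'
while true; do
  RESP=$(curl -s "http://localhost:8983/solr/bigcore/select?q=*:*&rows=1000&sort=id+asc&cursorMark=$CURSOR&wt=json")
  NEXT=$(echo "$RESP" | jq -r '.nextCursorMark')
  # ... process this page of documents here ...
  # When Solr returns the same cursor we sent, the set is exhausted.
  [ "$NEXT" = "$CURSOR" ] && break
  CURSOR=$NEXT
done
```

The /export handler is the other option for full-result-set retrieval, with the caveat that every field in `sort` and `fl` must have docValues enabled.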
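On Toke's docValues point: a field used for heavy sorting, grouping or faceting can have docValues switched on via the Schema API (available in this Solr 5.x range when a managed schema is in use). The core and field names below are placeholders, and a full reindex is required after the change:

```shell
# Sketch: enable docValues on an existing string field via the
# Schema API. "bigcore" and "category" are placeholder names.
curl -X POST -H 'Content-type:application/json' \
  'http://localhost:8983/solr/bigcore/schema' -d '{
  "replace-field": {
    "name": "category",
    "type": "string",
    "stored": true,
    "docValues": true
  }
}'
```

With docValues, the sort/facet structures live largely off-heap in memory-mapped files rather than on the Java heap, which is exactly what relieves the OOM pressure Toke describes.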
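The 32GB pointer-size shift Toke links to can be checked directly on the machine in question: the JVM reports whether compressed (32-bit) object pointers are still in effect at a given heap size. A quick sketch, assuming a HotSpot JVM on the PATH:

```shell
# Compressed oops are on by default below roughly 32GB of heap and
# switch off above it, which is why a 48GB heap buys little over ~32GB.
java -Xmx31g -XX:+PrintFlagsFinal -version 2>/dev/null | grep UseCompressedOops
java -Xmx48g -XX:+PrintFlagsFinal -version 2>/dev/null | grep UseCompressedOops
```

If the second command reports the flag as false, the heap is paying 64-bit pointer overhead, which supports the suggestion of running two smaller-heap Solr instances instead of one 48GB one.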