Thanks Shawn. You're absolutely right about the performance balance,
though it's good to hear it from an experienced source (if you don't
mind me calling you that!) Fortunately we don't have a top performance
requirement, and we have a small audience so a low query volume. On
similar systems we're "managing" to just provide a Solr service with a
3TB index size on 160GB RAM, though we have scripts to handle the
occasionally necessary service restart when someone submits a more
exotic query. This, btw, gives a response time of ~45-90 seconds for
uncached queries. My question, I suppose, comes from my hope that we can
do away with the restart scripts, as I doubt they do the Solr service
any good (they simply kill the processes and restart when necessary),
and get response times under 20 seconds.

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: 10 December 2013 17:37
To: solr-user@lucene.apache.org
Subject: Re: Solr hardware memory question

On 12/10/2013 9:51 AM, Hoggarth, Gil wrote:
> We're probably going to be building a Solr service to handle a dataset
> of ~60TB, which for our data and schema typically gives a Solr index
> size of 1/10th - i.e., 6TB. Given there's a general rule that the
> amount of hardware memory should exceed the size of the Solr index
> (exceed to also allow for the operating system etc.), how have people
> handled this situation? Do I really need, for example, 12 servers
> with 512GB RAM, or are there other techniques for handling this?

That really depends on what kind of query volume you'll have and what
kind of performance you want.  If your query volume is low and you can
deal with slow individual queries, then you won't need that much memory.
If either of those requirements increases, you'd probably need more
memory, up to the 6TB total -- or 12TB if you need to double the total
index size for redundancy purposes.  If your index is constantly growing
like most are, you need to plan for that too.
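For what it's worth, the sizing arithmetic can be sketched roughly as
below. The function, the 32GB per-server overhead figure, and the server
counts are my own illustrative assumptions, not a Solr rule -- only
testing against your real queries will tell you how much of the index
actually needs to be cached.

```python
# Rough capacity sketch: how many servers are needed to keep the whole
# index in the OS page cache, assuming a flat per-server overhead for
# the operating system and JVM heap.
import math

def servers_needed(index_tb, ram_per_server_gb, os_overhead_gb=32, replicas=1):
    """Servers required so the full index (times replicas) fits in
    page cache, leaving os_overhead_gb per server for OS and JVM."""
    usable_gb = ram_per_server_gb - os_overhead_gb
    total_gb = index_tb * 1024 * replicas
    return math.ceil(total_gb / usable_gb)

print(servers_needed(6, 512))              # 6TB index on 512GB boxes -> 13
print(servers_needed(6, 512, replicas=2))  # doubled for redundancy   -> 26
```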

Putting the entire index into RAM is required for *top* performance, but
not for base functionality.  It might be possible to put only a fraction
of your index into RAM.  Only testing can determine what you really need
to obtain the performance you're after.

Perhaps you've already done this, but you should try as much as possible
to reduce your index size.  Store as few fields as possible -- just
enough to build a search result list/grid and retrieve the full document
from the canonical data store.  Save termvectors and docvalues on as few
fields as possible.  If you can, reduce the number of terms produced by
your analysis chains.
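As an illustration of that trimming (the field names here are
hypothetical; indexed/stored/docValues/termVectors are standard
schema.xml field attributes):

```xml
<!-- Hypothetical schema.xml fragment: store only enough to build a
     result list; fetch the full record from the canonical data store
     by id. -->
<field name="id"    type="string"       indexed="true" stored="true"
       docValues="true"/>
<field name="title" type="text_general" indexed="true" stored="true"/>
<!-- large body field: searchable, but not stored, no term vectors -->
<field name="body"  type="text_general" indexed="true" stored="false"
       termVectors="false"/>
```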

Thanks,
Shawn
