On 12/10/2013 9:51 AM, Hoggarth, Gil wrote:
> We're probably going to be building a Solr service to handle a dataset
> of ~60TB, which for our data and schema typically gives a Solr index
> size of 1/10th - i.e., 6TB. Given there's a general rule that the
> amount of hardware memory should exceed the size of the Solr
> index (exceed to also allow for the operating system etc.), how have
> people handled this situation? Do I really need, for example, 12 servers
> with 512GB RAM, or are there other techniques for handling this?

That really depends on what kind of query volume you'll have and what
kind of performance you want.  If your query volume is low and you can
deal with slow individual queries, then you won't need that much memory.
If either of those requirements increases, you'd probably need more
memory, up to the 6TB total -- or 12TB if you need to double the total
index size for redundancy purposes.  If your index is constantly growing
like most are, you need to plan for that too.
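
To put rough numbers on that, using the figures from your own message
purely as an illustration:

  6TB index, one copy   -> roughly 6TB of RAM across the cluster for
                           full caching, e.g. 12 servers x 512GB
  6TB index, two copies -> roughly 12TB of RAM across the cluster,
                           e.g. 24 such servers, or 12 with ~1TB each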

Putting the entire index into RAM is required for *top* performance, but
not for base functionality.  It might be possible to put only a fraction
of your index into RAM.  Only testing can determine what you really need
to obtain the performance you're after.

Perhaps you've already done this, but you should try as much as possible
to reduce your index size.  Store as few fields as possible, only just
enough to build a search result list/grid and retrieve the full document
from the canonical data store.  Enable termvectors and docvalues on as few
fields as possible.  If you can, reduce the number of terms produced by
your analysis chains.
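
As a rough sketch of what that can look like in schema.xml (the field
names and types below are invented for illustration, not taken from
your schema):

  <!-- Searchable but not stored: the full document is retrieved from
       the canonical data store, so Solr only keeps what it needs to
       match and rank. -->
  <field name="content" type="text_general" indexed="true" stored="false"
         termVectors="false" docValues="false"/>

  <!-- Only the handful of fields needed to build the result list/grid
       are stored; docValues are limited to fields actually used for
       sorting or faceting. -->
  <field name="id"    type="string"       indexed="true" stored="true"/>
  <field name="title" type="text_general" indexed="true" stored="true"/>
  <field name="date"  type="tdate"        indexed="true" stored="true"
         docValues="true"/>

On the analysis side, the usual term multipliers are things like
ngram/edge-ngram filters and very broad synonym expansion, so those
are the first places to look if you use them.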

Thanks,
Shawn
