Thanks for this - I haven't any previous experience with utilising SSDs in the 
way you suggest, so I guess I need to start learning! And thanks for the 
Danish-webscale URL - it looks like very informative reading. (Yes, I think 
we're working in similar industries with similar constraints and expectations.)

Compiling my answers into one email: "Curious how many documents per shard 
you were planning? The number of documents per shard and field type will drive 
the amount of RAM needed to sort and facet."
- Number of documents per shard: I think about 200 million. That's a rough 
estimate based on other Solr instances we run, though. Which I think means we 
hold a lot of data for each document, though I keep arguing to keep this to the 
truly required minimum. We also have many facets, some of which are pretty 
large. (I'm stretching my understanding here, but I think most documents have 
many 'entries' in many facets, so these really hit us performance-wise.)
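To get a feel for why large multi-valued facets hurt, here is a very rough 
back-of-the-envelope estimate. All the numbers except the 200 million 
documents per shard are my own assumptions (the values-per-document figure and 
the roughly-4-bytes-per-entry cost are illustrative guesses, not measured):

```python
# Crude facet memory estimate for one facet field on one shard.
# Assumption: an uninverted facet field costs on the order of one
# 4-byte ordinal per (document, value) pair; a multi-valued field
# pays that cost once per value, not once per document.

docs_per_shard = 200_000_000   # from the estimate above
avg_values_per_doc = 3         # assumed - "many entries in many facets"
bytes_per_entry = 4            # assumed per-ordinal cost

bytes_needed = docs_per_shard * avg_values_per_doc * bytes_per_entry
print(f"~{bytes_needed / 2**30:.1f} GB per facet field per shard")
```

With these made-up inputs that comes to roughly 2 GB per facet field per 
shard, which multiplies quickly across many facet fields - so profiling real 
data is the only way to pin this down.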

I try to keep a 1-to-1 ratio of Solr nodes to CPUs, with a few CPUs spare for 
the operating system, and I utilise MMapDirectory to manage memory via the OS. 
So at this moment I'm guessing that we'll have 56 CPUs dedicated to Solr 
across 2 physical 32-CPU servers, and _hopefully_ 256GB RAM on each. This 
would give 28 shards per server, each with 5GB of Java heap (in Tomcat), 
leaving 116GB on each server for the OS and MMap. (I believe the Solr theory 
for this doesn't accurately work out, but we can accept the edge cases where 
this will fail.)
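The per-server budget above can be sketched as simple arithmetic. This assumes 
my reading of the plan is right - 32 CPUs per server with 4 reserved for the 
OS, one Solr node/shard per remaining CPU, 5GB heap each, 256GB RAM per server:

```python
# Back-of-the-envelope memory budget per physical server.
# All inputs are the planned (not measured) figures described above.

cpus_per_server = 32
os_reserved_cpus = 4           # "a few spare for the operating system"
heap_per_shard_gb = 5
ram_per_server_gb = 256

shards_per_server = cpus_per_server - os_reserved_cpus      # 1 node per CPU
heap_per_server_gb = shards_per_server * heap_per_shard_gb  # total JVM heap
left_for_os_gb = ram_per_server_gb - heap_per_server_gb     # OS + MMap cache

print(f"{shards_per_server} shards, {heap_per_server_gb} GB total heap, "
      f"{left_for_os_gb} GB left for the OS and MMapDirectory")
```

Under these assumptions that is 28 shards, 140GB of heap, and 116GB left for 
the OS page cache that MMapDirectory relies on.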

I can also see that our hardware requirements will depend on usage as well as 
the volume of data, and I've been pondering how best to structure our index/es 
to support a long-term service (which means that, given it's a lot of data, I 
need to structure the data so that new usage doesn't require re-indexing). But 
at this early stage, as people say, we need to prototype, test, profile etc., 
and to do that I need the hardware to run the trials (policy dictates that I 
buy the production hardware now, before profiling - I get to control much of 
the design and construction, so I don't argue with this!)

Thanks for all the comments everyone, all very much appreciated :)
Gil


-----Original Message-----
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] 
Sent: 11 December 2013 12:02
To: solr-user@lucene.apache.org
Subject: Re: Solr hardware memory question

On Tue, 2013-12-10 at 17:51 +0100, Hoggarth, Gil wrote:
> We're probably going to be building a Solr service to handle a dataset 
> of ~60TB, which for our data and schema typically gives a Solr index 
> size of 1/10th - i.e., 6TB. Given there's a general rule about the 
> amount of hardware memory required should exceed the size of the Solr 
> index (exceed to also allow for the operating system etc.), how have 
> people handled this situation?

By acknowledging that it is cheaper to buy SSDs instead of trying to compensate 
for slow spinning drives with excessive amounts of RAM. 

Our plan for an estimated 20TB of indexes out of 372TB of raw web data is to 
use SSDs controlled by a single machine with 512GB of RAM (or was it 256GB? 
I'll have to ask the hardware guys):
https://sbdevel.wordpress.com/2013/12/06/danish-webscale/

As always YMMV, and the numbers you quote elsewhere indicate that your queries 
are quite complex. You might want to do a bit of profiling to see if they are 
heavy enough to make the CPU the bottleneck.

Regards,
Toke Eskildsen, State and University Library, Denmark
