Hi,

I would like some advice on setting up SolrCloud on a set of powerful
machines. The average document size is about 0.5 KB, and the number of
documents stored in SolrCloud could reach billions. During indexing, the
incoming document rate could be as high as 20k/second. The main query
operations are searching, faceting, and some other aggregations. There will
NOT be many concurrent queries (a replication factor of 2 may be good enough),
but some queries could cover a large range of documents.
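For a sense of scale, here is a rough sizing sketch based on the numbers above. TOTAL_DOCS = 2 billion is a hypothetical stand-in for "billions", and all sizes are raw source bytes. The on-disk index could be larger or smaller depending on the schema (stored fields, docValues, etc.).

```python
# Rough sizing from the figures in this post. TOTAL_DOCS is a hypothetical
# stand-in for "billions"; sizes are raw source bytes, not on-disk index size.
AVG_DOC_BYTES = 500            # ~0.5 KB per document
PEAK_DOCS_PER_SEC = 20_000     # peak indexing rate
TOTAL_DOCS = 2_000_000_000     # hypothetical total document count
REPLICATION_FACTOR = 2


def ingest_mb_per_sec(rate: int, doc_bytes: int) -> float:
    """Raw bytes arriving per second at the given rate, in MB (10^6 bytes)."""
    return rate * doc_bytes / 1_000_000


def docs_per_day(rate: int) -> int:
    """Documents indexed if the rate were sustained for 24 hours."""
    return rate * 86_400


def raw_size_gb(total_docs: int, doc_bytes: int, copies: int = 1) -> float:
    """Raw source size in GB (10^9 bytes), times the number of copies kept."""
    return total_docs * doc_bytes * copies / 1_000_000_000


if __name__ == "__main__":
    print(f"peak ingest:       {ingest_mb_per_sec(PEAK_DOCS_PER_SEC, AVG_DOC_BYTES):.1f} MB/s")
    print(f"docs/day at peak:  {docs_per_day(PEAK_DOCS_PER_SEC):,}")
    print(f"raw data (1 copy): {raw_size_gb(TOTAL_DOCS, AVG_DOC_BYTES):,.0f} GB")
    print(f"raw data (RF={REPLICATION_FACTOR}):  "
          f"{raw_size_gb(TOTAL_DOCS, AVG_DOC_BYTES, REPLICATION_FACTOR):,.0f} GB")
```

At the peak rate this comes to about 10 MB/s of raw input, or roughly 1.7 billion documents per day if that rate were sustained. Under the 2-billion assumption, the raw data is about 1 TB per copy, which is small next to 8 x 48 TB of disk. So memory and CPU per shard, rather than disk, are more likely to drive the layout choice.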

As an example, I have 8 powerful machines (nodes), and each machine has:

16 CPU cores
256GB RAM
48TB physical disk space

SolrCloud could be set up in any of the following ways (assuming a replication
factor of 2):

1) 8 shards on 8 Solr servers, 16 cores in total (including replicas).
Each machine (node) runs one Solr server (JVM), and each Solr server hosts one
shard (2 cores per server, counting replicas).

2) 32 shards on 8 Solr servers, 64 cores in total (including replicas).
Each machine (node) runs one Solr server (JVM), and each Solr server hosts 4
shards (8 cores per server, counting replicas).

3) 32 shards on 16 Solr servers, 64 cores in total (including replicas).
Each machine (node) runs 2 Solr servers (JVMs), and each Solr server hosts 2
shards (4 cores per server, counting replicas).

4) 64 shards on 16 Solr servers, 128 cores in total (including replicas).
Each machine (node) runs 2 Solr servers (JVMs), and each Solr server hosts 4
shards (8 cores per server, counting replicas).

5) 128 shards on 32 Solr servers, 256 cores in total (including replicas).
Each machine (node) runs 4 Solr servers (JVMs), and each Solr server hosts 4
shards (8 cores per server, counting replicas).
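To compare the five layouts side by side, the arithmetic can be tabulated as follows. This is a minimal sketch that assumes cores are spread evenly over the 8 machines and their JVMs. TOTAL_DOCS = 2 billion is again a hypothetical value for "billions".

```python
from dataclasses import dataclass

NODES = 8                    # physical machines
CPU_CORES_PER_NODE = 16
REPLICATION_FACTOR = 2
TOTAL_DOCS = 2_000_000_000   # hypothetical stand-in for "billions"


@dataclass(frozen=True)
class Layout:
    name: str
    shards: int   # logical shards (one copy each)
    jvms: int     # Solr servers across all machines

    @property
    def total_cores(self) -> int:
        return self.shards * REPLICATION_FACTOR

    @property
    def jvms_per_node(self) -> int:
        return self.jvms // NODES

    @property
    def cores_per_node(self) -> int:
        return self.total_cores // NODES

    @property
    def cores_per_jvm(self) -> int:
        return self.total_cores // self.jvms

    @property
    def docs_per_shard(self) -> float:
        return TOTAL_DOCS / self.shards

    @property
    def cpu_per_core(self) -> float:
        # CPU cores available per Solr core if all cores are busy at once.
        return CPU_CORES_PER_NODE / self.cores_per_node


LAYOUTS = [
    Layout("1", 8, 8),
    Layout("2", 32, 8),
    Layout("3", 32, 16),
    Layout("4", 64, 16),
    Layout("5", 128, 32),
]

if __name__ == "__main__":
    print(f"{'layout':>6} {'cores':>5} {'JVMs/node':>9} {'cores/node':>10} "
          f"{'cores/JVM':>9} {'docs/shard':>12} {'CPU/core':>8}")
    for lay in LAYOUTS:
        print(f"{lay.name:>6} {lay.total_cores:>5} {lay.jvms_per_node:>9} "
              f"{lay.cores_per_node:>10} {lay.cores_per_jvm:>9} "
              f"{lay.docs_per_shard:>12,.0f} {lay.cpu_per_core:>8.1f}")
```

The table makes the trade-off concrete. Layout 1 leaves 8 CPU cores per Solr core but 250 million documents per shard. Layout 5 gets shards down to about 15.6 million documents, but at that point each machine hosts 32 cores on 16 CPU cores. A single query that fans out to every shard would then compete for CPU on each machine.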

Could someone advise which layout is better, or suggest a better one? The
basic idea is to "divide" each powerful machine into more Solr servers and/or
more shards. I would like some advice on the trade-offs and general guidelines
for this division. An example setup for this use case would be very helpful.
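One way to request, for example, layout 2 is through the Collections API CREATE action, using the numShards, replicationFactor and maxShardsPerNode parameters. The sketch below only builds that request URL. The host "node1" and the collection name "docs" are hypothetical, and a config set name would normally also be passed. maxShardsPerNode applies to the SolrCloud releases of this era; later releases dropped it, so check the reference guide for your version.

```python
from urllib.parse import urlencode


def create_collection_url(base_url: str, name: str, num_shards: int,
                          replication_factor: int, max_shards_per_node: int) -> str:
    """Build a Collections API CREATE request URL (query-string form)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # Upper bound on cores of this collection per Solr node (JVM).
        "maxShardsPerNode": max_shards_per_node,
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Layout 2: 32 shards x RF 2 = 64 cores over 8 JVMs -> 8 cores per JVM.
    # "node1" and "docs" are hypothetical names.
    print(create_collection_url("http://node1:8983/solr", "docs", 32, 2, 64 // 8))
```

SolrCloud treats each JVM as a separate node. For layouts 3 to 5, the limit would therefore be total cores divided by JVMs (4 for layout 3, 8 for layouts 4 and 5). Default placement may also put both copies of a shard on two JVMs of the same physical machine unless replica placement is controlled explicitly.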

Thanks a lot.

Shushuai
