As someone else wrote, there are a lot of uncertainties, and I recommend testing 
yourself to find the optimal configuration. Some food for thought:
How many clients do you have, and what is their concurrency? What operations 
will they perform? Do they access Solr directly? You can use JMeter to simulate the 
querying part (and also the indexing). Depending on the concurrency of users, 
you may need to think about the number of CPUs.
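To get a feel for concurrency before building a full JMeter plan, a minimal sketch like the following can fire queries from many threads at once. This is only an illustration of the idea, not a replacement for JMeter; the host, collection name, and queries are made-up placeholders.

```python
# Minimal sketch: simulate N concurrent clients querying Solr.
# SOLR and COLLECTION are assumptions for a local test node.
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR = "http://localhost:8983/solr"   # assumed test node
COLLECTION = "mycollection"           # placeholder collection name

def query_url(q, rows=10):
    """Build a Solr /select URL for query string q."""
    return f"{SOLR}/{COLLECTION}/select?{urlencode({'q': q, 'rows': rows})}"

def run_query(q):
    """Issue one query and return the HTTP status."""
    with urlopen(query_url(q)) as resp:
        return resp.status

def simulate(queries, workers=20):
    """Run the queries from `workers` threads, roughly `workers` concurrent clients."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_query, queries))
```

Raising `workers` while watching CPU and latency on the Solr nodes gives a first impression of how many CPUs the query load needs.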
What does moderate indexing mean? How much does the collection grow per day?
Have you thought about putting the ZooKeeper ensemble on dedicated nodes?

Why do you want to use an older Solr version? Why not the newest + JDK 11?

In what format are the documents? Will you convert them beforehand? What analysis 
will you do on the documents (this may have an impact on index size, etc.)?

Also important: how do you plan to reindex the full collection in case a 
schema field changes? (Hint: look at collection aliases so this can be 
done without interruption.)
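The alias idea can be sketched as follows: clients always query an alias, a new collection is built and fully reindexed in the background, and the alias is then repointed atomically. The Collections API actions used (CREATE, CREATEALIAS, DELETE) are real Solr admin actions, but the host and the `products`/`products_v1`/`products_v2` names are made-up examples.

```python
# Hedged sketch of an interruption-free reindex behind a collection alias.
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR = "http://localhost:8983/solr"   # assumed admin endpoint

def admin_url(action, **params):
    """Build a Collections API URL for the given admin action."""
    return f"{SOLR}/admin/collections?{urlencode({'action': action, **params})}"

def reindex_behind_alias(alias, old, new, shards=8):
    """Create `new`, reindex into it, then atomically repoint `alias` to it."""
    urlopen(admin_url("CREATE", name=new, numShards=shards))        # new collection
    # ... reindex every source document into `new` here ...
    urlopen(admin_url("CREATEALIAS", name=alias, collections=new))  # atomic flip
    urlopen(admin_url("DELETE", name=old))                          # retire the old one
```

Because queries only ever see the alias, the flip in step three switches all traffic to the new schema without any downtime.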

Normally I would expect a web app in between, also for security reasons. You may 
need to scale this one as well.

You don’t have to answer those questions here, but I recommend answering them 
yourself during a proof of concept on your premises.
I don’t see a point in creating more than one cluster (except for disaster 
recovery and cross-data-center replication, if that is needed). Maybe I am 
overlooking the reason why you thought of multiple clusters.

> Am 25.06.2019 um 22:53 schrieb Rahul Goswami <rahul196...@gmail.com>:
> 
> Hello,
> We are running Solr 7.2.1 and planning for a deployment which will grow to
> 4 billion documents over time. We have 16 nodes at disposal.I am thinking
> between 3 configurations:
> 
> 1 cluster - 16 nodes
> vs
> 2 clusters - 8 nodes each
> vs
> 4 clusters -4 nodes each
> 
> Irrespective of the configuration, each node would host 8 shards (eg: a
> cluster with 16 nodes would have 16*8=128 shards; similarly, 32 shards in a
> 4 node cluster). These 16 nodes will be hosted across 4 beefy servers each
> with 128 GB RAM. So we can allocate 32 GB RAM (not heap space) to each
> node. what configuration would be most efficient for our use case
> considering moderate-heavy indexing and search load? Would also like to
> know the tradeoffs involved if any. Thanks in advance!
> 
> Regards,
> Rahul
